[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19363 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144873850 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -479,7 +492,9 @@ private[sql] object RelationalGroupedDataset { /** * The Grouping Type */ - private[sql] trait GroupType + private[sql] trait GroupType { +override def toString: String = getClass.getSimpleName.replace("$", "") --- End diff -- `getClass.getSimpleName.stripSuffix("$").stripSuffix("Type")` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144873610 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -1341,8 +1341,69 @@ class DatasetSuite extends QueryTest with SharedSQLContext { Seq(1).toDS().map(_ => ("", TestForTypeAlias.seqOfTupleTypeAlias)), ("", Seq((1, 1), (2, 2 } + + test("Check RelationalGroupedDataset toString: Single data") { +val kvDataset = (1 to 3).toDF("id").groupBy("id") +val expected = "RelationalGroupedDataset: [" + + "group by: [id: int], value: [id: int], type: GroupByType]" --- End diff -- a better idea: * `group by` -> `grouping expressions` * `type: GroupByType` -> `type: GroupBy`, i.e. remove the `Type` suffix. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144753514 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala --- @@ -46,6 +36,20 @@ abstract class QueryTest extends PlanTest { Locale.setDefault(Locale.US) /** + * Makes sure the answer matches the expected result + */ + def checkString(expected: String, actual: String): Unit = { +if (expected != actual) { + fail( +"The actual string gives wrong result:\n\n" + sideBySide( --- End diff -- it's kind of an overkill, the `toString` we are testing always return a single line, just `assert` is enough. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144753446 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/RelationalGroupedDatasetSuite.scala --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.sql.test.SharedSQLContext + +class RelationalGroupedDatasetSuite extends QueryTest with SharedSQLContext { --- End diff -- we can put the tests in `DataSetSuite`, as `DataSet.toString` is also tested there. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144753415 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/KeyValueGroupedDatasetSuite.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.sql.test.SharedSQLContext + +class KeyValueGroupedDatasetSuite extends QueryTest with SharedSQLContext { --- End diff -- we can put the tests in `DataSetSuite`, as `DataSet.toString` is also tested there. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144753249 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -564,4 +564,25 @@ class KeyValueGroupedDataset[K, V] private[sql]( encoder: Encoder[R]): Dataset[R] = { cogroup(other)((key, left, right) => f.call(key, left.asJava, right.asJava).asScala)(encoder) } + + override def toString: String = { +val builder = new StringBuilder +val kFields = kExprEnc.schema.map { + case f => s"${f.name}: ${f.dataType.simpleString(2)}" +} +val vFields = vExprEnc.schema.map { + case f => s"${f.name}: ${f.dataType.simpleString(2)}" +} +builder.append("KeyValueGroupedDataset: [key: [") +builder.append(kFields.take(2).mkString(", ")) +if (kFields.length > 2) { + builder.append(" ... " + (kFields.length - 2) + " more field(s)") +} +builder.append("], value: [") +builder.append(vFields.take(2).mkString(", ")) +if (vFields.length > 2) { + builder.append(" ... " + (vFields.length - 2) + " more field(s)") +} +builder.append("]]").toString() --- End diff -- shall we include `df` here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144753173 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -465,6 +466,19 @@ class RelationalGroupedDataset protected[sql]( Dataset.ofRows(df.sparkSession, plan) } + + override def toString: String = { +val builder = new StringBuilder +builder.append("RelationalGroupedDataset: [groupingBy: [") +val kFields = groupingExprs.map(_.asInstanceOf[NamedExpression]).map { + case f => s"${f.name}: ${f.dataType.simpleString(2)}" +} +builder.append(kFields.take(2).mkString(", ")) +if (kFields.length > 2) { + builder.append(" ... " + (kFields.length - 2) + " more field(s)") +} +builder.append(s"], df: ${df.toString}, type: $groupType]").toString() --- End diff -- why include `df` here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144753122 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -465,6 +466,19 @@ class RelationalGroupedDataset protected[sql]( Dataset.ofRows(df.sparkSession, plan) } + + override def toString: String = { +val builder = new StringBuilder +builder.append("RelationalGroupedDataset: [groupingBy: [") --- End diff -- `groupingBy` -> `group by` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144753076 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -18,13 +18,13 @@ package org.apache.spark.sql import scala.collection.JavaConverters._ +import scala.util.control.NonFatal --- End diff -- unnecessary import --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144753090 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -21,6 +21,7 @@ import java.util.Locale import scala.collection.JavaConverters._ import scala.language.implicitConversions +import scala.util.control.NonFatal --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144501447 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -465,6 +466,24 @@ class RelationalGroupedDataset protected[sql]( Dataset.ofRows(df.sparkSession, plan) } + + override def toString: String = { +try { + val builder = new StringBuilder + builder.append("RelationalGroupedDataset: [key: [") --- End diff -- `key:` or `groupingBy:`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144502680 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/RelationalGroupedDatasetSuite.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.sql.test.SharedSQLContext + +class RelationalGroupedDatasetSuite extends QueryTest with SharedSQLContext { + import testImplicits._ + + test("Check RelationalGroupedDataset toString: Single data") { +val kvDataset = (1 to 3).toDF("id").as[SingleData].groupBy("id") +val expected = "RelationalGroupedDataset: [key: [id: int], value: [id: int], GroupByType]" +val actual = kvDataset.toString +checkString(expected, actual) + } + + test("Check RelationalGroupedDataset toString: over length schema ") { +val kvDataset = (1 to 3).map( x => (x, x.toString, x.toLong)) + .toDF("id", "val1", "val2").as[TripleData].groupBy("id") --- End diff -- Is `TripleData` necessary? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144502598 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/RelationalGroupedDatasetSuite.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.sql.test.SharedSQLContext + +class RelationalGroupedDatasetSuite extends QueryTest with SharedSQLContext { + import testImplicits._ + + test("Check RelationalGroupedDataset toString: Single data") { +val kvDataset = (1 to 3).toDF("id").as[SingleData].groupBy("id") --- End diff -- Is `SingleData` necessary? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144500697 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -465,6 +466,24 @@ class RelationalGroupedDataset protected[sql]( Dataset.ofRows(df.sparkSession, plan) } + + override def toString: String = { +try { + val builder = new StringBuilder + builder.append("RelationalGroupedDataset: [key: [") + val kFields = groupingExprs.map(_.asInstanceOf[NamedExpression]).map { --- End diff -- Are all expressions in `groupingExprs` `NamedExpression`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144501966 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala --- @@ -46,6 +36,20 @@ abstract class QueryTest extends PlanTest { Locale.setDefault(Locale.US) /** + * Makes sure the answer matches the expected result + */ + def checkString(expected: String, actual: String): Unit = { +if (expected != actual) { + fail( +"KeyValueGroupedDataset.toString() gives wrong result:\n\n" + sideBySide( --- End diff -- You didn't only use this against `KeyValueGroupedDataset`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144501685 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -465,6 +466,24 @@ class RelationalGroupedDataset protected[sql]( Dataset.ofRows(df.sparkSession, plan) } + + override def toString: String = { +try { + val builder = new StringBuilder + builder.append("RelationalGroupedDataset: [key: [") + val kFields = groupingExprs.map(_.asInstanceOf[NamedExpression]).map { +case f => s"${f.name}: ${f.dataType.simpleString(2)}" + } + builder.append(kFields.take(2).mkString(", ")) + if (kFields.length > 2) { +builder.append(" ... " + (kFields.length - 2) + " more field(s)") + } + builder.append(s"], value: ${df.toString}, $groupType]").toString() --- End diff -- `value:` or `df:`? `$groupType` -> `type: $groupType`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19363: [SPARK-22224][SQL]Override toString of KeyValue/R...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r144500085 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -564,4 +565,30 @@ class KeyValueGroupedDataset[K, V] private[sql]( encoder: Encoder[R]): Dataset[R] = { cogroup(other)((key, left, right) => f.call(key, left.asJava, right.asJava).asScala)(encoder) } + + override def toString: String = { +try { + val builder = new StringBuilder + val kFields = kExprEnc.schema.map { +case f => s"${f.name}: ${f.dataType.simpleString(2)}" + } + val vFields = vExprEnc.schema.map { +case f => s"${f.name}: ${f.dataType.simpleString(2)}" + } + builder.append("[key: [") + builder.append(kFields.take(2).mkString(", ")) + if (kFields.length > 2) { +builder.append(" ... " + (kFields.length - 2) + " more field(s)") + } + builder.append("], value: [") + builder.append(vFields.take(2).mkString(", ")) + if (vFields.length > 2) { +builder.append(" ... " + (vFields.length - 2) + " more field(s)") + } + builder.append("]]").toString() +} catch { + case NonFatal(e) => --- End diff -- Although some methods of `StringBuilder` can throw exceptions, I didn't see simple `append` can throw exception. Access of `schema` should not throw exception at all. So I don't get why it is. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org