[GitHub] spark pull request #22332: [SPARK-25333][SQL] Ability add new columns in Dat...
Github user wmellouli closed the pull request at: https://github.com/apache/spark/pull/22332

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user wmellouli commented on a diff in the pull request:
https://github.com/apache/spark/pull/22332#discussion_r215185048

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2226,16 +2226,18 @@ class Dataset[T] private[sql](
    * `column`'s expression must only refer to attributes supplied by this Dataset. It is an
    * error to add a column that refers to some other Dataset.
    *
-   * You can choose to add new columns either at the end (default behavior) or at the beginning.
+   * The position of the new column start from 0, and a negative position means at the end (default behavior).
    */
-  def withColumn(colName: String, col: Column, atTheEnd: Boolean): DataFrame =
-    withColumns(Seq(colName), Seq(col), atTheEnd)
+  def withColumn(colName: String, col: Column, atPosition: Int): DataFrame =
--- End diff --

Added
Github user wmellouli commented on a diff in the pull request:
https://github.com/apache/spark/pull/22332#discussion_r215184928

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -831,13 +831,21 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
    }.toSeq)
    assert(df.schema.map(_.name) === Seq("key", "value", "newCol"))
-    val df2 = testData.toDF().withColumn("newCol", col("key") + 1, false)
+    val df2 = testData.toDF().withColumn("newCol", col("key") + 1, 0)
--- End diff --

Tests with negative positions are covered for both the public method `withColumn` and the private method `withColumns`:
- https://github.com/apache/spark/pull/22332/files#diff-5d2ebf4e9ca5a990136b276859769289R852
- https://github.com/apache/spark/pull/22332/files#diff-5d2ebf4e9ca5a990136b276859769289R907

I'm testing three cases (at the same time) that add the new column at the end:
- a negative position
- the last position
- a position greater than the number of columns
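The three end-append cases enumerated above can be modeled without Spark itself. Below is a minimal, hypothetical sketch of the position semantics under discussion; the helper `insertAt` and the column names are illustrative only and are not part of the PR:

```scala
// Sketch of the position logic discussed in this review:
// the new column lands at index `atPosition` (counting from 0), while a
// negative position -- or one past the last column -- appends at the end.
object PositionSketch {
  def insertAt(cols: Seq[String], newCol: String, atPosition: Int): Seq[String] =
    if (atPosition < 0 || atPosition >= cols.size) {
      cols :+ newCol                       // default behavior: append at the end
    } else {
      val (before, after) = cols.splitAt(atPosition)
      (before :+ newCol) ++ after          // insert before the column at `atPosition`
    }

  def main(args: Array[String]): Unit = {
    val schema = Seq("key", "value")
    println(insertAt(schema, "newCol", 0))   // new column first
    println(insertAt(schema, "newCol", -1))  // negative: appended at the end
    println(insertAt(schema, "newCol", 5))   // past the last column: appended at the end
  }
}
```

Under this model, the negative, last, and greater-than-size cases all collapse to the same append branch, which is why one test can cover all three at once.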
Github user wmellouli commented on a diff in the pull request:
https://github.com/apache/spark/pull/22332#discussion_r215179856

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2226,16 +2226,18 @@ class Dataset[T] private[sql](
    * `column`'s expression must only refer to attributes supplied by this Dataset. It is an
    * error to add a column that refers to some other Dataset.
    *
-   * You can choose to add new columns either at the end (default behavior) or at the beginning.
+   * The position of the new column start from 0, and a negative position means at the end (default behavior).
--- End diff --

I modified it as you suggested
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22332#discussion_r215144932

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2226,16 +2226,18 @@ class Dataset[T] private[sql](
    * `column`'s expression must only refer to attributes supplied by this Dataset. It is an
    * error to add a column that refers to some other Dataset.
    *
-   * You can choose to add new columns either at the end (default behavior) or at the beginning.
+   * The position of the new column start from 0, and a negative position means at the end (default behavior).
--- End diff --

"starts at `0`. Any negative position means to add the column at the end"?
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22332#discussion_r215145065

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2226,16 +2226,18 @@ class Dataset[T] private[sql](
    * `column`'s expression must only refer to attributes supplied by this Dataset. It is an
    * error to add a column that refers to some other Dataset.
    *
-   * You can choose to add new columns either at the end (default behavior) or at the beginning.
+   * The position of the new column start from 0, and a negative position means at the end (default behavior).
    */
-  def withColumn(colName: String, col: Column, atTheEnd: Boolean): DataFrame =
-    withColumns(Seq(colName), Seq(col), atTheEnd)
+  def withColumn(colName: String, col: Column, atPosition: Int): DataFrame =
+    withColumns(Seq(colName), Seq(col), atPosition)

  /**
   * Returns a new Dataset by adding columns or replacing the existing columns that has
--- End diff --

s/has/have
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22332#discussion_r215144732

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2226,16 +2226,18 @@ class Dataset[T] private[sql](
    * `column`'s expression must only refer to attributes supplied by this Dataset. It is an
    * error to add a column that refers to some other Dataset.
    *
-   * You can choose to add new columns either at the end (default behavior) or at the beginning.
+   * The position of the new column start from 0, and a negative position means at the end (default behavior).
    */
-  def withColumn(colName: String, col: Column, atTheEnd: Boolean): DataFrame =
-    withColumns(Seq(colName), Seq(col), atTheEnd)
+  def withColumn(colName: String, col: Column, atPosition: Int): DataFrame =
--- End diff --

`@since 2.4.0`?
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22332#discussion_r215145351

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -831,13 +831,21 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
    }.toSeq)
    assert(df.schema.map(_.name) === Seq("key", "value", "newCol"))
-    val df2 = testData.toDF().withColumn("newCol", col("key") + 1, false)
+    val df2 = testData.toDF().withColumn("newCol", col("key") + 1, 0)
--- End diff --

What about tests with negative positions?
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22332#discussion_r215144982

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2226,16 +2226,18 @@ class Dataset[T] private[sql](
    * `column`'s expression must only refer to attributes supplied by this Dataset. It is an
    * error to add a column that refers to some other Dataset.
    *
-   * You can choose to add new columns either at the end (default behavior) or at the beginning.
+   * The position of the new column start from 0, and a negative position means at the end (default behavior).
    */
-  def withColumn(colName: String, col: Column, atTheEnd: Boolean): DataFrame =
-    withColumns(Seq(colName), Seq(col), atTheEnd)
+  def withColumn(colName: String, col: Column, atPosition: Int): DataFrame =
+    withColumns(Seq(colName), Seq(col), atPosition)

  /**
   * Returns a new Dataset by adding columns or replacing the existing columns that has
   * the same names.
+  *
+  * The position of new columns start from 0, and a negative position means at the end (default behavior).
--- End diff --

Same as above