[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21240 **[Test build #90266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90266/testReport)** for PR 21240 at commit [`1761068`](https://github.com/apache/spark/commit/17610689595f02a30730c0fc1a070c3652eabf7e). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21240 This generator function implementation itself LGTM. I have other thoughts regarding the rewrite rule but it's better to discuss on JIRA. cc @cloud-fan @kiszk
[GitHub] spark pull request #21240: [SPARK-21274][SQL] Add a new generator function r...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21240#discussion_r186279795 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -222,6 +222,51 @@ case class Stack(children: Seq[Expression]) extends Generator { } } +/** + * Replicate the row N times. N is specified as the first argument to the function. + * {{{ + * SELECT replicate_rows(2, "val1", "val2") -> + * 2 val1 val2 + * 2 val1 val2 + * }}} + */ +@ExpressionDescription( +usage = "_FUNC_(n, expr1, ..., exprk) - Replicates `n`, `expr1`, ..., `exprk` into `n` rows.", +examples = """ +Examples: + > SELECT _FUNC_(2, "val1", "val2"); + 2 val1 val2 + 2 val1 val2 + """) +case class ReplicateRows(children: Seq[Expression]) extends Generator with CodegenFallback { --- End diff -- @viirya If you don't mind, I would like to do it in a follow-up.
[GitHub] spark pull request #21240: [SPARK-21274][SQL] Add a new generator function r...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21240#discussion_r186279765 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -702,6 +703,20 @@ object TypeCoercion { } } + /** + * Coerces first argument in ReplicateRows expression and introduces a cast to Long + * if necessary. + */ + object ReplicateRowsCoercion extends TypeCoercionRule { +private val acceptedTypes = Seq(IntegerType, ShortType, ByteType) +override def coerceTypes(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { + case s @ ReplicateRows(children) if s.children.nonEmpty && s.childrenResolved && +s.children.head.dataType != LongType && acceptedTypes.contains(s.children.head.dataType) => --- End diff -- @viirya Thanks. I will fix.
[GitHub] spark pull request #21240: [SPARK-21274][SQL] Add a new generator function r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21240#discussion_r186279481 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -222,6 +222,51 @@ case class Stack(children: Seq[Expression]) extends Generator { } } +/** + * Replicate the row N times. N is specified as the first argument to the function. + * {{{ + * SELECT replicate_rows(2, "val1", "val2") -> + * 2 val1 val2 + * 2 val1 val2 + * }}} + */ +@ExpressionDescription( +usage = "_FUNC_(n, expr1, ..., exprk) - Replicates `n`, `expr1`, ..., `exprk` into `n` rows.", +examples = """ +Examples: + > SELECT _FUNC_(2, "val1", "val2"); + 2 val1 val2 + 2 val1 val2 + """) +case class ReplicateRows(children: Seq[Expression]) extends Generator with CodegenFallback { --- End diff -- This can be easily implemented in codegen so we don't need `CodegenFallback`. We can deal with it in follow-up if you want.
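The eval-side semantics under review are simple enough to sketch outside Spark. The following is a minimal standalone model (`replicateRows` here is a hypothetical helper, not Spark's `Generator` API): the first argument `n` drives how many copies of the row are emitted, and `n` itself is kept in each output row, matching the documented example.

```scala
// Minimal sketch of the replicate_rows semantics (assumption: standalone
// model, not Spark's actual Generator implementation).
def replicateRows(n: Long, values: Seq[Any]): Seq[Seq[Any]] =
  if (n <= 0) Seq.empty                // non-positive counts produce no rows
  else Seq.fill(n.toInt)(n +: values)  // each output row starts with n

val rows = replicateRows(2L, Seq("val1", "val2"))
// rows == Seq(Seq(2L, "val1", "val2"), Seq(2L, "val1", "val2"))
```

A codegen version would emit the same loop inline instead of falling back to this interpreted path, which is why `CodegenFallback` is avoidable here.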
[GitHub] spark pull request #21240: [SPARK-21274][SQL] Add a new generator function r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21240#discussion_r186279427 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -702,6 +703,20 @@ object TypeCoercion { } } + /** + * Coerces first argument in ReplicateRows expression and introduces a cast to Long + * if necessary. + */ + object ReplicateRowsCoercion extends TypeCoercionRule { +private val acceptedTypes = Seq(IntegerType, ShortType, ByteType) +override def coerceTypes(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { + case s @ ReplicateRows(children) if s.children.nonEmpty && s.childrenResolved && +s.children.head.dataType != LongType && acceptedTypes.contains(s.children.head.dataType) => --- End diff -- nit: `s.children.head.dataType != LongType` is redundant because we have `acceptedTypes.contains(...)`.
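The nit can be checked mechanically. A toy model (plain strings standing in for Spark's `DataType` objects; not Spark code) shows the two guard predicates agree on every input, so the `!= LongType` conjunct can be dropped:

```scala
// Toy model of the guard in ReplicateRowsCoercion. Because "LongType" is
// not in acceptedTypes, `dt != "LongType"` is implied by
// `acceptedTypes.contains(dt)` and never changes the result.
val acceptedTypes = Seq("IntegerType", "ShortType", "ByteType")

def guardWithRedundantCheck(dt: String): Boolean =
  dt != "LongType" && acceptedTypes.contains(dt)

def guardSimplified(dt: String): Boolean =
  acceptedTypes.contains(dt)

// The two predicates agree everywhere, including on LongType itself.
val allTypes = acceptedTypes :+ "LongType" :+ "StringType"
val predicatesAgree = allTypes.forall(dt => guardWithRedundantCheck(dt) == guardSimplified(dt))
```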
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21010 Merged build finished. Test PASSed.
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21010 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90264/ Test PASSed.
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21010 **[Test build #90264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90264/testReport)** for PR 21010 at commit [`0bc77e8`](https://github.com/apache/spark/commit/0bc77e87689edbe252e23d322b1a64d317b19c67). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90262/ Test PASSed.
[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21240 Merged build finished. Test PASSed.
[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21240 **[Test build #90262 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90262/testReport)** for PR 21240 at commit [`748003a`](https://github.com/apache/spark/commit/748003ab5b8d9741a6dec79860cc6bef1083c14b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21244: [SPARK-24185][SparkR][SQL]add flatten function to...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21244#discussion_r186278752 --- Diff: R/pkg/R/generics.R --- @@ -918,6 +918,10 @@ setGeneric("explode_outer", function(x) { standardGeneric("explode_outer") }) #' @name NULL setGeneric("expr", function(x) { standardGeneric("expr") }) +#' @rdname column_collection_functions +#' @name NULL +setGeneric("flatten", function(x, value) { standardGeneric("flatten") }) --- End diff -- great catch!
[GitHub] spark pull request #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18447#discussion_r186278707 --- Diff: R/pkg/R/functions.R --- @@ -679,6 +679,19 @@ setMethod("hash", column(jc) }) +#' @details +#' \code{data_type}: Returns the data type of a given column. +#' +#' @rdname column_misc_functions +#' @aliases data_type data_type,Column-method +#' @note data_type since 2.3.0 --- End diff -- 2.4.0
[GitHub] spark pull request #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18447#discussion_r186278702 --- Diff: R/pkg/R/functions.R --- @@ -679,6 +679,19 @@ setMethod("hash", column(jc) }) +#' @details
#' \code{data_type}: Returns the data type of a given column. +#' +#' @rdname column_misc_functions +#' @aliases data_type data_type,Column-method +#' @examples \dontrun{data_type(df$c)} --- End diff -- see this line of the code example of hash https://github.com/mmolimar/spark/blob/ed52e2f856f78fb2dca23b6be2f682caa0a88c81/R/pkg/R/functions.R#L176
[GitHub] spark pull request #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18447#discussion_r186278685 --- Diff: R/pkg/R/functions.R --- @@ -679,6 +679,19 @@ setMethod("hash", column(jc) }) +#' @details +#' \code{data_type}: Returns the data type of a given column. +#' +#' @rdname column_misc_functions +#' @aliases data_type data_type,Column-method +#' @examples \dontrun{data_type(df$c)} +setMethod("data_type", --- End diff -- 2.4.0
[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21186 Merged build finished. Test PASSed.
[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90263/ Test PASSed.
[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21186 **[Test build #90263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90263/testReport)** for PR 21186 at commit [`b746702`](https://github.com/apache/spark/commit/b746702a95cf5ab6dbbc4f5a60f8e7226805f82a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21070 Yup, https://github.com/apache/spark/pull/21070#issuecomment-386793202, just for clarification: the given attribute and literal are castable, but they are not being cast, right? I believe this is a known issue and there were several tries. One approach was always casting directly, and it was reverted (roughly 2 years ago?). Another approach was constant folding at the optimizer level, but it was rejected as too messy. Another approach was directly casting and comparing both values, but it was also rejected since it sounded unsafe. It's a long old story, so it's probably worth double-checking the history, but I'm fairly sure I remember it correctly. The key point IIUC is that `translateFilter` should be super conservative, and it sounds like we need to check every possibility.
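The "directly casting" approaches were rejected for a concrete reason, which a toy example (not Spark code) illustrates: casting a pushed-down literal to the column's narrower type can change the predicate's result.

```scala
// Hypothetical illustration of why blindly casting in filter pushdown is
// unsafe. For an integer column compared to the double literal 2.5, the
// original predicate matches nothing, but casting the literal to Int
// truncates it to 2 and the pushed filter wrongly matches the value 2.
val intColumnValues = Seq(1, 2, 3)

val original    = intColumnValues.filter(_.toDouble == 2.5) // correct: no matches
val naivelyCast = intColumnValues.filter(_ == 2.5.toInt)    // wrong: matches 2
```

This is the kind of semantic drift a conservative `translateFilter` has to rule out before pushing a comparison down to the data source.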
[GitHub] spark issue #21213: [SPARK-24120] Show `Jobs` page when `jobId` is missing
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21213 cc @ajbozarth and @guoxiaolongzte who I believe are interested in this change.
[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21240 Merged build finished. Test PASSed.
[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2958/ Test PASSed.
[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21240 **[Test build #90265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90265/testReport)** for PR 21240 at commit [`02ed058`](https://github.com/apache/spark/commit/02ed0582348f12473fdde8779c1d9e59ecfd84b1).
[GitHub] spark pull request #21247: [SPARK-24190] Separating JSONOptions for read
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21247#discussion_r186277493 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -137,3 +121,40 @@ private[sql] class JSONOptions( factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, allowUnquotedControlChars) } } + +private[sql] class JSONOptionsInRead( +@transient private val parameters: CaseInsensitiveMap[String], +defaultTimeZoneId: String, +defaultColumnNameOfCorruptRecord: String) + extends JSONOptions(parameters, defaultTimeZoneId, defaultColumnNameOfCorruptRecord) { + + def this( +parameters: Map[String, String], +defaultTimeZoneId: String, +defaultColumnNameOfCorruptRecord: String = "") = { +this( + CaseInsensitiveMap(parameters), + defaultTimeZoneId, + defaultColumnNameOfCorruptRecord) + } + + protected override def checkedEncoding(enc: String): String = { +// The following encodings are not supported in per-line mode (multiline is false) +// because they cause some problems in reading files with BOM which is supposed to +// present in the files with such encodings. After splitting input files by lines, +// only the first lines will have the BOM which leads to impossibility for reading +// the rest lines. Besides of that, the lineSep option must have the BOM in such +// encodings which can never present between lines. +val blacklist = Seq(Charset.forName("UTF-16"), Charset.forName("UTF-32")) +val isBlacklisted = blacklist.contains(Charset.forName(enc)) +require(multiLine || !isBlacklisted, --- End diff -- BTW, don't we still need the blacklisting on the write side too?
[GitHub] spark issue #21244: [SPARK-24185][SparkR][SQL]add flatten function to SparkR
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21244 @viirya @mn-mikke @felixcheung @HyukjinKwon Thanks all for your help! @HyukjinKwon I will fix the two small things in my next PR.
[GitHub] spark pull request #20787: [MINOR][DOCS] Documenting months_between directio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20787#discussion_r186277140 --- Diff: R/pkg/R/functions.R --- @@ -1906,6 +1906,7 @@ setMethod("atan2", signature(y = "Column"), #' @details #' \code{datediff}: Returns the number of days from \code{y} to \code{x}. +#' If \code{y} is later than \code{x} then the result is positive. --- End diff -- I'd add the same sentence to the Python and Scala sides too.
[GitHub] spark pull request #20787: [MINOR][DOCS] Documenting months_between directio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20787#discussion_r186277122 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1194,13 +1194,21 @@ case class AddMonths(startDate: Expression, numMonths: Expression) } /** - * Returns number of months between dates date1 and date2. + * Returns number of months between dates `date1` and `date2`. + * If `date1` is later than `date2`, then the result is positive. + * If `date1` and `date2` are on the same day of month, or both + * are the last day of month, time of day will be ignored. Otherwise, the + * difference is calculated based on 31 days per month, and rounded to + * 8 digits unless roundOff=false. */ // scalastyle:off line.size.limit @ExpressionDescription( usage = """ -_FUNC_(timestamp1, timestamp2[, roundOff]) - Returns number of months between `timestamp1` and `timestamp2`. - The result is rounded to 8 decimal places by default. Set roundOff=false otherwise., +_FUNC_(date1, date2) - If `date1` is later than `date2`, then the result --- End diff -- Wait, why did you change `_FUNC_(timestamp1, timestamp2[, roundOff])`?
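For reference, the rounding behavior described in the Scaladoc above can be sketched as a simplified standalone function. This is a sketch only (`monthsBetween` is a hypothetical helper working on dates; it ignores the time-of-day and last-day-of-month handling that the real `months_between` performs):

```scala
import java.time.LocalDate

// Simplified model of the documented semantics: whole months when the days
// of month match; otherwise a fraction based on 31 days per month, rounded
// to 8 digits unless roundOff = false.
def monthsBetween(d1: LocalDate, d2: LocalDate, roundOff: Boolean = true): Double = {
  val monthDiff = (d1.getYear - d2.getYear) * 12 + (d1.getMonthValue - d2.getMonthValue)
  if (d1.getDayOfMonth == d2.getDayOfMonth) monthDiff.toDouble
  else {
    val raw = monthDiff + (d1.getDayOfMonth - d2.getDayOfMonth) / 31.0
    if (roundOff) BigDecimal(raw).setScale(8, BigDecimal.RoundingMode.HALF_UP).toDouble
    else raw
  }
}
```

The optional third argument is exactly why the usage string keeps the `[, roundOff]` form.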
[GitHub] spark issue #21211: [SPARK-24131][PYSPARK][Followup] Add majorMinorVersion A...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21211 @jkbradley, have you had a chance to take a look? If there are no more comments, I will just merge it in a few days.
[GitHub] spark pull request #21240: [SPARK-21274][SQL] Add a new generator function r...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21240#discussion_r186277072 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -222,6 +222,51 @@ case class Stack(children: Seq[Expression]) extends Generator { } } +/** + * Replicate the row N times. N is specified as the first argument to the function. + * {{{ + * SELECT replicate_rows(2, "val1", "val2") -> + * 2 val1 val2 + * 2 val1 val2 + * }}} + */ +@ExpressionDescription( +usage = "_FUNC_(n, expr1, ..., exprk) - Replicates `n`, `expr1`, ..., `exprk` into `n` rows.", --- End diff -- @viirya I did think about it, Simon. But then, I decided to match the output with Hive.
[GitHub] spark pull request #21240: [SPARK-21274][SQL] Add a new generator function r...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21240#discussion_r186277065 --- Diff: sql/core/src/test/resources/sql-tests/inputs/udtf_replicate_rows.sql --- @@ -0,0 +1,38 @@ +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 'row1', 1.1), +(2, 'row2', 2.2), +(0, 'row3', 3.3), +(-1,'row4', 4.4), +(null,'row5', 5.5), +(3, 'row6', null) +AS tab1(c1, c2, c3); + +-- Requires 2 arguments at minimum. +SELECT replicate_rows(c1) FROM tab1; --- End diff -- Sure.
[GitHub] spark pull request #21240: [SPARK-21274][SQL] Add a new generator function r...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21240#discussion_r186277062 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -702,6 +703,20 @@ object TypeCoercion { } } + /** + * Coerces first argument in ReplicateRows expression and introduces a cast to Long + * if necessary. + */ + object ReplicateRowsCoercion extends TypeCoercionRule { +private val acceptedTypes = Seq(IntegerType, ShortType, ByteType) +override def coerceTypes(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { + case s @ ReplicateRows(children) if s.childrenResolved && +s.children.head.dataType != LongType && acceptedTypes.contains(s.children.head.dataType) => --- End diff -- ok
[GitHub] spark pull request #21247: [SPARK-24190] Separating JSONOptions for read
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21247#discussion_r186277050 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -137,3 +121,40 @@ private[sql] class JSONOptions( factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, allowUnquotedControlChars) } } + +private[sql] class JSONOptionsInRead( +@transient private val parameters: CaseInsensitiveMap[String], +defaultTimeZoneId: String, +defaultColumnNameOfCorruptRecord: String) + extends JSONOptions(parameters, defaultTimeZoneId, defaultColumnNameOfCorruptRecord) { + + def this( +parameters: Map[String, String], +defaultTimeZoneId: String, +defaultColumnNameOfCorruptRecord: String = "") = { +this( + CaseInsensitiveMap(parameters), + defaultTimeZoneId, + defaultColumnNameOfCorruptRecord) + } + + protected override def checkedEncoding(enc: String): String = { +// The following encodings are not supported in per-line mode (multiline is false) +// because they cause some problems in reading files with BOM which is supposed to +// present in the files with such encodings. After splitting input files by lines, +// only the first lines will have the BOM which leads to impossibility for reading +// the rest lines. Besides of that, the lineSep option must have the BOM in such +// encodings which can never present between lines. +val blacklist = Seq(Charset.forName("UTF-16"), Charset.forName("UTF-32")) +val isBlacklisted = blacklist.contains(Charset.forName(enc)) +require(multiLine || !isBlacklisted, + s"""The ${enc} encoding must not be included in the blacklist when multiLine is disabled: + | ${blacklist.mkString(", ")}""".stripMargin) + +val isLineSepRequired = !(multiLine == false && + Charset.forName(enc) != StandardCharsets.UTF_8 && lineSeparator.isEmpty) +require(isLineSepRequired, s"The lineSep option must be specified for the $enc encoding") --- End diff -- @MaxGekk, how about we just try to remove this restriction? I thought that's your final goal in 2.4.0.
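The BOM problem the quoted comment describes can be shown standalone (a simplified model of byte-level line splitting; not Spark code): encoding a multi-line string as UTF-16 writes one byte-order mark at the start of the whole stream, so after splitting the bytes by line only the first line carries the BOM.

```scala
// Java's UTF-16 encoder emits the big-endian BOM 0xFE 0xFF once, up front.
val bytes = "a\nb".getBytes("UTF-16")

val startsWithBom = (bytes(0) & 0xFF) == 0xFE && (bytes(1) & 0xFF) == 0xFF

// Layout: BOM (2 bytes) + 'a' + '\n' + 'b', 2 bytes each. The bytes for the
// second "line" (0x00 0x62) contain no BOM of their own, which is why
// per-line reading of BOM-dependent encodings breaks past the first line.
val secondLineBytes = bytes.slice(6, 8)
```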
[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20894 I understood the rationale, problem, and approach, and I don't feel strongly. I am less sure since: 1. this option is specific to CSV columns, which adds complexity; 2. I doubt the severity of the issue is worth adding an option and that amount of change. I believe I have usually been against such cases, particularly at the JIRA level (it's probably worth checking the Won't Fix JIRAs). For clarification, please proceed if you guys feel it's needed and don't block on me.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90260/ Test PASSed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Merged build finished. Test PASSed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21193 **[Test build #90260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90260/testReport)** for PR 21193 at commit [`2b30654`](https://github.com/apache/spark/commit/2b30654bc50a39a8af597df68ba288280299defe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21228#discussion_r186276769 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala --- @@ -96,7 +96,8 @@ object Rand { /** Generate a random column with i.i.d. values drawn from the standard normal distribution. */ // scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.", + usage = """_FUNC_([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution. +Note that the function is non-deterministic in general case.""", --- End diff -- I mean to use a note like: https://github.com/apache/spark/blob/2ce37b50fc01558f49ad22f89c8659f50544ffec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L101-L103
[GitHub] spark pull request #21240: [SPARK-21274][SQL] Add a new generator function r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21240#discussion_r186276709 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -222,6 +222,51 @@ case class Stack(children: Seq[Expression]) extends Generator { } } +/** + * Replicate the row N times. N is specified as the first argument to the function. + * {{{ + * SELECT replicate_rows(2, "val1", "val2") -> + * 2 val1 val2 + * 2 val1 val2 + * }}} + */ +@ExpressionDescription( +usage = "_FUNC_(n, expr1, ..., exprk) - Replicates `n`, `expr1`, ..., `exprk` into `n` rows.", --- End diff -- I checked the design doc for INTERSECT ALL and EXCEPT ALL. Looks like the `n` is always stripped after the Generate operation. So why do we need to keep `n` in the `ReplicateRows` outputs? Can we do it like: ``` > SELECT _FUNC_(2, "val1", "val2"); val1 val2 val1 val2 ```
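For context, a toy model (an assumption about the rewrite's shape, not Spark's actual rule) shows why `n` is only an intermediate multiplier in an INTERSECT ALL rewrite: each distinct row is replicated min(left count, right count) times, and the count never needs to survive into the final output.

```scala
// Toy INTERSECT ALL via per-row counts and replication (not Spark code).
def intersectAll[T](left: Seq[T], right: Seq[T]): Seq[T] = {
  val leftCounts  = left.groupBy(identity).map { case (k, v) => (k, v.size) }
  val rightCounts = right.groupBy(identity).map { case (k, v) => (k, v.size) }
  leftCounts.toSeq.flatMap { case (row, lc) =>
    val n = math.min(lc, rightCounts.getOrElse(row, 0))
    Seq.fill(n)(row) // plays the role of replicate_rows(n, row), with n projected away
  }
}
```

Whether `n` appears in the generator's own output rows (as in the Hive-compatible form) or is stripped immediately is therefore a presentation choice, not a correctness one.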
[GitHub] spark pull request #21244: [SPARK-24185][SparkR][SQL]add flatten function to...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21244
[GitHub] spark pull request #21244: [SPARK-24185][SparkR][SQL]add flatten function to...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21244#discussion_r186276706 --- Diff: R/pkg/R/functions.R --- @@ -3035,6 +3036,19 @@ setMethod("array_position", column(jc) }) +#' @details +#' \code{flatten}: Transforms an array of arrays into a single array. --- End diff -- not a big deal but let's match the doc with Python and Scala (not the SQL one) since that's been usual so far. Please fix it in the PRs for other functions later.
[GitHub] spark issue #21244: [SPARK-24185][SparkR][SQL]add flatten function to SparkR
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21244 Merged to master
[GitHub] spark pull request #21244: [SPARK-24185][SparkR][SQL]add flatten function to...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21244#discussion_r186276677 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1502,6 +1502,12 @@ test_that("column functions", { result <- collect(select(df, sort_array(df[[1]])))[[1]] expect_equal(result, list(list(1L, 2L, 3L), list(4L, 5L, 6L))) + # Test flattern --- End diff -- not a big deal at all but I'd say `flattern()` for consistency. I think detail and consistency are the key.
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21010 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2957/ Test PASSed.
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21010 Merged build finished. Test PASSed.
[GitHub] spark issue #21010: [SPARK-23900][SQL] format_number support user specifed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21010 **[Test build #90264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90264/testReport)** for PR 21010 at commit [`0bc77e8`](https://github.com/apache/spark/commit/0bc77e87689edbe252e23d322b1a64d317b19c67).
[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21186 Merged build finished. Test PASSed.
[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2956/ Test PASSed.
[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21186 **[Test build #90263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90263/testReport)** for PR 21186 at commit [`b746702`](https://github.com/apache/spark/commit/b746702a95cf5ab6dbbc4f5a60f8e7226805f82a).
[GitHub] spark issue #21186: [SPARK-22279][SPARK-24112] Enable `convertMetastoreOrc` ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21186 Retest this please.
[GitHub] spark pull request #21240: [SPARK-21274][SQL] Add a new generator function r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21240#discussion_r186276453 --- Diff: sql/core/src/test/resources/sql-tests/inputs/udtf_replicate_rows.sql --- @@ -0,0 +1,38 @@ +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 'row1', 1.1), +(2, 'row2', 2.2), +(0, 'row3', 3.3), +(-1,'row4', 4.4), +(null,'row5', 5.5), +(3, 'row6', null) +AS tab1(c1, c2, c3); + +-- Requires 2 arguments at minimum. +SELECT replicate_rows(c1) FROM tab1; --- End diff -- Add one case `SELECT replicate_rows()`?
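The semantics being tested above can be modeled outside Spark. The sketch below is a hypothetical plain-Python model of what the `replicate_rows` generator is documented to do in this PR (it is not the Catalyst implementation); the handling of null and non-positive counts is an assumption consistent with the test fixture's rows.

```python
def replicate_rows(n, *cols):
    # Hypothetical model of the replicate_rows generator described in the
    # PR docs: the whole input row, count included, is emitted n times.
    # Assumption: a null (None) or non-positive count yields no rows.
    if n is None or n <= 0:
        return []
    return [(n, *cols) for _ in range(n)]
```

For instance, `replicate_rows(2, "val1", "val2")` yields the two identical rows shown in the expression's docstring, while the `0`, `-1`, and `null` rows in `tab1` would yield nothing.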
[GitHub] spark pull request #21217: [SPARK-24151][SQL] Fix CURRENT_DATE, CURRENT_TIME...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21217#discussion_r186276456 --- Diff: docs/sql-programming-guide.md --- @@ -1812,6 +1812,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see - Since Spark 2.4, creating a managed table with a nonempty location is not allowed. An exception is thrown when attempting to create a managed table with a nonempty location. Setting `spark.sql.allowCreatingManagedTableUsingNonemptyLocation` to `true` restores the previous behavior. This option will be removed in Spark 3.0. - Since Spark 2.4, the type coercion rules can automatically promote the argument types of variadic SQL functions (e.g., IN/COALESCE) to the widest common type, regardless of the order of the input arguments. In prior Spark versions, the promotion could fail for some specific orders (e.g., TimestampType, IntegerType and StringType) and throw an exception. - In version 2.3 and earlier, `to_utc_timestamp` and `from_utc_timestamp` respect the timezone in the input timestamp string, which breaks the assumption that the input timestamp is in a specific timezone. Therefore, these two functions can return unexpected results. In version 2.4 and later, this problem has been fixed: `to_utc_timestamp` and `from_utc_timestamp` return null if the input timestamp string contains a timezone. As an example, `from_utc_timestamp('2000-10-10 00:00:00', 'GMT+1')` returns `2000-10-10 01:00:00` in both Spark 2.3 and 2.4. However, `from_utc_timestamp('2000-10-10 00:00:00+00:00', 'GMT+1')`, assuming a local timezone of GMT+8, returns `2000-10-10 09:00:00` in Spark 2.3 but `null` in 2.4. Users who don't care about this problem and want to retain the previous behavior can set `spark.sql.function.rejectTimezoneInString` to false. This option will be removed in Spark 3.0 and should only be used as a temporary workaround.
+ - In version 2.3, if `spark.sql.caseSensitive` is set to true, then the `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions incorrectly became case-sensitive and would resolve to columns (unless typed in lower case). In Spark 2.4 this has been fixed and the functions are no longer case-sensitive. --- End diff -- +1 for @felixcheung's suggestion.
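The `from_utc_timestamp` behavior change quoted in the migration-guide diff can be sketched with a toy model. This is not Spark's API: the function name is invented, and an integer hour offset stands in for zone names like `'GMT+1'`; it only mirrors the two documented cases (a bare timestamp string is read as UTC and shifted, a string carrying its own timezone suffix yields null in 2.4+).

```python
from datetime import datetime, timedelta

def from_utc_timestamp_sketch(ts, tz_offset_hours):
    # Toy model of the Spark 2.4 behavior described above (hypothetical
    # helper, not Spark code). tz_offset_hours stands in for 'GMT+1' etc.
    # A timestamp string that carries its own timezone suffix -> null.
    if "+" in ts[10:] or ts.endswith("Z"):
        return None
    # Otherwise read the bare string as UTC and shift into the target zone.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") + timedelta(hours=tz_offset_hours)
```

Under these assumptions, `'2000-10-10 00:00:00'` with offset 1 shifts to `2000-10-10 01:00:00`, matching the guide's first example, while `'2000-10-10 00:00:00+00:00'` returns `None`, matching the 2.4 behavior in the second example.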
[GitHub] spark pull request #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10....
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/21070#discussion_r186276440 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -345,7 +345,7 @@ object SQLConf { "snappy, gzip, lzo.") .stringConf .transform(_.toLowerCase(Locale.ROOT)) -.checkValues(Set("none", "uncompressed", "snappy", "gzip", "lzo")) +.checkValues(Set("none", "uncompressed", "snappy", "gzip", "lzo", "lz4", "brotli", "zstd")) --- End diff -- Could you update [sql-programming-guide.md](https://github.com/apache/spark/blame/master/docs/sql-programming-guide.md#L967) together?
[GitHub] spark pull request #21240: [SPARK-21274][SQL] Add a new generator function r...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21240#discussion_r186276431 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -702,6 +703,20 @@ object TypeCoercion { } } + /** + * Coerces first argument in ReplicateRows expression and introduces a cast to Long + * if necessary. + */ + object ReplicateRowsCoercion extends TypeCoercionRule {+private val acceptedTypes = Seq(IntegerType, ShortType, ByteType) +override def coerceTypes(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { + case s @ ReplicateRows(children) if s.childrenResolved && +s.children.head.dataType != LongType && acceptedTypes.contains(s.children.head.dataType) => --- End diff -- We should check if `s.children` isn't empty.
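The coercion rule and the empty-children hazard flagged in this review can be illustrated with a minimal sketch. Everything here is hypothetical (type names as plain strings, an invented helper; this is not Catalyst code): it only models casting a narrower integral first argument to long and guarding the empty argument list that `s.children.head` would otherwise crash on.

```python
# Integral types narrower than long, mirroring Seq(IntegerType, ShortType,
# ByteType) in the rule under review.
ACCEPTED = {"byte", "short", "int"}

def coerce_replicate_rows(arg_types):
    # Hypothetical mini-model of ReplicateRowsCoercion: promote the first
    # argument's type to "long" when it is a narrower integral type.
    if not arg_types:
        # Guard for replicate_rows() with no arguments -- without this,
        # taking the head of the list fails, which is the review's point.
        return arg_types
    head, *tail = arg_types
    if head in ACCEPTED:
        head = "long"  # models wrapping the child in Cast(child, LongType)
    return [head] + tail
```

A first argument already typed `"long"` passes through untouched, so the rule stays idempotent across analyzer iterations.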
[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21240 Merged build finished. Test PASSed.
[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2955/ Test PASSed.
[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14083 @hvanhovell, could you update the benchmark result in the PR description, too?
[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21240 **[Test build #90262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90262/testReport)** for PR 21240 at commit [`748003a`](https://github.com/apache/spark/commit/748003ab5b8d9741a6dec79860cc6bef1083c14b).
[GitHub] spark issue #21240: [SPARK-21274][SQL] Add a new generator function replicat...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21240 @maropu @viirya Thanks for the comments. I have made the changes.
[GitHub] spark issue #21244: [SPARK-24185][SparkR][SQL]add flatten function to SparkR
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21244 Merged build finished. Test PASSed.
[GitHub] spark issue #21244: [SPARK-24185][SparkR][SQL]add flatten function to SparkR
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21244 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90261/ Test PASSed.
[GitHub] spark issue #21244: [SPARK-24185][SparkR][SQL]add flatten function to SparkR
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21244 **[Test build #90261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90261/testReport)** for PR 21244 at commit [`fe769df`](https://github.com/apache/spark/commit/fe769df4317ef7c0e1b060a7064cc9d1ad9ed806). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21244: [SPARK-24185][SparkR][SQL]add flatten function to SparkR
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21244 cc @HyukjinKwon
[GitHub] spark issue #21244: [SPARK-24185][SparkR][SQL]add flatten function to SparkR
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21244 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2954/ Test PASSed.
[GitHub] spark issue #21244: [SPARK-24185][SparkR][SQL]add flatten function to SparkR
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21244 Merged build finished. Test PASSed.
[GitHub] spark issue #21244: [SPARK-24185][SparkR][SQL]add flatten function to SparkR
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21244 **[Test build #90261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90261/testReport)** for PR 21244 at commit [`fe769df`](https://github.com/apache/spark/commit/fe769df4317ef7c0e1b060a7064cc9d1ad9ed806).
[GitHub] spark pull request #21244: [SPARK-24185][SparkR][SQL]add flatten function to...
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/21244#discussion_r186275756 --- Diff: R/pkg/R/generics.R --- @@ -918,6 +918,10 @@ setGeneric("explode_outer", function(x) { standardGeneric("explode_outer") }) #' @name NULL setGeneric("expr", function(x) { standardGeneric("expr") }) +#' @rdname column_collection_functions +#' @name NULL +setGeneric("flatten", function(x, value) { standardGeneric("flatten") }) --- End diff -- Thanks for catching the problem. Will correct.
[GitHub] spark pull request #21182: [SPARK-24068] Propagating DataFrameReader's optio...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21182#discussion_r186274691 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -189,4 +191,6 @@ class CSVOptions( settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER) settings } + + val textOptions = ListMap(parameters.toList: _*) --- End diff -- Follow `JSONOptions` to make it `@transient` and use `parameters` instead of creating a new map?
[GitHub] spark pull request #21182: [SPARK-24068] Propagating DataFrameReader's optio...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21182#discussion_r186275033 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala --- @@ -136,4 +136,6 @@ private[sql] class JSONOptions( allowBackslashEscapingAnyCharacter) factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, allowUnquotedControlChars) } + + @transient val textOptions = parameters --- End diff -- It seems unlikely to me that we'll have new options that are not passed via `parameters`. If you think there can be, I'm fine with this.
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21228#discussion_r186274561 --- Diff: python/pyspark/sql/functions.py --- @@ -152,13 +152,19 @@ def _(): _collect_list_doc = """ Aggregate function: returns a list of objects with duplicates. +.. note:: The function is non-deterministic because the order of collected results depends +on order of rows which may be non-deterministic after a shuffle. --- End diff -- I feel that non-deterministic here is different from other non-deterministic functions like `monotonically_increasing_id` or `uuid`. Should we just say `The order of collected results is non-deterministic and depends on order of rows which may be non-deterministic after a shuffle`?
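The kind of non-determinism this doc note describes can be shown with a plain-Python illustration (not Spark code, and a deliberate simplification): the list `collect_list` builds follows the order in which rows arrive, so two runs whose shuffles deliver the same rows in different orders collect different lists, even though the multiset of values is identical.

```python
def collect_list(rows):
    # Simplified stand-in for the aggregate: append each value to its
    # key's list in arrival order, as (key, value) pairs stream in.
    out = {}
    for key, value in rows:
        out.setdefault(key, []).append(value)
    return out

# Same rows, two different arrival orders -- as after two shuffles.
run_a = [("k", 1), ("k", 2), ("k", 3)]
run_b = [("k", 3), ("k", 1), ("k", 2)]
```

Here the two runs collect differently ordered lists for `"k"` while agreeing on the set of values, which is exactly the distinction from functions like `monotonically_increasing_id` whose values themselves differ between runs.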
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Merged build finished. Test PASSed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2953/ Test PASSed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21193 **[Test build #90260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90260/testReport)** for PR 21193 at commit [`2b30654`](https://github.com/apache/spark/commit/2b30654bc50a39a8af597df68ba288280299defe).
[GitHub] spark issue #21244: [SPARK-24815][SparkR][SQL]add flatten function to SparkR
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/21244 @huaxingao Isn't the correct Jira number [SPARK-24185](https://issues.apache.org/jira/browse/SPARK-24185)?
[GitHub] spark issue #21248: [SPARK-24191]Example code for Power Iteration Clustering
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21248 Can one of the admins verify this patch?
[GitHub] spark issue #21248: [SPARK-24191]Example code for Power Iteration Clustering
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21248 Can one of the admins verify this patch?
[GitHub] spark pull request #21248: Example code for Power Iteration Clustering
GitHub user shahidki31 opened a pull request: https://github.com/apache/spark/pull/21248 Example code for Power Iteration Clustering ## What changes were proposed in this pull request? Added example code for Power Iteration Clustering in Spark ML examples ## How was this patch tested? Locally ran Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/shahidki31/spark sparkCommit Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21248.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21248 commit 16f0e22c844a281ab4b0ecc2e70f858745db398c Author: Shahid Date: 2018-05-05T20:58:33Z Example code for Power Iteration Clustering
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21228 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90258/ Test PASSed.
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21228 Merged build finished. Test PASSed.
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21228 **[Test build #90258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90258/testReport)** for PR 21228 at commit [`26e0a22`](https://github.com/apache/spark/commit/26e0a22c32d7d7f85d2e9ba6fc58c4d15f1babc0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21247: [SPARK-24190] Separating JSONOptions for read
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21247 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90257/ Test PASSed.
[GitHub] spark issue #21247: [SPARK-24190] Separating JSONOptions for read
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21247 Merged build finished. Test PASSed.
[GitHub] spark issue #21247: [SPARK-24190] Separating JSONOptions for read
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21247 **[Test build #90257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90257/testReport)** for PR 21247 at commit [`7ea7dec`](https://github.com/apache/spark/commit/7ea7dec0a3c5f6694a6a7ef51409a9277aeb733f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18447 Merged build finished. Test PASSed.
[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18447 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90256/ Test PASSed.
[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18447 **[Test build #90256 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90256/testReport)** for PR 18447 at commit [`ed52e2f`](https://github.com/apache/spark/commit/ed52e2f856f78fb2dca23b6be2f682caa0a88c81). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90259/ Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #90259 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90259/testReport)** for PR 17086 at commit [`9e59fd5`](https://github.com/apache/spark/commit/9e59fd592e9cbe43e9fc3d5c317cd3c4e2d6ac43). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21246 Merged build finished. Test PASSed.
[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21246 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90255/ Test PASSed.
[GitHub] spark issue #21246: [SPARK-23901][SQL] Add masking functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21246 **[Test build #90255 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90255/testReport)** for PR 21246 at commit [`b15bbf1`](https://github.com/apache/spark/commit/b15bbf1ef02c5c4c4cc4c739826da894898d5771). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14083 Merged build finished. Test PASSed.
[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90252/ Test PASSed.
[GitHub] spark issue #14083: [SPARK-16406][SQL] Improve performance of LogicalPlan.re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14083 **[Test build #90252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90252/testReport)** for PR 14083 at commit [`cbc164d`](https://github.com/apache/spark/commit/cbc164db9d05c8c4fc90573710776c592f95ed8a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90253/ Test PASSed.
[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21231 Merged build finished. Test PASSed.
[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21231 **[Test build #90253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90253/testReport)** for PR 21231 at commit [`1060a66`](https://github.com/apache/spark/commit/1060a666463426f2f70f2fe0a4fa7e2b66f22c67). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2952/ Test PASSed.