[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22274 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22274 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2725/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21987: [SPARK-25015][BUILD] Update Hadoop 2.7 to 2.7.7
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21987

It seems that this change caused a permission issue:

```
export HADOOP_PROXY_USER=user_a
spark-sql
```

It will create the directory `/tmp/hive-$%7Buser.name%7D/user_a/`. Then change to another user:

```
export HADOOP_PROXY_USER=user_b
spark-sql
```

Exception:

```scala
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=user_b, access=EXECUTE, inode="/tmp/hive-$%7Buser.name%7D/user_b/6b446017-a880-4f23-a8d0-b62f37d3c413":user_a:hadoop:drwx--
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
    at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
```

I'll do verification later.
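A note on the directory name in this report: `%7B` and `%7D` are the percent-encoded forms of `{` and `}`, so the path appears to contain a literal, unsubstituted `${user.name}` placeholder. The sketch below only demonstrates that decoding with Python's standard library. Whether the unexpanded placeholder is the root cause of the permission error is an assumption; the thread does not confirm it.

```python
from urllib.parse import unquote

# Directory name copied from the report.
reported_dir = "/tmp/hive-$%7Buser.name%7D/user_a/"

# Percent-decoding turns %7B into "{" and %7D into "}".
decoded_dir = unquote(reported_dir)
print(decoded_dir)
```

If the placeholder had been expanded per user, each proxy user would presumably get a separate parent directory instead of sharing one owned by the first user.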
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22274 **[Test build #95520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95520/testReport)** for PR 22274 at commit [`4b6cd9f`](https://github.com/apache/spark/commit/4b6cd9f532e07f08c86659dcd4a0f2d40995d8ef). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22270 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22270 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95516/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22270 **[Test build #95516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95516/testReport)** for PR 22270 at commit [`53f4984`](https://github.com/apache/spark/commit/53f4984bd35d07da7382866960279233aadebea5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21721 @arunmahadevan, feel free to pick up the commits in my PR in your followup if they have to be changed. I will close mine. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721

@rxin it's for streaming sources and sinks, as explained in the [doc](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/CustomMetrics.java#L23).

It had to be shared between classes in reader.streaming and writer.streaming, so it was added in the parent package (similar to other streaming-specific classes that exist here, like [StreamingWriteSupportProvider.java](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider.java) and [MicroBatchReadSupportProvider.java](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/MicroBatchReadSupportProvider.java)). We could move all of it to a streaming package.
[GitHub] spark pull request #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes fo...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22274#discussion_r214246976

--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3633,7 +3633,8 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", {
 expect_equal(currentDatabase(), "default")
 expect_error(setCurrentDatabase("default"), NA)
 expect_error(setCurrentDatabase("zxwtyswklpf"),
-"Error in setCurrentDatabase : analysis error - Database 'zxwtyswklpf' does not exist")
+ paste("Error in setCurrentDatabase : analysis error - Database",
--- End diff --

@felixcheung Sure.
[GitHub] spark issue #22183: [SPARK-25132][SQL][BACKPORT-2.3] Case-insensitive field ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22183 As discussed in the JIRA, this is a partial fix, and we need to backport another 2 PRs, which is risky. Can we close it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 I'm confused by this api. Is this for streaming only? If yes, why are they not in the stream package? If not, I only found streaming implementation. Maybe I missed it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21968#discussion_r214246268

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala ---
@@ -130,6 +134,12 @@ class RowBasedHashMapGenerator(
 }
 }.mkString(";\n")
+val nullByteWriter = if (groupingKeySchema.map(_.nullable).forall(_ == false)) {
--- End diff --

Maybe name it `resetNullBits`?
[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21968#discussion_r214246211

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala ---
@@ -48,6 +48,12 @@ class RowBasedHashMapGenerator(
 val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema)
 val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema)
+val numVarLenFields = groupingKeys.map(_.dataType).count {
--- End diff --

`groupingKeys.map(_.dataType).count(dt => !UnsafeRow.isFixedLength(dt))`
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214245829

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2546,15 +2546,37 @@ object functions {
 def soundex(e: Column): Column = withExpr { SoundEx(e.expr) }
 /**
- * Splits str around pattern (pattern is a regular expression).
+ * Splits str around matches of the given regex.
 *
- * @note Pattern is a string representation of the regular expression.
+ * @param str a string expression to split
+ * @param regex a string representing a regular expression. The regex string should be
+ * a Java regular expression.
 *
 * @group string_funcs
 * @since 1.5.0
 */
- def split(str: Column, pattern: String): Column = withExpr {
-StringSplit(str.expr, lit(pattern).expr)
+ def split(str: Column, regex: String): Column = withExpr {
+StringSplit(str.expr, Literal(regex), Literal(-1))
+ }
+
+ /**
+ * Splits str around matches of the given regex.
+ *
+ * @param str a string expression to split
+ * @param regex a string representing a regular expression. The regex string should be
+ * a Java regular expression.
+ * @param limit an integer expression which controls the number of times the regex is applied.
+ *limit greater than 0: The resulting array's length will not be more than `limit`,
+ * and the resulting array's last entry will contain all input beyond
+ * the last matched regex.
+ *limit less than or equal to 0: `regex` will be applied as many times as possible, and
+ * the resulting array can be of any size.
--- End diff --

The indentation here looks a bit odd, or at least inconsistent. Can you double-check the Scaladoc and format this correctly?
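The `limit` semantics described in the Scaladoc of this diff can be illustrated with a small sketch. This is not Spark's implementation: `split_with_limit` is a hypothetical helper built on Python's `re.split`, it uses Python regex syntax rather than Java's, and it assumes the regex has no capturing groups.

```python
import re


def split_with_limit(s, regex, limit=-1):
    """Emulate the documented `limit` behaviour (hypothetical helper).

    Assumes `regex` has no capturing groups, because re.split would
    otherwise include the captured text in the result.
    """
    if limit == 1:
        # At most one element, so nothing can be split off.
        # re.split treats maxsplit=0 as "no limit", hence the explicit case.
        return [s]
    if limit > 1:
        # At most `limit` elements means at most `limit - 1` splits;
        # the last element keeps the rest of the input.
        return re.split(regex, s, maxsplit=limit - 1)
    # limit <= 0: apply the regex as many times as possible.
    return re.split(regex, s)


print(split_with_limit("oneAtwoBthreeC", "[ABC]", 2))
print(split_with_limit("oneAtwoBthreeC", "[ABC]", -1))
```

With `limit=2` the second element keeps the unsplit remainder of the input, while a non-positive limit splits at every match.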
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214245703

--- Diff: python/pyspark/sql/functions.py ---
@@ -1669,20 +1669,36 @@ def repeat(col, n):
     return Column(sc._jvm.functions.repeat(_to_java_column(col), n))
-@since(1.5)
+@since(2.4)
 @ignore_unicode_prefix
-def split(str, pattern):
-    """
-    Splits str around pattern (pattern is a regular expression).
-
-    .. note:: pattern is a string represent the regular expression.
-
-    >>> df = spark.createDataFrame([('ab12cd',)], ['s',])
-    >>> df.select(split(df.s, '[0-9]+').alias('s')).collect()
-    [Row(s=[u'ab', u'cd'])]
-    """
-    sc = SparkContext._active_spark_context
-    return Column(sc._jvm.functions.split(_to_java_column(str), pattern))
+def split(str, regex, limit=-1):
--- End diff --

Please change `regex` back to `pattern`.
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 Stuff like this merits api discussions. Not just implementation changes ... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21721

I actually thought all of them were part of DataSource V2. Why are we fine with changing those interfaces but not okay with this one, to the point that we consider reverting it? Other things should be clarified if there are concerns, yes, of course. In this case, switching it to `Unstable` looks like it alleviates the concerns listed here well enough.
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214244981

--- Diff: python/pyspark/sql/functions.py ---
@@ -1669,20 +1669,36 @@ def repeat(col, n):
     return Column(sc._jvm.functions.repeat(_to_java_column(col), n))
-@since(1.5)
+@since(2.4)
 @ignore_unicode_prefix
-def split(str, pattern):
-    """
-    Splits str around pattern (pattern is a regular expression).
-
-    .. note:: pattern is a string represent the regular expression.
-
-    >>> df = spark.createDataFrame([('ab12cd',)], ['s',])
-    >>> df.select(split(df.s, '[0-9]+').alias('s')).collect()
-    [Row(s=[u'ab', u'cd'])]
-    """
-    sc = SparkContext._active_spark_context
-    return Column(sc._jvm.functions.split(_to_java_column(str), pattern))
+def split(str, regex, limit=-1):
--- End diff --

I believe this would be a breaking API change for Python.
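Why renaming the parameter breaks Python callers can be shown with plain functions. The three functions below are hypothetical stand-ins that only mirror the signatures in the diff and do no real work; `accepts_pattern_keyword` is likewise an illustrative helper, not part of PySpark.

```python
def split_old(str, pattern):
    """Stand-in with the pre-change signature."""
    return ("split", str, pattern, -1)


def split_renamed(str, regex, limit=-1):
    """Stand-in with the signature proposed in the diff."""
    return ("split", str, regex, limit)


def split_compatible(str, pattern, limit=-1):
    """Stand-in that keeps the old keyword name and adds `limit`."""
    return ("split", str, pattern, limit)


def accepts_pattern_keyword(func):
    """Return True if func can be called with pattern=... as a keyword."""
    try:
        func("ab12cd", pattern="[0-9]+")
        return True
    except TypeError:
        return False


for f in (split_old, split_renamed, split_compatible):
    print(f.__name__, accepts_pattern_keyword(f))
```

Existing code that passes `pattern=` by keyword raises `TypeError` against the renamed signature, whereas keeping `pattern` and appending `limit` with a default leaves such calls working.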
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214244918

--- Diff: R/pkg/R/functions.R ---
@@ -3410,13 +3410,14 @@ setMethod("collect_set",
 #' \dontrun{
 #' head(select(df, split_string(df$Sex, "a")))
 #' head(select(df, split_string(df$Class, "\\d")))
+#' head(select(df, split_string(df$Class, "\\d", 2)))
 #' # This is equivalent to the following SQL expression
 #' head(selectExpr(df, "split(Class, 'd')"))}
 #' @note split_string 2.3.0
 setMethod("split_string",
 signature(x = "Column", pattern = "character"),
- function(x, pattern) {
-jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern)
+ function(x, pattern, limit = -1) {
+jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern, limit)
--- End diff --

You should have `as.integer(limit)` instead. Could we add a test in R?
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2724/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22298 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2724/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/22213#discussion_r214244665

--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -1144,6 +1144,46 @@ class SparkSubmitSuite
 conf1.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}")
 conf1.get("spark.submit.pyFiles") should (startWith("/"))
 }
+
+ test("handles natural line delimiters in --properties-file and --conf uniformly") {
+val delimKey = "spark.my.delimiter."
+val LF = "\n"
+val CR = "\r"
+
+val leadingDelimKeyFromFile = s"${delimKey}leadingDelimKeyFromFile" -> s"${LF}blah"
+val trailingDelimKeyFromFile = s"${delimKey}trailingDelimKeyFromFile" -> s"blah${CR}"
+val infixDelimFromFile = s"${delimKey}infixDelimFromFile" -> s"${CR}blah${LF}"
+val nonDelimSpaceFromFile = s"${delimKey}nonDelimSpaceFromFile" -> " blah\f"
--- End diff --

Sorry for the stupid question. I guess I was thinking of something different.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95519/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22298 **[Test build #95519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95519/testReport)** for PR 22298 at commit [`46c30cc`](https://github.com/apache/spark/commit/46c30cc27cd3a7279a116ec6a70a937b8502cd73). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes fo...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22274#discussion_r214244580

--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3633,7 +3633,8 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", {
 expect_equal(currentDatabase(), "default")
 expect_error(setCurrentDatabase("default"), NA)
 expect_error(setCurrentDatabase("zxwtyswklpf"),
-"Error in setCurrentDatabase : analysis error - Database 'zxwtyswklpf' does not exist")
+ paste("Error in setCurrentDatabase : analysis error - Database",
--- End diff --

I'd use `paste0` instead, to make the implicit space that should come after `Database` explicit, i.e. `paste0("Error in setCurrentDatabase : analysis error - Database ", "'zxwtyswklpf' does not exist")`
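For readers less familiar with R: `paste` joins its arguments with a single space by default, while `paste0` joins them with no separator, which is why the space after `Database` has to be written out in the `paste0` version. A rough Python analogy follows; `paste` and `paste0` here are hypothetical helpers that mimic only that default-separator behaviour for scalar strings.

```python
def paste(*parts, sep=" "):
    """Mimic R's paste() for scalar strings: join with a space by default."""
    return sep.join(parts)


def paste0(*parts):
    """Mimic R's paste0() for scalar strings: join with no separator."""
    return "".join(parts)


prefix = "Error in setCurrentDatabase : analysis error - Database"
suffix = "'zxwtyswklpf' does not exist"

# With paste, the separator supplies the space between the two halves.
implicit_space = paste(prefix, suffix)
# With paste0, the space has to be part of the first literal.
explicit_space = paste0(prefix + " ", suffix)

print(implicit_space == explicit_space)
```

Both forms build the same expected message; the `paste0` form simply makes the space visible in the source.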
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95503/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22192 **[Test build #95503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95503/testReport)** for PR 22192 at commit [`2907c6b`](https://github.com/apache/spark/commit/2907c6b62495f8d25c0016883202239634685fec). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22281 For clarification, I am okay with targeting this to 3.0.0 since the code freeze will be very soon if I am not mistaken. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22291: [SPARK-25007][R]Add array_intersect/array_except/...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22291#discussion_r214244359

--- Diff: R/pkg/R/generics.R ---
@@ -799,10 +807,18 @@ setGeneric("array_sort", function(x) { standardGeneric("array_sort") })
 #' @name NULL
 setGeneric("arrays_overlap", function(x, y) { standardGeneric("arrays_overlap") })
+#' @rdname column_collection_functions
+#' @name NULL
+setGeneric("array_union", function(x, y) { standardGeneric("array_union") })
+
 #' @rdname column_collection_functions
 #' @name NULL
 setGeneric("arrays_zip", function(x, ...) { standardGeneric("arrays_zip") })
+#' @rdname column_collection_functions
+#' @name NULL
+setGeneric("shuffle", function(x) { standardGeneric("shuffle") })
--- End diff --

This should go below; this part of the list should be sorted alphabetically.
[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22048 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20637 with the test removed, do we still need this change? https://github.com/apache/spark/pull/20637/files#diff-41747ec3f56901eb7bfb95d2a217e94dR226 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22281 Yea, but the default fallback should rather be DataSource V2's. Both of you are super active in DataSource V2. Do you guys have some concerns about defaulting to DataSource V1's behaviour? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22298 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2724/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r214243817

--- Diff: R/pkg/R/functions.R ---
@@ -1697,8 +1697,8 @@ setMethod("to_date",
 })
 #' @details
-#' \code{to_json}: Converts a column containing a \code{structType}, array of \code{structType},
-#' a \code{mapType} or array of \code{mapType} into a Column of JSON string.
+#' \code{to_json}: Converts a column containing a \code{structType}, a \code{mapType}
+#' or an array into a Column of JSON string.
--- End diff --

Let's add one simple Python doctest as well.
[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22281

The USING syntax has to be there, but what can follow USING may be limited to data source v1 and file formats. IIUC the agreement is: a data source v2 with a catalog can create a table with USING, and the data source should interpret the USING parameter. E.g. `USING parquet` may have a different meaning in the iceberg data source.
[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/22298#discussion_r214243652

--- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/SecretsTestsSuite.scala ---
@@ -53,6 +53,7 @@ private[spark] trait SecretsTestsSuite { k8sSuite: KubernetesSuite =>
 .delete()
 }
+ // TODO: [SPARK-25291] This test is flaky with regards to memory of executors
--- End diff --

@mccheah This test periodically fails on setting proper memory for executors on this specific test. I have filed a JIRA: SPARK-25291
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/22298 @rdblue @holdenk for review. This contains both unit and integration tests that verify [SPARK-25004] for K8S --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...
GitHub user ifilonenko opened a pull request: https://github.com/apache/spark/pull/22298

[SPARK-25021][K8S] Add spark.executor.pyspark.memory limit for K8S

## What changes were proposed in this pull request?

Add spark.executor.pyspark.memory limit for K8S

## How was this patch tested?

Unit and Integration tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ifilonenko/spark SPARK-25021

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22298.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22298

commit b54a039da08aec93a6db9d1470d0b2eaaec08814
Author: Ilan Filonenko
Date: 2018-08-30T00:19:40Z

    initial WIP push for SPARK-25021

commit 75742a37687a7eb3ebaa34069ac7a62521a4e2f8
Author: Ilan Filonenko
Date: 2018-08-30T05:26:27Z

    add python.worker.reuse

commit 46c30cc27cd3a7279a116ec6a70a937b8502cd73
Author: Ilan Filonenko
Date: 2018-08-31T04:32:22Z

    final checks with e2e tests
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21721

Note that the data source v2 API is not stable yet, and we may even change the abstraction of the APIs. The design of custom metrics may affect the design of the streaming source APIs.

I had a hard time figuring out the life cycle of custom metrics. It seems like its life cycle should be bound to an epoch, but unfortunately we don't have such an interface in continuous streaming to represent an epoch. Is it possible that we may end up with 2 sets of custom metrics APIs, one for micro-batch and one for continuous? The documentation added in this PR is not clear about this.
[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r214243115

--- Diff: R/pkg/R/functions.R ---
@@ -1697,8 +1697,8 @@ setMethod("to_date",
 })
 #' @details
-#' \code{to_json}: Converts a column containing a \code{structType}, array of \code{structType},
-#' a \code{mapType} or array of \code{mapType} into a Column of JSON string.
+#' \code{to_json}: Converts a column containing a \code{structType}, a \code{mapType}
+#' or an array into a Column of JSON string.
--- End diff --

It should. Could we add some tests for this in R?
[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22232 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95508/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22232 **[Test build #95508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95508/testReport)** for PR 22232 at commit [`1c32646`](https://github.com/apache/spark/commit/1c326466fbd24c432184be6e53afec93369970c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732

> The only tricky thing is, Product is handled specially in the top level, being flattened into multiple columns.

@cloud-fan Unlike Option of Product, which was not supported before, the encoding of Product is the current behavior. I think we don't need to change it so far. WDYT?
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/7 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/7 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95511/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/7 **[Test build #95511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95511/testReport)** for PR 7 at commit [`a641106`](https://github.com/apache/spark/commit/a6411069c352b30f9094a83991c35f0730b5df55). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22186 **[Test build #95518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95518/testReport)** for PR 22186 at commit [`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95518/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22186 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive te...
Github user sadhen commented on the issue: https://github.com/apache/spark/pull/22264 @srowen A PR for this "bug" is proposed: https://github.com/scala/scala/pull/7156 Hopefully, Scala 2.12.7 will fix it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20086: [SPARK-22903]Fix already being created exception in stag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20086 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22264 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r214237818 --- Diff: python/pyspark/sql/session.py --- @@ -252,6 +252,16 @@ def newSession(self): """ return self.__class__(self._sc, self._jsparkSession.newSession()) +@since(2.4) +def getActiveSession(self): +""" +Returns the active SparkSession for the current thread, returned by the builder. +>>> s = spark.getActiveSession() +>>> spark._jsparkSession.getDefaultSession().get().equals(s.get()) +True +""" +return self._jsparkSession.getActiveSession() --- End diff -- Does this return JVM instance? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/22213#discussion_r214237801 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -1144,6 +1144,46 @@ class SparkSubmitSuite conf1.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}") conf1.get("spark.submit.pyFiles") should (startWith("/")) } + + test("handles natural line delimiters in --properties-file and --conf uniformly") { +val delimKey = "spark.my.delimiter." +val LF = "\n" +val CR = "\r" + +val leadingDelimKeyFromFile = s"${delimKey}leadingDelimKeyFromFile" -> s"${LF}blah" +val trailingDelimKeyFromFile = s"${delimKey}trailingDelimKeyFromFile" -> s"blah${CR}" +val infixDelimFromFile = s"${delimKey}infixDelimFromFile" -> s"${CR}blah${LF}" +val nonDelimSpaceFromFile = s"${delimKey}nonDelimSpaceFromFile" -> " blah\f" --- End diff -- @jerryshao I try not to spend time on issues unrelated to our production deployments. @steveloughran and this PR already pointed at the `Properties#load` method documenting the format. Line terminator characters can be included using the `\r` and `\n` escape sequences, or you can encode any character using `\u`. In addition, you can take a look at the file generated by this code: ``` #test whitespace #Thu Aug 30 20:20:33 PDT 2018 spark.my.delimiter.nonDelimSpaceFromFile=\ blah\f spark.my.delimiter.infixDelimFromFile=\rblah\n spark.my.delimiter.trailingDelimKeyFromFile=blah\r spark.my.delimiter.leadingDelimKeyFromFile=\nblah ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
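The escape handling described in the comment above can be checked directly against `java.util.Properties#load`. The following is a minimal sketch, not Spark's own loading code: the class name `PropertiesEscapes` and the helper `parse` are hypothetical, and the four entries simply mirror the generated file quoted in the comment.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper class for illustration; not part of Spark.
final class PropertiesEscapes {

    // Parses properties-file text with the standard java.util.Properties loader.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader should not fail; rethrow unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Makes control characters visible when printing a loaded value.
    static String visible(String value) {
        return value.replace("\r", "<CR>").replace("\n", "<LF>")
                    .replace("\f", "<FF>").replace(" ", "<SP>");
    }

    public static void main(String[] args) {
        // Each "\\" below is a single backslash in the file text, so the
        // loader sees the escape sequences \r, \n, \f and "\ " (escaped space).
        String fileText = String.join("\n",
            "spark.my.delimiter.nonDelimSpaceFromFile=\\ blah\\f",
            "spark.my.delimiter.infixDelimFromFile=\\rblah\\n",
            "spark.my.delimiter.trailingDelimKeyFromFile=blah\\r",
            "spark.my.delimiter.leadingDelimKeyFromFile=\\nblah");

        Properties props = parse(fileText);
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + " -> " + visible(props.getProperty(key)));
        }
    }
}
```

In this sketch the loader itself decodes the escapes, so the loaded values contain real carriage-return, line-feed and form-feed characters. Any trimming that happens later in Spark is a separate step and is not modeled here.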
[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22273 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95514/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22186 **[Test build #95518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95518/testReport)** for PR 22186 at commit [`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22273 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22197 **[Test build #95517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95517/testReport)** for PR 22197 at commit [`e0d6196`](https://github.com/apache/spark/commit/e0d61969b13bcfd9dfc95e2a013b14e111d2b832). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22273 **[Test build #95514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95514/testReport)** for PR 22273 at commit [`e8a2602`](https://github.com/apache/spark/commit/e8a2602476a52622a01c0cf4f72067f3119be96a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22186 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2723/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22297 cc @cloud-fan @HyukjinKwon --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/22186 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/22186 I see. Thanks for the explanation. I checked the code again, and yes, you're right. Let me retrigger the test; I will merge it if everything is fine. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22270 **[Test build #95516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95516/testReport)** for PR 22270 at commit [`53f4984`](https://github.com/apache/spark/commit/53f4984bd35d07da7382866960279233aadebea5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22270 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22270 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2722/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21968#discussion_r214235758 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala --- @@ -141,11 +151,8 @@ class RowBasedHashMapGenerator( |if (buckets[idx] == -1) { | if (numRows < capacity && !isBatchFull) { |// creating the unsafe for new entry - | org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter - | = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter( - | ${groupingKeySchema.length}, ${numVarLenFields * 32}); |agg_rowWriter.reset(); //TODO: investigate if reset or zeroout are actually needed --- End diff -- I think reset and zero-out are now needed? So maybe remove this TODO? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21968#discussion_r214235660 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala --- @@ -141,11 +151,8 @@ class RowBasedHashMapGenerator( |if (buckets[idx] == -1) { | if (numRows < capacity && !isBatchFull) { |// creating the unsafe for new entry --- End diff -- Remove or update this comment? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22297 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2721/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22297 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22297 **[Test build #95515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95515/testReport)** for PR 22297 at commit [`cc7a710`](https://github.com/apache/spark/commit/cc7a710a1ba8d050836f64d820f675546712b3c9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22297: [SPARK-25290][Core][Test] Reduce the size of acqu...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/22297 [SPARK-25290][Core][Test] Reduce the size of acquired arrays to avoid OOM error ## What changes were proposed in this pull request? `BytesToBytesMapOnHeapSuite`.`randomizedStressTest` caused `OutOfMemoryError` on several test runs. Seems better to reduce memory usage in this test. ## How was this patch tested? Unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-25290 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22297.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22297 commit cc7a710a1ba8d050836f64d820f675546712b3c9 Author: Liang-Chi Hsieh Date: 2018-08-31T02:59:18Z Reduce the size of acquired arrays to avoid OOM error. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/21860 cc @cloud-fan @maropu @kiszk --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22279: [SPARK-25277][YARN] YARN applicationMaster metric...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/22279#discussion_r214234325 --- Diff: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala --- @@ -103,6 +103,14 @@ private[spark] class MetricsSystem private ( sinks.foreach(_.start) } + // Same as start but this method only registers sinks --- End diff -- Would you please explain why only registering sinks could solve the problem here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory operat...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/21968 cc @cloud-fan @maropu --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22279: [SPARK-25277][YARN] YARN applicationMaster metrics shoul...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/22279 Hi @LucaCanali, do you have a sample output of the current AM metrics? I would like to know what kind of metrics are output for now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...
Github user xuejianbest commented on the issue: https://github.com/apache/spark/pull/22048 I see. A new commit has been pushed. Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22197 Seems fine to me too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22197#discussion_r214233946 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -44,7 +45,14 @@ private[parquet] class ParquetFilters( pushDownTimestamp: Boolean, pushDownDecimal: Boolean, pushDownStartWith: Boolean, -pushDownInFilterThreshold: Int) { +pushDownInFilterThreshold: Int, +caseSensitive: Boolean) { + + private case class ParquetField( + // field name in parquet file --- End diff -- I'd just move those into the doc for this case class above, for instance, ``` /** * blabla * @param blabla */ private case class ParquetField ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22273 > I thought the current information is enough to indicate which Arrow or Pandas we would use and test Well yeah, it is when they are skipped but my point was that having an additional positive confirmation that the tests were run would be nice. Maybe that's just me though, so we don't have to merge this. I am still a bit concerned why my skip message was only printed in 1 of the tests here, so I'll run a few more to see if I can figure it out. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22289: [SPARK-25200][YARN] Allow specifying HADOOP_CONF_...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/22289#discussion_r214233802 --- Diff: launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java --- @@ -200,6 +200,7 @@ void addOptionString(List cmd, String options) { addToClassPath(cp, getenv("HADOOP_CONF_DIR")); addToClassPath(cp, getenv("YARN_CONF_DIR")); +addToClassPath(cp, getEffectiveConfig().get("spark.yarn.conf.dir")); --- End diff -- I'm wondering how we would update the classpath to switch to a different set of Hadoop confs with InProcessLauncher. It seems the classpath here cannot be changed after the JVM is launched. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22273 **[Test build #95514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95514/testReport)** for PR 22273 at commit [`e8a2602`](https://github.com/apache/spark/commit/e8a2602476a52622a01c0cf4f72067f3119be96a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22273 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2720/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22273 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive te...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22264 Yeah, OK. I think this is acceptable as a potential "known issue" for Scala 2.12 support, which we can accept for a beta release of 2.12 support with Spark 2.4. I think I'd merge this and then see where we are. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22294: [SPARK-25287][INFRA] Add up-front check for JIRA_USERNAM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22294 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22294: [SPARK-25287][INFRA] Add up-front check for JIRA_USERNAM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22294 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95498/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22213: [SPARK-25221][DEPLOY] Consistent trailing whitespace tre...
Github user gerashegalov commented on the issue: https://github.com/apache/spark/pull/22213 @steveloughran Regarding XML format, java.util.Properties has its dedicated storeTo/loadFromXML methods which Spark does not use, so we don't need to check this --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22294: [SPARK-25287][INFRA] Add up-front check for JIRA_USERNAM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22294 **[Test build #95498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95498/testReport)** for PR 22294 at commit [`1ed41dd`](https://github.com/apache/spark/commit/1ed41ddc922cd07f6d6c2384c5aa248699f9ef87). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22138 **[Test build #95513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95513/testReport)** for PR 22138 at commit [`017c0bb`](https://github.com/apache/spark/commit/017c0bbf9365b32467de64c96a1a0d6aee1f6875). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22296 **[Test build #95512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95512/testReport)** for PR 22296 at commit [`a847099`](https://github.com/apache/spark/commit/a8470991ba73eb959c0e7dbda31e5d391c2d34ef). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22296 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22296 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2719/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22173 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22173: [SPARK-24335] Spark external shuffle server improvement ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22173 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95499/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org