[GitHub] spark issue #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22299 Thanks, @jerryshao for pointing this out. I will close mine after we see what we want. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21721 Can someone write a design doc for the metrics support? I think this is an important feature for data source v2 and we need to be careful here. The design doc should explain how custom metrics fit into the abstraction of the data source v2 API, what the metrics API would look like for batch, micro-batch and continuous (I feel metrics are also important for batch sources), and how the sources report metrics physically (via task complete event? via heartbeat? via RPC?). @rxin just sent an email to the dev list about the data source v2 API abstraction; it would be great if you guys could kick it off and talk about the metrics support there. It's very likely that the custom metrics API will be replaced by something totally different after we finish the design. I don't think we should rush into something that works but is not well designed.
[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22296 Let me leave this open in case we only want to mark this as unstable for now. Other changes are proposed in https://github.com/apache/spark/pull/22296
[GitHub] spark pull request #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics...
GitHub user HyukjinKwon reopened a pull request: https://github.com/apache/spark/pull/22296 [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Unstable APIs ## What changes were proposed in this pull request? This PR proposes to switch the API stability annotation from `Evolving` to `Unstable`, given the discussion in the original PR, for now. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-24748 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22296.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22296 commit a8470991ba73eb959c0e7dbda31e5d391c2d34ef Author: hyukjinkwon Date: 2018-08-31T02:29:30Z Switch custom metrics to Unstable APIs
[GitHub] spark pull request #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics...
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/22296
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21721 I skimmed through how AccumulatorV2 works, and it looks like the values in a task are reported along with the CompletionEvent that is triggered when the task ends. In continuous mode, then, the driver doesn't even have updated metrics. Metrics reporting should not be coupled to the lifecycle of a task.
[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22296 I am closing this per https://github.com/apache/spark/pull/22296
[GitHub] spark issue #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/22299 Seems there's another similar PR, #22296.
[GitHub] spark pull request #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22186
[GitHub] spark issue #22277: [SPARK-25276] Redundant constrains when using alias
Github user ajithme commented on the issue: https://github.com/apache/spark/pull/22277 @gatorsmile and @jiangxb1987, any inputs?
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22274 Merged build finished. Test PASSed.
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22274 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95520/ Test PASSed.
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22274 **[Test build #95520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95520/testReport)** for PR 22274 at commit [`4b6cd9f`](https://github.com/apache/spark/commit/4b6cd9f532e07f08c86659dcd4a0f2d40995d8ef). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20637 I believe we still need this change.
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 >It seems like its life cycle should be bound to an epoch, but unfortunately we don't have such an interface in continuous streaming to represent an epoch. Is it possible that we may end up with 2 sets of custom metrics APIs for micro-batch and continuous? @cloud-fan we could still report progress at the end of each epoch (e.g. [here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala#L231) and via the EpochCoordinator). There need not be separate interfaces for the progress or the custom metrics; just the mechanisms could be different.
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/22186 Merging to master branch.
[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22296 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95512/ Test PASSed.
[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22296 Merged build finished. Test PASSed.
[GitHub] spark issue #18877: [SPARK-17742][core] Handle child process exit in SparkLa...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18877 yes @danelkotev `asfgit closed this in cba826d on Aug 15, 2017`
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21721 My 2 cents: the root cause is that the lifecycle of reporting query progress is tied to `finishTrigger`, and we read updated metrics from the executed plan; continuous mode has neither a `finishTrigger` nor a finished executed plan. I'm not aware of how/when updated information about the nodes of the physical plan is transmitted from the executors to the driver, but we should avoid using the executed plan as the source of this information and find an alternative that is compatible between micro-batch and continuous mode. This applies not only to metrics but also to watermarks. I'm not sure it is viable, but it could be done via RPC or some other mechanism, as long as we can aggregate the information on the driver. Each operator could then send its information to the driver directly, and the driver could aggregate it and make use of it once a batch or an epoch is finished.
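A driver-side aggregation along these lines could be sketched as follows. This is a hypothetical Java sketch only — the class, method names, and the per-epoch keying are illustrative assumptions, not Spark's actual implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: executors push metric updates tagged with an epoch
// (or batch) id, e.g. over RPC, and the driver folds them per epoch so a
// progress report can be emitted once the epoch finishes.
class EpochMetricsAggregator {
    private final Map<Long, Map<String, LongAdder>> byEpoch = new ConcurrentHashMap<>();

    // Called (conceptually) from an RPC handler on the driver.
    void report(long epochId, String metric, long value) {
        byEpoch.computeIfAbsent(epochId, e -> new ConcurrentHashMap<>())
               .computeIfAbsent(metric, m -> new LongAdder())
               .add(value);
    }

    // Called once the epoch is committed; drops the state and returns totals.
    Map<String, LongAdder> finishEpoch(long epochId) {
        return byEpoch.remove(epochId);
    }
}
```

The point of keying by epoch id is that reporting no longer depends on task completion: a long-running continuous task can keep sending updates, and the driver snapshots whatever it has when the epoch closes.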
[GitHub] spark pull request #22293: [SPARK-25288][Tests]Fix flaky Kafka transaction t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22293
[GitHub] spark issue #22296: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22296 **[Test build #95512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95512/testReport)** for PR 22296 at commit [`a847099`](https://github.com/apache/spark/commit/a8470991ba73eb959c0e7dbda31e5d391c2d34ef). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r214253824 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,32 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, --- End diff -- ah it's called in the generated code. Maybe we can use elementSize `-1` to create a generic array.
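The sentinel idea being discussed could look roughly like the following. This is a hedged Java sketch under stated assumptions: the method name, the size threshold, and the `byte[]`/`Object[]` stand-ins for `UnsafeArrayData`/`GenericArrayData` are all illustrative, not Spark's actual `ArrayData.allocateArrayData`:

```java
// Hypothetical sketch of the sentinel approach: because the allocator is
// invoked from generated code (where constructing a Scala Option is awkward),
// a negative elementSize selects the generic path instead.
public class ArrayAllocSketch {
    // stand-in for the maximum byte size an unsafe array may occupy
    static final long MAX_UNSAFE_BYTES = Integer.MAX_VALUE - 15;

    static Object allocateArrayData(int elementSize, int numElements) {
        if (elementSize > 0 && (long) elementSize * numElements <= MAX_UNSAFE_BYTES) {
            // fixed-width primitive elements: back the array with raw bytes
            // (stand-in for an UnsafeArrayData allocation)
            return new byte[elementSize * numElements];
        }
        // elementSize == -1 (or the data is too large): boxed generic array
        // (stand-in for GenericArrayData)
        return new Object[numElements];
    }
}
```

The single `int` parameter keeps the call site trivial to emit from codegen, which is the motivation stated in the review comment.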
[GitHub] spark issue #22293: [SPARK-25288][Tests]Fix flaky Kafka transaction tests
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/22293 Thanks! Merging to master.
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r214253479 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala --- @@ -34,6 +36,32 @@ object ArrayData { case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a) case other => new GenericArrayData(other) } + + + /** + * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given parameters. + * + * @param elementSize a size of an element in bytes + * @param numElements the number of elements the array should contain + * @param isPrimitiveType whether the type of an element is primitive type + * @param additionalErrorMessage string to include in the error message + */ + def allocateArrayData( + elementSize: Int, --- End diff -- `elementSize` is only used when creating an unsafe array. I think we can just have `elementSize: Option[Int]` and remove the `isPrimitiveType` parameter.
[GitHub] spark issue #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22299 **[Test build #95521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95521/testReport)** for PR 22299 at commit [`49a94c6`](https://github.com/apache/spark/commit/49a94c6016a0a4cd6076329797f4c2ac5a9cb588).
[GitHub] spark issue #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22299 Can one of the admins verify this patch?
[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21912#discussion_r214253006 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -452,6 +452,16 @@ public UnsafeArrayData copy() { public static UnsafeArrayData fromPrimitiveArray( Object arr, int offset, int length, int elementSize) { +UnsafeArrayData result = createFreshArray(length, elementSize); +final long headerInBytes = calculateHeaderPortionInBytes(length); +final long valueRegionInBytes = (long)elementSize * length; +final Object data = result.getBaseObject(); --- End diff -- now the data is `Object` instead of `long[]`. Can we duplicate the code for now and think of how to deduplicate them later?
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 I created a follow-up PR to move CustomMetrics (and a few other streaming-specific interfaces in that package) to 'streaming' and mark the interfaces as Unstable here - https://github.com/apache/spark/pull/22299
[GitHub] spark issue #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Uns...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22299 Can one of the admins verify this patch?
[GitHub] spark pull request #22299: [SPARK-24748][SS][FOLLOWUP] Switch custom metrics...
GitHub user arunmahadevan opened a pull request: https://github.com/apache/spark/pull/22299 [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Unstable APIs - Mark custom metrics related APIs as unstable - Move CustomMetrics (and a few other streaming interfaces in parent package) to streaming package Ideally could move `v2/reader/streaming` and `v2/writer/streaming` under `streaming/reader` and `streaming/writer` but that can be a follow up PR if required. ## How was this patch tested? Existing unit tests Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/arunmahadevan/spark refactor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22299.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22299 commit 49a94c6016a0a4cd6076329797f4c2ac5a9cb588 Author: Arun Mahadevan Date: 2018-08-31T05:53:57Z [SPARK-24748][SS][FOLLOWUP] Switch custom metrics to Unstable APIs - Mark custom metrics related APIs as unstable - Move streaming related interfaces to streaming package
[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22297 do we have a memory leak here? It seems these arrays are allocated in the loop and can be released soon.
[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...
Github user npoberezkin commented on the issue: https://github.com/apache/spark/pull/22255 I got your idea now. Apparently I was a little confused because of the description of the tickets. I can try to implement this (writing info about the writer model, like "avro" etc., in Spark) if you give me some directions on how to do it and where I should make changes. Also, I can add a "spark.version" property, but if I understood everything correctly, we'll need to open a new issue in Parquet to do this, am I right?
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22274 Merged build finished. Test PASSed.
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22274 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2725/ Test PASSed.
[GitHub] spark issue #21987: [SPARK-25015][BUILD] Update Hadoop 2.7 to 2.7.7
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21987 It seems that this change caused a permission issue:
```
export HADOOP_PROXY_USER=user_a
spark-sql
```
This creates the dir `/tmp/hive-$%7Buser.name%7D/user_a/`. Then, after changing to another user:
```
export HADOOP_PROXY_USER=user_b
spark-sql
```
exception:
```scala
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=user_b, access=EXECUTE, inode="/tmp/hive-$%7Buser.name%7D/user_b/6b446017-a880-4f23-a8d0-b62f37d3c413":user_a:hadoop:drwx--
 at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
 at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
 at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
 at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
 at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1780)
 at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
```
I'll do verification later.
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22274 **[Test build #95520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95520/testReport)** for PR 22274 at commit [`4b6cd9f`](https://github.com/apache/spark/commit/4b6cd9f532e07f08c86659dcd4a0f2d40995d8ef).
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22270 Merged build finished. Test FAILed.
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22270 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95516/ Test FAILed.
[GitHub] spark issue #22270: [SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22270 **[Test build #95516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95516/testReport)** for PR 22270 at commit [`53f4984`](https://github.com/apache/spark/commit/53f4984bd35d07da7382866960279233aadebea5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21721 @arunmahadevan, feel free to pick up the commits in my PR in your followup if they have to be changed. I will close mine.
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21721 @rxin it's for streaming sources and sinks, as explained in the [doc]( https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/CustomMetrics.java#L23). It had to be shared between classes in reader.streaming and writer.streaming, so it was added in the parent package (similar to other streaming-specific classes that exist there, like [StreamingWriteSupportProvider.java ](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider.java) and [MicroBatchReadSupportProvider.java](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/sources/v2/MicroBatchReadSupportProvider.java)). We could move all of it to a streaming package.
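For readers following along, a minimal hypothetical sketch of such a shared metrics contract might look like this. The `json()` method and the Kafka-lag implementation are illustrative assumptions drawn from the discussion, not necessarily Spark's exact API:

```java
// Hypothetical sketch of a metrics contract shared by streaming readers and
// writers: the source/sink serializes its custom metrics as JSON, and the
// engine attaches that blob to the streaming query progress report.
interface CustomMetrics {
    String json();
}

// Illustrative implementation a Kafka source might provide.
class KafkaLagMetrics implements CustomMetrics {
    private final long totalLag;

    KafkaLagMetrics(long totalLag) {
        this.totalLag = totalLag;
    }

    @Override
    public String json() {
        // metrics reported as an opaque JSON blob, left to the source to define
        return "{\"totalLag\":" + totalLag + "}";
    }
}
```

Keeping the contract to a single serialized blob is what makes it usable by both the reader and writer sides, which is why it sits in the shared parent package.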
[GitHub] spark pull request #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes fo...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22274#discussion_r214246976 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -3633,7 +3633,8 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", { expect_equal(currentDatabase(), "default") expect_error(setCurrentDatabase("default"), NA) expect_error(setCurrentDatabase("zxwtyswklpf"), -"Error in setCurrentDatabase : analysis error - Database 'zxwtyswklpf' does not exist") + paste("Error in setCurrentDatabase : analysis error - Database", --- End diff -- @felixcheung Sure.
[GitHub] spark issue #22183: [SPARK-25132][SQL][BACKPORT-2.3] Case-insensitive field ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22183 As discussed in the JIRA, this is a partial fix, and we need to backport another 2 PRs, which is risky. Can we close it?
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 I'm confused by this API. Is this for streaming only? If yes, why is it not in the streaming package? If not, I only found a streaming implementation. Maybe I missed it.
[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21968#discussion_r214246268 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala --- @@ -130,6 +134,12 @@ class RowBasedHashMapGenerator( } }.mkString(";\n") +val nullByteWriter = if (groupingKeySchema.map(_.nullable).forall(_ == false)) { --- End diff -- maybe name it `resetNullBits`?
[GitHub] spark pull request #21968: [SPARK-24999][SQL]Reduce unnecessary 'new' memory...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21968#discussion_r214246211 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/RowBasedHashMapGenerator.scala --- @@ -48,6 +48,12 @@ class RowBasedHashMapGenerator( val keySchema = ctx.addReferenceObj("keySchemaTerm", groupingKeySchema) val valueSchema = ctx.addReferenceObj("valueSchemaTerm", bufferSchema) +val numVarLenFields = groupingKeys.map(_.dataType).count { --- End diff -- `groupingKeys.map(_.dataType).count(dt => !UnsafeRow.isFixedLength(dt))`
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214245829 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2546,15 +2546,37 @@ object functions { def soundex(e: Column): Column = withExpr { SoundEx(e.expr) } /** - * Splits str around pattern (pattern is a regular expression). + * Splits str around matches of the given regex. * - * @note Pattern is a string representation of the regular expression. + * @param str a string expression to split + * @param regex a string representing a regular expression. The regex string should be + * a Java regular expression. * * @group string_funcs * @since 1.5.0 */ - def split(str: Column, pattern: String): Column = withExpr { -StringSplit(str.expr, lit(pattern).expr) + def split(str: Column, regex: String): Column = withExpr { +StringSplit(str.expr, Literal(regex), Literal(-1)) + } + + /** + * Splits str around matches of the given regex. + * + * @param str a string expression to split + * @param regex a string representing a regular expression. The regex string should be + * a Java regular expression. + * @param limit an integer expression which controls the number of times the regex is applied. + *limit greater than 0: The resulting array's length will not be more than `limit`, + * and the resulting array's last entry will contain all input beyond + * the last matched regex. +*limit less than or equal to 0: `regex` will be applied as many times as possible, and + * the resulting array can be of any size. --- End diff -- The indentation here looks a bit odd, and inconsistent at the least. Can you double-check the Scaladoc and format this correctly?
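The limit semantics described in the Scaladoc above mirror `java.lang.String.split(String, int)`, which can be checked directly with plain Java, independent of Spark:

```java
import java.util.Arrays;

public class SplitLimitDemo {
    public static void main(String[] args) {
        String s = "ab12cd34ef";

        // limit > 0: at most `limit` entries; the last entry keeps everything
        // after the last applied match
        String[] limited = s.split("[0-9]+", 2);
        System.out.println(Arrays.toString(limited)); // [ab, cd34ef]

        // limit <= 0: the regex is applied as many times as possible
        String[] full = s.split("[0-9]+", -1);
        System.out.println(Arrays.toString(full));    // [ab, cd, ef]
    }
}
```

This also explains the `Literal(-1)` default in the two-argument overload: it preserves the old "split as many times as possible" behavior.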
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214245703 --- Diff: python/pyspark/sql/functions.py --- @@ -1669,20 +1669,36 @@ def repeat(col, n): return Column(sc._jvm.functions.repeat(_to_java_column(col), n)) -@since(1.5) +@since(2.4) @ignore_unicode_prefix -def split(str, pattern): -""" -Splits str around pattern (pattern is a regular expression). - -.. note:: pattern is a string represent the regular expression. - ->>> df = spark.createDataFrame([('ab12cd',)], ['s',]) ->>> df.select(split(df.s, '[0-9]+').alias('s')).collect() -[Row(s=[u'ab', u'cd'])] -""" -sc = SparkContext._active_spark_context -return Column(sc._jvm.functions.split(_to_java_column(str), pattern)) +def split(str, regex, limit=-1): --- End diff -- Please change `regex` back to `pattern`.
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 Stuff like this merits API discussions, not just implementation changes...
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21721 I actually thought all of those are part of DataSource V2. Why are we fine with changing those interfaces but not okay with this one, to the point of considering reverting it? Of course, other things should be clarified if there are concerns. In this case, switching it to `Unstable` seems to alleviate the concerns listed here well enough.
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214244981 --- Diff: python/pyspark/sql/functions.py --- @@ -1669,20 +1669,36 @@ def repeat(col, n): return Column(sc._jvm.functions.repeat(_to_java_column(col), n)) -@since(1.5) +@since(2.4) @ignore_unicode_prefix -def split(str, pattern): -""" -Splits str around pattern (pattern is a regular expression). - -.. note:: pattern is a string represent the regular expression. - ->>> df = spark.createDataFrame([('ab12cd',)], ['s',]) ->>> df.select(split(df.s, '[0-9]+').alias('s')).collect() -[Row(s=[u'ab', u'cd'])] -""" -sc = SparkContext._active_spark_context -return Column(sc._jvm.functions.split(_to_java_column(str), pattern)) +def split(str, regex, limit=-1): --- End diff -- this would be a breaking API change for Python, I believe
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214244918 --- Diff: R/pkg/R/functions.R --- @@ -3410,13 +3410,14 @@ setMethod("collect_set", #' \dontrun{ #' head(select(df, split_string(df$Sex, "a"))) #' head(select(df, split_string(df$Class, "\\d"))) +#' head(select(df, split_string(df$Class, "\\d", 2))) #' # This is equivalent to the following SQL expression #' head(selectExpr(df, "split(Class, 'd')"))} #' @note split_string 2.3.0 setMethod("split_string", signature(x = "Column", pattern = "character"), - function(x, pattern) { -jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern) + function(x, pattern, limit = -1) { +jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern, limit) --- End diff -- you should have `as.integer(limit)` instead. Could we add a test in R?
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Merged build finished. Test PASSed.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2724/ Test PASSed.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22298 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2724/
[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/22213#discussion_r214244665 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -1144,6 +1144,46 @@ class SparkSubmitSuite conf1.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}") conf1.get("spark.submit.pyFiles") should (startWith("/")) } + + test("handles natural line delimiters in --properties-file and --conf uniformly") { +val delimKey = "spark.my.delimiter." +val LF = "\n" +val CR = "\r" + +val leadingDelimKeyFromFile = s"${delimKey}leadingDelimKeyFromFile" -> s"${LF}blah" +val trailingDelimKeyFromFile = s"${delimKey}trailingDelimKeyFromFile" -> s"blah${CR}" +val infixDelimFromFile = s"${delimKey}infixDelimFromFile" -> s"${CR}blah${LF}" +val nonDelimSpaceFromFile = s"${delimKey}nonDelimSpaceFromFile" -> " blah\f" --- End diff -- Sorry for the stupid question. I guess I was thinking of something different.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95519/ Test PASSed.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22298 Merged build finished. Test PASSed.
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22298 **[Test build #95519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95519/testReport)** for PR 22298 at commit [`46c30cc`](https://github.com/apache/spark/commit/46c30cc27cd3a7279a116ec6a70a937b8502cd73). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Merged build finished. Test FAILed.
[GitHub] spark pull request #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes fo...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22274#discussion_r214244580 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -3633,7 +3633,8 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", { expect_equal(currentDatabase(), "default") expect_error(setCurrentDatabase("default"), NA) expect_error(setCurrentDatabase("zxwtyswklpf"), -"Error in setCurrentDatabase : analysis error - Database 'zxwtyswklpf' does not exist") + paste("Error in setCurrentDatabase : analysis error - Database", --- End diff -- I'd use paste0 instead to make clear the implicit space that should be after `Database`, i.e. `paste0("Error in setCurrentDatabase : analysis error - Database ", "'zxwtyswklpf' does not exist")`
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95503/ Test FAILed.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22192 **[Test build #95503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95503/testReport)** for PR 22192 at commit [`2907c6b`](https://github.com/apache/spark/commit/2907c6b62495f8d25c0016883202239634685fec). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22281 For clarification, I am okay with targeting this to 3.0.0 since the code freeze will be very soon if I am not mistaken.
[GitHub] spark pull request #22291: [SPARK-25007][R]Add array_intersect/array_except/...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22291#discussion_r214244359 --- Diff: R/pkg/R/generics.R --- @@ -799,10 +807,18 @@ setGeneric("array_sort", function(x) { standardGeneric("array_sort") }) #' @name NULL setGeneric("arrays_overlap", function(x, y) { standardGeneric("arrays_overlap") }) +#' @rdname column_collection_functions +#' @name NULL +setGeneric("array_union", function(x, y) { standardGeneric("array_union") }) + #' @rdname column_collection_functions #' @name NULL setGeneric("arrays_zip", function(x, ...) { standardGeneric("arrays_zip") }) +#' @rdname column_collection_functions +#' @name NULL +setGeneric("shuffle", function(x) { standardGeneric("shuffle") }) --- End diff -- this should go below - this part of the list should be sorted alphabetically
[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22048 retest this please
[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20637 with the test removed, do we still need this change? https://github.com/apache/spark/pull/20637/files#diff-41747ec3f56901eb7bfb95d2a217e94dR226
[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22281 Yea, but the default fallback should rather be DataSource V2's. Both of you are super active in DataSource V2. Do you guys have some concerns about defaulting to DataSource V1's behaviour?
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22298 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2724/
[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r214243817 --- Diff: R/pkg/R/functions.R --- @@ -1697,8 +1697,8 @@ setMethod("to_date", }) #' @details -#' \code{to_json}: Converts a column containing a \code{structType}, array of \code{structType}, -#' a \code{mapType} or array of \code{mapType} into a Column of JSON string. +#' \code{to_json}: Converts a column containing a \code{structType}, a \code{mapType} +#' or an array into a Column of JSON string. --- End diff -- Let's add one simple python doctest as well
[GitHub] spark issue #22281: [SPARK-25280][SQL] Add support for USING syntax for Data...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22281 USING syntax has to be there, but what USING can take may be only data source v1 and file formats. IIUC the agreement is: a data source v2 with catalog can create a table with USING, and the data source should interpret the USING parameter, e.g. `USING parquet` may have a different meaning in the iceberg data source.
[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/22298#discussion_r214243652 --- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/SecretsTestsSuite.scala --- @@ -53,6 +53,7 @@ private[spark] trait SecretsTestsSuite { k8sSuite: KubernetesSuite => .delete() } + // TODO: [SPARK-25291] This test is flaky with regards to memory of executors --- End diff -- @mccheah This test periodically fails on setting proper memory for executors on this specific test. I have filed a JIRA: SPARK-25291
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/22298 @rdblue @holdenk for review. This contains both unit and integration tests that verify [SPARK-25004] for K8S
[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...
GitHub user ifilonenko opened a pull request: https://github.com/apache/spark/pull/22298 [SPARK-25021][K8S] Add spark.executor.pyspark.memory limit for K8S ## What changes were proposed in this pull request? Add spark.executor.pyspark.memory limit for K8S ## How was this patch tested? Unit and Integration tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/ifilonenko/spark SPARK-25021 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22298.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22298 commit b54a039da08aec93a6db9d1470d0b2eaaec08814 Author: Ilan Filonenko Date: 2018-08-30T00:19:40Z initial WIP push for SPARK-25021 commit 75742a37687a7eb3ebaa34069ac7a62521a4e2f8 Author: Ilan Filonenko Date: 2018-08-30T05:26:27Z add python.worker.reuse commit 46c30cc27cd3a7279a116ec6a70a937b8502cd73 Author: Ilan Filonenko Date: 2018-08-31T04:32:22Z final checks with e2e tests
[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21721 Note that the data source v2 API is not stable yet, and we may even change the abstraction of the APIs. The design of custom metrics may affect the design of the streaming source APIs. I had a hard time figuring out the life cycle of custom metrics. It seems like its life cycle should be bound to an epoch, but unfortunately we don't have such an interface in continuous streaming to represent an epoch. Is it possible that we may end up with 2 sets of custom metrics APIs for micro-batch and continuous? The documentation added in this PR is not clear about this.
[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r214243115 --- Diff: R/pkg/R/functions.R --- @@ -1697,8 +1697,8 @@ setMethod("to_date", }) #' @details -#' \code{to_json}: Converts a column containing a \code{structType}, array of \code{structType}, -#' a \code{mapType} or array of \code{mapType} into a Column of JSON string. +#' \code{to_json}: Converts a column containing a \code{structType}, a \code{mapType} +#' or an array into a Column of JSON string. --- End diff -- it should. Could we add some tests for this in R?
[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22232 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95508/ Test PASSed.
[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22232 **[Test build #95508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95508/testReport)** for PR 22232 at commit [`1c32646`](https://github.com/apache/spark/commit/1c326466fbd24c432184be6e53afec93369970c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 > The only tricky thing is, Product is handled specially in the top level, being flattened into multiple columns. @cloud-fan Compared with Option of Product, which was not supported before, the encoding of Product is the current behavior. I think we don't need to change it for now. WDYT?
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22227 Merged build finished. Test FAILed.
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22227 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95511/ Test FAILed.
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22227 **[Test build #95511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95511/testReport)** for PR 22227 at commit [`a641106`](https://github.com/apache/spark/commit/a6411069c352b30f9094a83991c35f0730b5df55). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95518/ Test PASSed.
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22186 **[Test build #95518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95518/testReport)** for PR 22186 at commit [`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22186 Merged build finished. Test PASSed.
[GitHub] spark issue #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive te...
Github user sadhen commented on the issue: https://github.com/apache/spark/pull/22264 @srowen A PR for this "bug" is proposed: https://github.com/scala/scala/pull/7156 Hopefully, Scala 2.12.7 will fix it.
[GitHub] spark issue #20086: [SPARK-22903]Fix already being created exception in stag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20086 Can one of the admins verify this patch?
[GitHub] spark pull request #22264: [SPARK-25256][SQL][TEST] Plan mismatch errors in ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22264
[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r214237818 --- Diff: python/pyspark/sql/session.py --- @@ -252,6 +252,16 @@ def newSession(self): """ return self.__class__(self._sc, self._jsparkSession.newSession()) +@since(2.4) +def getActiveSession(self): +""" +Returns the active SparkSession for the current thread, returned by the builder. +>>> s = spark.getActiveSession() +>>> spark._jsparkSession.getDefaultSession().get().equals(s.get()) +True +""" +return self._jsparkSession.getActiveSession() --- End diff -- Does this return a JVM instance?
[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/22213#discussion_r214237801 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -1144,6 +1144,46 @@ class SparkSubmitSuite conf1.get(PY_FILES.key) should be (s"s3a://${pyFile.getAbsolutePath}") conf1.get("spark.submit.pyFiles") should (startWith("/")) } + + test("handles natural line delimiters in --properties-file and --conf uniformly") { +val delimKey = "spark.my.delimiter." +val LF = "\n" +val CR = "\r" + +val leadingDelimKeyFromFile = s"${delimKey}leadingDelimKeyFromFile" -> s"${LF}blah" +val trailingDelimKeyFromFile = s"${delimKey}trailingDelimKeyFromFile" -> s"blah${CR}" +val infixDelimFromFile = s"${delimKey}infixDelimFromFile" -> s"${CR}blah${LF}" +val nonDelimSpaceFromFile = s"${delimKey}nonDelimSpaceFromFile" -> " blah\f" --- End diff -- @jerryshao I try not to spend time on issues unrelated to our production deployments. @steveloughran and this PR already pointed at the `Properties#load` method documenting the format. Line terminator characters can be included using `\r` and `\n` escape sequences, or you can encode any character using `\u`. In addition, you can take a look at the file generated by this code: ``` #test whitespace #Thu Aug 30 20:20:33 PDT 2018 spark.my.delimiter.nonDelimSpaceFromFile=\ blah\f spark.my.delimiter.infixDelimFromFile=\rblah\n spark.my.delimiter.trailingDelimKeyFromFile=blah\r spark.my.delimiter.leadingDelimKeyFromFile=\nblah ```
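The escape handling described above is standard `java.util.Properties#load` behavior; a small self-contained sketch (the key names echo the test in the diff, the values are illustrative):

```java
import java.io.StringReader;
import java.util.Properties;

public class DelimiterPropsDemo {
    public static void main(String[] args) throws Exception {
        // In .properties text, a literal line terminator inside a value must be
        // written as a \n or \r escape; Properties#load decodes it back into
        // the real LF / CR character.
        String fileText =
            "spark.my.delimiter.leadingDelimKeyFromFile=\\nblah\n" +
            "spark.my.delimiter.trailingDelimKeyFromFile=blah\\r\n";
        Properties props = new Properties();
        props.load(new StringReader(fileText));

        // The loaded value starts with a real newline character
        String leading = props.getProperty("spark.my.delimiter.leadingDelimKeyFromFile");
        System.out.println(leading.charAt(0) == '\n'); // true
    }
}
```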
[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22273 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95514/ Test PASSed.
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22186 **[Test build #95518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95518/testReport)** for PR 22186 at commit [`fbced52`](https://github.com/apache/spark/commit/fbced52e5687cd5eb6a06c3b9bca5cbeb9343002).
[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22273 Merged build finished. Test PASSed.
[GitHub] spark issue #22197: [SPARK-25207][SQL] Case-insensitve field resolution for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22197 **[Test build #95517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95517/testReport)** for PR 22197 at commit [`e0d6196`](https://github.com/apache/spark/commit/e0d61969b13bcfd9dfc95e2a013b14e111d2b832).
[GitHub] spark issue #22273: [SPARK-25272][PYTHON][TEST] Add test to better indicate ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22273 **[Test build #95514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95514/testReport)** for PR 22273 at commit [`e8a2602`](https://github.com/apache/spark/commit/e8a2602476a52622a01c0cf4f72067f3119be96a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22186 Merged build finished. Test PASSed.
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2723/ Test PASSed.
[GitHub] spark issue #22297: [SPARK-25290][Core][Test] Reduce the size of acquired ar...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22297 cc @cloud-fan @HyukjinKwon
[GitHub] spark issue #22186: [SPARK-25183][SQL] Spark HiveServer2 to use Spark Shutdo...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/22186 Jenkins, retest this please.