spark git commit: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a TimeoutConf

2017-06-05 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master bc537e40a -> 88a23d3de [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a TimeoutConf ## What changes were proposed in this pull request? The construction of BROADCAST_TIMEOUT conf should take the TimeUnit argument as a TimeoutConf.

spark git commit: [SPARK-20940][CORE] Replace IllegalAccessError with IllegalStateException

2017-05-31 Thread zsxwing
8168 from zsxwing/SPARK-20940. (cherry picked from commit 24db35826a81960f08e3eb68556b0f51781144e1) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a607a26b Tree:

spark git commit: [SPARK-20940][CORE] Replace IllegalAccessError with IllegalStateException

2017-05-31 Thread zsxwing
8168 from zsxwing/SPARK-20940. (cherry picked from commit 24db35826a81960f08e3eb68556b0f51781144e1) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cd870c0c Tree:

spark git commit: [SPARK-20940][CORE] Replace IllegalAccessError with IllegalStateException

2017-05-31 Thread zsxwing
8168 from zsxwing/SPARK-20940. (cherry picked from commit 24db35826a81960f08e3eb68556b0f51781144e1) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dade85f7 Tree:

spark git commit: [SPARK-20940][CORE] Replace IllegalAccessError with IllegalStateException

2017-05-31 Thread zsxwing
org/jira/browse/SPARK-20666) is an example of killing SparkContext due to `IllegalAccessError`). I think the correct type of exception in AccumulatorV2 should be `IllegalStateException`. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18168 fro
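
For context, a minimal sketch (plain Scala, not the actual AccumulatorV2 code) of the exception-type change described above: `IllegalAccessError` is a `LinkageError`, which Spark treats as fatal and can end up killing the SparkContext, whereas `IllegalStateException` is an ordinary runtime exception.

```scala
// Hypothetical guard illustrating the exception-type change in SPARK-20940.
// Before: throw new IllegalAccessError(...)  -- a LinkageError, treated as fatal.
// After:  throw new IllegalStateException(...) -- a normal, recoverable exception.
class SketchAccumulator {
  @volatile private var registered = false

  def register(): Unit = { registered = true }

  private def assertRegistered(): Unit = {
    if (!registered) {
      throw new IllegalStateException("Accumulator must be registered before use")
    }
  }

  def add(v: Long): Unit = {
    assertRegistered()
    // ... accumulate v ...
  }
}
```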

spark git commit: [SPARK-20894][SS] Resolve the checkpoint location in driver and use the resolved path in state store

2017-05-31 Thread zsxwing
org/apache/spark/sql/execution/datasources/DataSource.scala#L402), it doesn't make things worse. ## How was this patch tested? The newly added test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18149 from zsxwing/SPARK-20894. Project: http://git-wip-us.apache.org/repos/asf/s
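
A hedged sketch of the resolution step using the standard Hadoop `FileSystem` API; the helper name is illustrative and this is not the exact code from the PR.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Resolve a possibly relative or scheme-less checkpoint location into a fully
// qualified URI once, on the driver, so every executor's state store sees the
// same resolved path.
def resolveCheckpointLocation(location: String, hadoopConf: Configuration): String = {
  val path = new Path(location)
  val fs = path.getFileSystem(hadoopConf)
  fs.makeQualified(path).toUri.toString
}
```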

[1/2] spark git commit: [SPARK-20883][SPARK-20376][SS] Refactored StateStore APIs and added conf to choose implementation

2017-05-30 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 4bb6a53eb -> fa757ee1d http://git-wip-us.apache.org/repos/asf/spark/blob/fa757ee1/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala

[2/2] spark git commit: [SPARK-20883][SPARK-20376][SS] Refactored StateStore APIs and added conf to choose implementation

2017-05-30 Thread zsxwing
[SPARK-20883][SPARK-20376][SS] Refactored StateStore APIs and added conf to choose implementation ## What changes were proposed in this pull request? A bunch of changes to the StateStore APIs and implementation. The current state store API has a bunch of problems that cause too many transient

spark git commit: [SPARK-19968][SS] Use a cached instance of `KafkaProducer` instead of creating one every batch.

2017-05-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 3b79e4cda -> f6730a70c [SPARK-19968][SS] Use a cached instance of `KafkaProducer` instead of creating one every batch. ## What changes were proposed in this pull request? In summary, cost of recreating a KafkaProducer for writing

spark git commit: [SPARK-19968][SS] Use a cached instance of `KafkaProducer` instead of creating one every batch.

2017-05-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1c7db00c7 -> 96a4d1d08 [SPARK-19968][SS] Use a cached instance of `KafkaProducer` instead of creating one every batch. ## What changes were proposed in this pull request? In summary, cost of recreating a KafkaProducer for writing every
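
A rough sketch of the caching idea: keep one producer per distinct Kafka configuration instead of constructing a new `KafkaProducer` for every micro-batch. The object and method names below are illustrative, not the actual classes added by this PR.

```scala
import java.{util => ju}
import java.util.concurrent.ConcurrentHashMap
import org.apache.kafka.clients.producer.KafkaProducer

object SketchCachedKafkaProducer {
  private type Producer = KafkaProducer[Array[Byte], Array[Byte]]

  // One producer per distinct Kafka configuration; creating a KafkaProducer is
  // expensive (it opens connections), so it should be reused across batches.
  private val cache = new ConcurrentHashMap[ju.Map[String, Object], Producer]()

  def getOrCreate(kafkaParams: ju.Map[String, Object]): Producer = {
    cache.computeIfAbsent(kafkaParams,
      new java.util.function.Function[ju.Map[String, Object], Producer] {
        override def apply(params: ju.Map[String, Object]): Producer =
          new KafkaProducer[Array[Byte], Array[Byte]](params)
      })
  }

  def close(kafkaParams: ju.Map[String, Object]): Unit = {
    val producer = cache.remove(kafkaParams)
    if (producer != null) producer.close()
  }
}
```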

spark git commit: [SPARK-20907][TEST] Use testQuietly for test suites that generate long log output

2017-05-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 dc51be1e7 -> 26640a269 [SPARK-20907][TEST] Use testQuietly for test suites that generate long log output ## What changes were proposed in this pull request? Suppress console output by using `testQuietly` in test suites ## How was

spark git commit: [SPARK-20907][TEST] Use testQuietly for test suites that generate long log output

2017-05-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master ef9fd920c -> c9749068e [SPARK-20907][TEST] Use testQuietly for test suites that generate long log output ## What changes were proposed in this pull request? Suppress console output by using `testQuietly` in test suites ## How was this
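
A minimal sketch of what a `testQuietly`-style helper does, assuming log4j 1.x (what Spark used at the time) and ScalaTest's `FunSuite`; the body is illustrative rather than the actual test-utility code.

```scala
import org.apache.log4j.{Level, LogManager}
import org.scalatest.FunSuite

// Run the test body with the root log level raised, so chatty suites do not
// flood the console, then restore the previous level.
abstract class QuietSuite extends FunSuite {
  protected def testQuietly(name: String)(body: => Unit): Unit = test(name) {
    val rootLogger = LogManager.getRootLogger
    val previousLevel = rootLogger.getLevel
    rootLogger.setLevel(Level.ERROR)
    try body finally rootLogger.setLevel(previousLevel)
  }
}

class ExampleSuite extends QuietSuite {
  testQuietly("a chatty test") {
    assert(Seq(1, 2, 3).sum === 6)
  }
}
```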

spark git commit: [SPARK-20843][CORE] Add a config to set driver terminate timeout

2017-05-26 Thread zsxwing
How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18126 from zsxwing/SPARK-20843. (cherry picked from commit 6c1dbd6fc8d49acf7c1c902d2ebf89ed5e788a4e) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/

spark git commit: [SPARK-20843][CORE] Add a config to set driver terminate timeout

2017-05-26 Thread zsxwing
How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18126 from zsxwing/SPARK-20843. (cherry picked from commit 6c1dbd6fc8d49acf7c1c902d2ebf89ed5e788a4e) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/

spark git commit: [SPARK-20843][CORE] Add a config to set driver terminate timeout

2017-05-26 Thread zsxwing
How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18126 from zsxwing/SPARK-20843. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6c1dbd6f Tree: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-05-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d935e0a9d -> 473d7552a [SPARK-20014] Optimize mergeSpillsWithFileStream method ## What changes were proposed in this pull request? When the individual partition size in a spill is small, mergeSpillsWithTransferTo method does many small
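
To illustrate the buffered-stream approach with plain `java.io` (this is not the actual `UnsafeShuffleWriter` code): concatenate the small per-partition spill segments through buffered streams, so each underlying I/O call moves a reasonably sized block instead of issuing many tiny transfers.

```scala
import java.io._

// Merge several small spill files into one output file through buffered streams.
def mergeWithFileStream(spills: Seq[File], output: File, bufferSizeBytes: Int = 1024 * 1024): Unit = {
  val out = new BufferedOutputStream(new FileOutputStream(output), bufferSizeBytes)
  try {
    val buffer = new Array[Byte](bufferSizeBytes)
    for (spill <- spills) {
      val in = new BufferedInputStream(new FileInputStream(spill), bufferSizeBytes)
      try {
        var n = in.read(buffer)
        while (n != -1) { out.write(buffer, 0, n); n = in.read(buffer) }
      } finally in.close()
    }
  } finally out.close()
}
```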

spark git commit: [SPARK-20844] Remove experimental from Structured Streaming APIs

2017-05-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 92837aeb4 -> 2b59ed4f1 [SPARK-20844] Remove experimental from Structured Streaming APIs Now that Structured Streaming has been out for several Spark releases and has large production use cases, the `Experimental` label is no longer

spark git commit: [SPARK-20844] Remove experimental from Structured Streaming APIs

2017-05-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 0fd84b05d -> d935e0a9d [SPARK-20844] Remove experimental from Structured Streaming APIs Now that Structured Streaming has been out for several Spark releases and has large production use cases, the `Experimental` label is no longer

spark git commit: [SPARK-19372][SQL] Fix throwing a Java exception at df.filter() due to 64KB bytecode size limit

2017-05-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 f99456b5f -> 92837aeb4 [SPARK-19372][SQL] Fix throwing a Java exception at df.filter() due to 64KB bytecode size limit ## What changes were proposed in this pull request? When an expression for `df.filter()` has many nodes (e.g.

spark git commit: [SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project

2017-05-25 Thread zsxwing
so that people can run `bin/run-example StructuredKafkaWordCount ...`. ## How was this patch tested? manually tested it. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18101 from zsxwing/add-missing-example-dep. (cherry picked from commit 98c3852986a2cb5f2d249d6c8ef602be283bd90e) S

spark git commit: [SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project

2017-05-25 Thread zsxwing
so that people can run `bin/run-example StructuredKafkaWordCount ...`. ## How was this patch tested? manually tested it. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18101 from zsxwing/add-missing-example-dep. (cherry picked from commit 98c3852986a2cb5f2d249d6c8ef602be283bd90e) S

spark git commit: [SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project

2017-05-25 Thread zsxwing
ple can run `bin/run-example StructuredKafkaWordCount ...`. ## How was this patch tested? manually tested it. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18101 from zsxwing/add-missing-example-dep. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-20792][SS] Support same timeout operations in mapGroupsWithState function in batch queries as in streaming queries

2017-05-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master bbd8d7def -> 9d6661c82 [SPARK-20792][SS] Support same timeout operations in mapGroupsWithState function in batch queries as in streaming queries ## What changes were proposed in this pull request? Currently, in the batch queries, timeout
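
A sketch of the user-facing API in question, with a made-up event type; after this change the same timeout-enabled `mapGroupsWithState` call can run on a batch Dataset as well as in a streaming query.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Event(user: String, value: Long)

val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
import spark.implicits._

// A batch Dataset using the same timeout conf and state operations as a
// streaming query; previously these timeout operations were only usable
// when the query was streaming.
val counts = Seq(Event("a", 1L), Event("a", 2L), Event("b", 5L)).toDS()
  .groupByKey(_.user)
  .mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout) {
    (user: String, events: Iterator[Event], state: GroupState[Long]) =>
      val total = state.getOption.getOrElse(0L) + events.map(_.value).sum
      state.update(total)
      state.setTimeoutDuration("10 seconds")
      (user, total)
  }

counts.show()
```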

spark git commit: [SPARK-20792][SS] Support same timeout operations in mapGroupsWithState function in batch queries as in streaming queries

2017-05-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 3aad5982a -> cfd1bf0be [SPARK-20792][SS] Support same timeout operations in mapGroupsWithState function in batch queries as in streaming queries ## What changes were proposed in this pull request? Currently, in the batch queries,

spark git commit: [SPARK-13747][CORE] Add ThreadUtils.awaitReady and disallow Await.ready

2017-05-17 Thread zsxwing
low `Await.ready`. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17763 from zsxwing/awaitready. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/324a904d Tree: http://git-wip-us.a

spark git commit: [SPARK-13747][CORE] Add ThreadUtils.awaitReady and disallow Await.ready

2017-05-17 Thread zsxwing
low `Await.ready`. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17763 from zsxwing/awaitready. (cherry picked from commit 324a904d8e80089d8865e4c7edaedb92ab2ec1b2) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wi

spark git commit: [SPARK-20788][CORE] Fix the Executor task reaper's false alarm warning logs

2017-05-17 Thread zsxwing
ask is finishing but being killed at the same time. The fix is pretty easy, just flip the "finished" flag when a task is successful. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18021 from zsxwing/SPARK-20788. (cherry

spark git commit: [SPARK-20788][CORE] Fix the Executor task reaper's false alarm warning logs

2017-05-17 Thread zsxwing
ask is finishing but being killed at the same time. The fix is pretty easy, just flip the "finished" flag when a task is successful. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18021 from zsxwing/SPARK-20788. Project: http://git-wip-

spark git commit: [SPARK-19372][SQL] Fix throwing a Java exception at df.filter() due to 64KB bytecode size limit

2017-05-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9150bca47 -> 6f62e9d9b [SPARK-19372][SQL] Fix throwing a Java exception at df.filter() due to 64KB bytecode size limit ## What changes were proposed in this pull request? When an expression for `df.filter()` has many nodes (e.g. 400),
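
A small sketch of the kind of query that triggered the problem: a filter whose predicate is built from hundreds of OR'ed comparisons, producing a deeply nested expression tree whose generated Java code used to exceed the JVM's 64KB method-size limit.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()

val df = spark.range(0, 10000).toDF("id")

// One predicate made of ~400 OR'ed comparisons; codegen for such a deep
// expression tree is what used to hit the 64KB bytecode limit.
val bigPredicate = (0 until 400).map(i => col("id") === i).reduce(_ || _)
df.filter(bigPredicate).count()
```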

spark git commit: [SPARK-20529][CORE] Allow worker and master to work with a proxy server

2017-05-16 Thread zsxwing
How was this patch tested? The newly added unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #17821 from zsxwing/SPARK-20529. (cherry picked from commit 9150bca47e4b8782e20441386d3d225eb5f2f404) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wi

spark git commit: [SPARK-20529][CORE] Allow worker and master to work with a proxy server

2017-05-16 Thread zsxwing
How was this patch tested? The newly added unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #17821 from zsxwing/SPARK-20529. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9150bca4 Tree: http://git-wip-us.apache.

spark git commit: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior

2017-05-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d2416925c -> 499ba2cb4 [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior ## What changes were proposed in this pull request? Timeout and state data are two independent entities and should be settable independently.

spark git commit: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior

2017-05-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 82ae1f0ac -> a79a120a8 [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior ## What changes were proposed in this pull request? Timeout and state data are two independent entities and should be settable independently.

spark git commit: [SPARK-20716][SS] StateStore.abort() should not throw exceptions

2017-05-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 0bd918f67 -> 82ae1f0ac [SPARK-20716][SS] StateStore.abort() should not throw exceptions ## What changes were proposed in this pull request? StateStore.abort() should do a best effort attempt to clean up temporary resources. It should

spark git commit: [SPARK-20716][SS] StateStore.abort() should not throw exceptions

2017-05-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master e1aaab1e2 -> 271175e2b [SPARK-20716][SS] StateStore.abort() should not throw exceptions ## What changes were proposed in this pull request? StateStore.abort() should do a best effort attempt to clean up temporary resources. It should not
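
A hedged sketch of the best-effort pattern described (the class and fields below are stand-ins, not the actual state store code): abort() logs cleanup failures instead of rethrowing them, so it cannot mask the exception that caused the abort in the first place.

```scala
import java.io.File
import org.slf4j.LoggerFactory

class SketchStateStore(tempDeltaFile: File) {
  private val log = LoggerFactory.getLogger(getClass)

  // Best-effort cleanup: never throw from abort().
  def abort(): Unit = {
    try {
      if (tempDeltaFile != null && tempDeltaFile.exists()) {
        tempDeltaFile.delete()
      }
    } catch {
      case e: Exception =>
        log.warn(s"Error aborting state store update, temp file $tempDeltaFile", e)
    }
  }
}
```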

spark git commit: [SPARK-20714][SS] Fix match error when watermark is set with timeout = no timeout / processing timeout

2017-05-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 7123ec8e1 -> f14246959 [SPARK-20714][SS] Fix match error when watermark is set with timeout = no timeout / processing timeout ## What changes were proposed in this pull request? When watermark is set, and timeout conf is NoTimeout or

spark git commit: [SPARK-20714][SS] Fix match error when watermark is set with timeout = no timeout / processing timeout

2017-05-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7d6ff3910 -> 0d3a63193 [SPARK-20714][SS] Fix match error when watermark is set with timeout = no timeout / processing timeout ## What changes were proposed in this pull request? When watermark is set, and timeout conf is NoTimeout or

spark git commit: [SPARK-20702][CORE] TaskContextImpl.markTaskCompleted should not hide the original error

2017-05-12 Thread zsxwing
ted` to propagate the original error. It also fixes an issue that `TaskCompletionListenerException.getMessage` doesn't include `previousError`. ## How was this patch tested? New unit tests. Author: Shixiong Zhu <shixi...@databricks.com> Closes #17942 from zsxwing/SPARK-20702. Project: h

spark git commit: [SPARK-20702][CORE] TaskContextImpl.markTaskCompleted should not hide the original error

2017-05-12 Thread zsxwing
ter to `TaskContextImpl.markTaskCompleted` to propagate the original error. It also fixes an issue that `TaskCompletionListenerException.getMessage` doesn't include `previousError`. ## How was this patch tested? New unit tests. Author: Shixiong Zhu <shixi...@databricks.com> Closes #17942 from zsxwing/SPARK-20702. (cher

spark git commit: [SPARK-20600][SS] KafkaRelation should be pretty printed in web UI

2017-05-11 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3aa4e464a -> 7144b5180 [SPARK-20600][SS] KafkaRelation should be pretty printed in web UI ## What changes were proposed in this pull request? User-friendly name of `KafkaRelation` in web UI (under Details for Query). ### Before

spark git commit: [SPARK-20600][SS] KafkaRelation should be pretty printed in web UI

2017-05-11 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 dd9e3b2c9 -> 5844151bc [SPARK-20600][SS] KafkaRelation should be pretty printed in web UI ## What changes were proposed in this pull request? User-friendly name of `KafkaRelation` in web UI (under Details for Query). ### Before

spark git commit: [SPARK-20373][SQL][SS] Batch queries with `Dataset/DataFrame.withWatermark()` do not execute

2017-05-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 d191b962d -> 7600a7ab6 [SPARK-20373][SQL][SS] Batch queries with `Dataset/DataFrame.withWatermark()` do not execute ## What changes were proposed in this pull request? Any Dataset/DataFrame batch query with the operation

spark git commit: [SPARK-20373][SQL][SS] Batch queries with `Dataset/DataFrame.withWatermark()` do not execute

2017-05-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master f79aa285c -> c0189abc7 [SPARK-20373][SQL][SS] Batch queries with `Dataset/DataFrame.withWatermark()` do not execute ## What changes were proposed in this pull request? Any Dataset/DataFrame batch query with the operation
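
The shape of the affected query, using only public APIs (the timestamps and column names below are made up); after the fix such a query executes as a normal batch query, since the watermark has no effect outside of streaming.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
import spark.implicits._

// A *batch* DataFrame that carries a watermark; previously this query did not execute.
val events = Seq(
  (Timestamp.valueOf("2017-05-09 10:00:00"), "a"),
  (Timestamp.valueOf("2017-05-09 10:05:00"), "b")
).toDF("eventTime", "id")

events.withWatermark("eventTime", "10 minutes")
  .groupBy($"id")
  .count()
  .show()
```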

spark git commit: [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load

2017-05-05 Thread zsxwing
PR changes `offsets.topic.num.partitions` from the default value 50 to 1 to make creating `__consumer_offsets` (50 partitions -> 1 partition) much faster. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17863 from zsxwing/fix-kafka-flaky-te

spark git commit: [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load

2017-05-05 Thread zsxwing
PR changes `offsets.topic.num.partitions` from the default value 50 to 1 to make creating `__consumer_offsets` (50 partitions -> 1 partition) much faster. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17863 from zsxwing/fix-kafka-flaky-te

spark git commit: [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load

2017-05-05 Thread zsxwing
ges `offsets.topic.num.partitions` from the default value 50 to 1 to make creating `__consumer_offsets` (50 partitions -> 1 partition) much faster. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17863 from zsxwing/fix-kafka-flaky-test. P
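
A hedged sketch of where such a broker-side override lives in an embedded-Kafka test setup; the surrounding helper is an assumption, but `offsets.topic.num.partitions` is the Kafka broker property named in the PR.

```scala
import java.util.Properties

// Broker properties for an embedded test broker: create the internal
// __consumer_offsets topic with a single partition instead of the default 50,
// which makes the first consumer-group operation in each test much faster.
def brokerConfiguration(port: Int): Properties = {
  val props = new Properties()
  props.put("broker.id", "0")
  props.put("port", port.toString)
  props.put("offsets.topic.num.partitions", "1")
  props.put("offsets.topic.replication.factor", "1")
  props
}
```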

spark git commit: [SPARK-19965][SS] DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output

2017-05-03 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 527fc5d0c -> 6b9e49d12 [SPARK-19965][SS] DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output ## The Problem Right now DataFrame batch reader may fail to infer partitions when reading FileStreamSink's

spark git commit: [SPARK-19965][SS] DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output

2017-05-03 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 f0e80aa2d -> 36d807906 [SPARK-19965][SS] DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output ## The Problem Right now DataFrame batch reader may fail to infer partitions when reading

spark git commit: [SPARK-20464][SS] Add a job group and description for streaming queries and fix cancellation of running jobs using the job group

2017-05-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 38edb9256 -> 6f0d29672 [SPARK-20464][SS] Add a job group and description for streaming queries and fix cancellation of running jobs using the job group ## What changes were proposed in this pull request? Job group: adding a job group

spark git commit: [SPARK-20464][SS] Add a job group and description for streaming queries and fix cancellation of running jobs using the job group

2017-05-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master ab30590f4 -> 6fc6cf88d [SPARK-20464][SS] Add a job group and description for streaming queries and fix cancellation of running jobs using the job group ## What changes were proposed in this pull request? Job group: adding a job group is
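
The public `SparkContext` API this change builds on: tag all jobs of a query with one job group (and a description shown in the web UI), then cancel the group as a whole. The group-id format below is illustrative, not necessarily what StreamExecution uses internally.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
val sc = spark.sparkContext

// Tag every job started from this thread with a group id and description,
// then cancel the whole group when the query stops.
val queryId = java.util.UUID.randomUUID().toString
sc.setJobGroup(s"streaming-query-$queryId", "streaming query: rate -> console",
  interruptOnCancel = true)

// ... run batches ...

sc.cancelJobGroup(s"streaming-query-$queryId")
```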

spark git commit: [SPARK-19525][CORE] Add RDD checkpoint compression support

2017-04-28 Thread zsxwing
ess` to enable/disable it. Credit goes to aramesh117 Closes #17024 ## How was this patch tested? The new unit test. Author: Shixiong Zhu <shixi...@databricks.com> Author: Aaditya Ramesh <aram...@conviva.com> Closes #17789 from zsxwing/pr17024. Project: http://git-wip-us.apache.org/repos

spark git commit: [SPARK-19525][CORE] Add RDD checkpoint compression support

2017-04-28 Thread zsxwing
ess` to enable/disable it. Credit goes to aramesh117 Closes #17024 ## How was this patch tested? The new unit test. Author: Shixiong Zhu <shixi...@databricks.com> Author: Aaditya Ramesh <aram...@conviva.com> Closes #17789 from zsxwing/pr17024. (cherry pic

spark git commit: [MINOR][SS] Fix a missing space in UnsupportedOperationChecker error message

2017-04-19 Thread zsxwing
ect. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17691 from zsxwing/fix-error-message. (cherry picked from commit 39e303a8b6db642c26dbc26ba92e87680f50e4da) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wi

spark git commit: [MINOR][SS] Fix a missing space in UnsupportedOperationChecker error message

2017-04-19 Thread zsxwing
ect. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17691 from zsxwing/fix-error-message. (cherry picked from commit 39e303a8b6db642c26dbc26ba92e87680f50e4da) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wi

spark git commit: [MINOR][SS] Fix a missing space in UnsupportedOperationChecker error message

2017-04-19 Thread zsxwing
ect. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17691 from zsxwing/fix-error-message. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/39e303a8 Tree: http://git-wip-us.a

spark git commit: [SPARK-20397][SPARKR][SS] Fix flaky test: test_streaming.R.Terminated by error

2017-04-19 Thread zsxwing
t's not guaranteed that source has been created. This PR just increases the timeout of awaitTermination to ensure the parsing error is thrown. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17687 from zsxwing/SPARK-20397. (cherry picked fr

spark git commit: [SPARK-20397][SPARKR][SS] Fix flaky test: test_streaming.R.Terminated by error

2017-04-19 Thread zsxwing
t's not guaranteed that source has been created. This PR just increases the timeout of awaitTermination to ensure the parsing error is thrown. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17687 from zsxwing/SPARK-20397. Project: http:

spark git commit: [SPARK-20131][CORE] Don't use `this` lock in StandaloneSchedulerBackend.stop

2017-04-12 Thread zsxwing
Executor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ``` This PR removes `synchronized` and changes `stopping` to AtomicBoolean to ensure idempotence and fix the deadlock. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17610

spark git commit: [SPARK-20131][CORE] Don't use `this` lock in StandaloneSchedulerBackend.stop

2017-04-12 Thread zsxwing
cutor.java:617) at java.lang.Thread.run(Thread.java:745) ``` This PR removes `synchronized` and changes `stopping` to AtomicBoolean to ensure idempotence and fix the deadlock. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #1761
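
A sketch of the idempotent-stop pattern from the description, using a plain `AtomicBoolean` instead of a `synchronized` block on `this`.

```scala
import java.util.concurrent.atomic.AtomicBoolean

class SketchSchedulerBackend {
  private val stopping = new AtomicBoolean(false)

  // compareAndSet makes stop() idempotent without taking `this` lock, so a
  // callback that also calls stop() (or checks `stopping`) cannot dead-lock.
  def stop(): Unit = {
    if (stopping.compareAndSet(false, true)) {
      // release resources exactly once
    }
  }

  def isStopping: Boolean = stopping.get()
}
```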

spark git commit: [SPARK-20282][SS][TESTS] Write the commit log first to fix a race condition in tests

2017-04-10 Thread zsxwing
ffsets` is updated. Then writing the commit log may be interrupted by the following `StopStream`. This PR just change the order to write the commit log first. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17594 from zsxwing/SPARK-20282. Project: ht

spark git commit: [SPARK-20285][TESTS] Increase the pyspark streaming test timeout to 30 seconds

2017-04-10 Thread zsxwing
eases the timeout to 30 seconds. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17597 from zsxwing/SPARK-20285. (cherry picked from commit f9a50ba2d1bfa3f55199df031e71154611ba51f6) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Pr

spark git commit: [SPARK-20285][TESTS] Increase the pyspark streaming test timeout to 30 seconds

2017-04-10 Thread zsxwing
eases the timeout to 30 seconds. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17597 from zsxwing/SPARK-20285. (cherry picked from commit f9a50ba2d1bfa3f55199df031e71154611ba51f6) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Pr

spark git commit: [SPARK-20285][TESTS] Increase the pyspark streaming test timeout to 30 seconds

2017-04-10 Thread zsxwing
eases the timeout to 30 seconds. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17597 from zsxwing/SPARK-20285. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f9a50ba2 Tree:

spark git commit: [SPARK-19721][SS][BRANCH-2.1] Good error message for version mismatch in log files

2017-03-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 4b977ff04 -> 710b5554e [SPARK-19721][SS][BRANCH-2.1] Good error message for version mismatch in log files ## Problem There are several places where we write out version identifiers in various logs for structured streaming (usually

spark git commit: [SPARK-19721][SS] Good error message for version mismatch in log files

2017-03-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 8e8f89833 -> 2ea214dd0 [SPARK-19721][SS] Good error message for version mismatch in log files ## Problem There are several places where we write out version identifiers in various logs for structured streaming (usually `v1`). However, in

spark git commit: [SPARK-19853][SS] uppercase kafka topics fail when startingOffsets are SpecificOffsets

2017-03-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 f9833c66a -> 8c4608046 [SPARK-19853][SS] uppercase kafka topics fail when startingOffsets are SpecificOffsets When using the KafkaSource with Structured Streaming, consumer assignments are not what the user expects if startingOffsets

spark git commit: [SPARK-19853][SS] uppercase kafka topics fail when startingOffsets are SpecificOffsets

2017-03-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9f8ce4825 -> 0a4d06a7c [SPARK-19853][SS] uppercase kafka topics fail when startingOffsets are SpecificOffsets When using the KafkaSource with Structured Streaming, consumer assignments are not what the user expects if startingOffsets is
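
The user-facing shape of the bug, using the public Kafka source options (the bootstrap address and topic name are placeholders): specific starting offsets for a topic whose name contains uppercase characters.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()

// Before the fix, the consumer assignments did not match what the user asked
// for when startingOffsets was a specific-offsets JSON map and the topic name
// contained uppercase characters.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder address
  .option("subscribe", "MyUppercaseTopic")
  .option("startingOffsets", """{"MyUppercaseTopic":{"0":23}}""")
  .load()
```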

spark git commit: [SPARK-19831][CORE] Reuse the existing cleanupThreadExecutor to clean up the directories of finished applications to avoid blocking

2017-03-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master e29a74d5b -> 2f5187bde [SPARK-19831][CORE] Reuse the existing cleanupThreadExecutor to clean up the directories of finished applications to avoid blocking Cleaning up an application may take much time on the worker, and it will then block that

spark git commit: [SPARK-19891][SS] Await Batch Lock notified on stream execution exit

2017-03-09 Thread zsxwing
ion has been thrown. ## How was this patch tested? Current tests that throw exceptions at runtime will finish faster as a result of this update. zsxwing Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tyson Condie <tcon...@gmail.com> Clos

spark git commit: [SPARK-19891][SS] Await Batch Lock notified on stream execution exit

2017-03-09 Thread zsxwing
hen an exception has been thrown. ## How was this patch tested? Current tests that throw exceptions at runtime will finish faster as a result of this update. zsxwing Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tyson Condie <tcon...@gmail.com> Clos

spark git commit: [SPARK-19886] Fix reportDataLoss if statement in SS KafkaSource

2017-03-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 ffe65b065 -> a59cc369f [SPARK-19886] Fix reportDataLoss if statement in SS KafkaSource ## What changes were proposed in this pull request? Fix the `throw new IllegalStateException` if statement part. ## How is this patch tested

spark git commit: [SPARK-19886] Fix reportDataLoss if statement in SS KafkaSource

2017-03-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master f79371ad8 -> 82138e09b [SPARK-19886] Fix reportDataLoss if statement in SS KafkaSource ## What changes were proposed in this pull request? Fix the `throw new IllegalStateException` if statement part. ## How is this patch tested

spark git commit: [SPARK-19861][SS] watermark should not be a negative time.

2017-03-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 2a76e2420 -> ffe65b065 [SPARK-19861][SS] watermark should not be a negative time. ## What changes were proposed in this pull request? `watermark` should not be negative. This behavior is invalid, so check it before the query actually runs. ## How was

spark git commit: [SPARK-19861][SS] watermark should not be a negative time.

2017-03-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 40da4d181 -> 30b18e693 [SPARK-19861][SS] watermark should not be a negative time. ## What changes were proposed in this pull request? `watermark` should not be negative. This behavior is invalid, so check it before the query actually runs. ## How was

spark git commit: [SPARK-19715][STRUCTURED STREAMING] Option to Strip Paths in FileSource

2017-03-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3232e54f2 -> 40da4d181 [SPARK-19715][STRUCTURED STREAMING] Option to Strip Paths in FileSource ## What changes were proposed in this pull request? Today, we compare the whole path when deciding if a file is new in the FileSource for

spark git commit: [SPARK-19859][SS][FOLLOW-UP] The new watermark should override the old one.

2017-03-08 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 029e40b41 -> eeb1d6db8 [SPARK-19859][SS][FOLLOW-UP] The new watermark should override the old one. ## What changes were proposed in this pull request? A follow up to SPARK-19859: - extract the calculation of `delayMs` and reuse it. -

spark git commit: [SPARK-19859][SS][FOLLOW-UP] The new watermark should override the old one.

2017-03-08 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 00859e148 -> 0c140c168 [SPARK-19859][SS][FOLLOW-UP] The new watermark should override the old one. ## What changes were proposed in this pull request? A follow up to SPARK-19859: - extract the calculation of `delayMs` and reuse it. -

spark git commit: [SPARK-19874][BUILD] Hide API docs for org.apache.spark.sql.internal

2017-03-08 Thread zsxwing
kage because they are internal private APIs. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17217 from zsxwing/SPARK-19874. (cherry picked from commit 029e40b412e332c9f0fff283d604e203066c78c0) Signed-off-by: Shixiong Zhu <shixi...@databrick

spark git commit: [SPARK-19874][BUILD] Hide API docs for org.apache.spark.sql.internal

2017-03-08 Thread zsxwing
kage because they are internal private APIs. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17217 from zsxwing/SPARK-19874. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/029e40

spark git commit: Revert "[SPARK-19413][SS] MapGroupsWithState for arbitrary stateful operations for branch-2.1"

2017-03-08 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 f6c1ad2eb -> 3457c3229 Revert "[SPARK-19413][SS] MapGroupsWithState for arbitrary stateful operations for branch-2.1" This reverts commit 502c927b8c8a99ef2adf4e6e1d7a6d9232d45ef5. Project:

spark git commit: [SPARK-19540][SQL] Add ability to clone SparkSession wherein cloned session has an identical copy of the SessionState

2017-03-08 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1bf901238 -> 6570cfd7a [SPARK-19540][SQL] Add ability to clone SparkSession wherein cloned session has an identical copy of the SessionState Forking a newSession() from SparkSession currently makes a new SparkSession that does not retain

spark git commit: [SPARK-19481] [REPL] [MAVEN] Avoid to leak SparkContext in Signaling.cancelOnInterrupt

2017-03-08 Thread zsxwing
all and it makes ReplSuite unstable. This PR adds `SparkContext.getActive` to allow `Signaling.cancelOnInterrupt` to get the active `SparkContext` to avoid the leak. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #16825 from zsxwing/SPARK-19481. Proj

spark git commit: [SPARK-19859][SS] The new watermark should override the old one

2017-03-07 Thread zsxwing
umn which has a watermark, it may be unexpected. ## How was this patch tested? The new test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #17199 from zsxwing/SPARK-19859. (cherry picked from commit d8830c5039d9c7c5ef03631904c32873ab558e22) Signed-off-by: Shixiong Zhu <shixi...@da

spark git commit: [SPARK-19859][SS] The new watermark should override the old one

2017-03-07 Thread zsxwing
ich has a watermark, it may be unexpected. ## How was this patch tested? The new test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #17199 from zsxwing/SPARK-19859. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-19841][SS] watermarkPredicate should filter based on keys

2017-03-07 Thread zsxwing
est. Author: Shixiong Zhu <shixi...@databricks.com> Closes #17183 from zsxwing/SPARK-19841. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ca849ac4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ca849ac4 Diff: http:

spark git commit: [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string.

2017-03-05 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 73801880f -> c7e7b042d [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string. ## What changes were proposed in this pull request?

spark git commit: [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string.

2017-03-05 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 664c9795c -> ca7a7e8a8 [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string. ## What changes were proposed in this pull request?

spark git commit: [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string.

2017-03-05 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 224e0e785 -> 207067ead [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should not filter checkpointFilesOfLatestTime with the PATH string. ## What changes were proposed in this pull request?

spark git commit: [SPARK-19718][SS] Handle more interrupt cases properly for Hadoop

2017-03-03 Thread zsxwing
ks.com> Closes #17044 from zsxwing/SPARK-19718. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6a7a95e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a6a7a95e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff

spark git commit: [SPARK-19774] StreamExecution should call stop() on sources when a stream fails

2017-03-03 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 accbed7c2 -> da04d45c2 [SPARK-19774] StreamExecution should call stop() on sources when a stream fails ## What changes were proposed in this pull request? We call stop() on a Structured Streaming Source only when the stream is

spark git commit: [SPARK-19774] StreamExecution should call stop() on sources when a stream fails

2017-03-03 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 37a1c0e46 -> 9314c0837 [SPARK-19774] StreamExecution should call stop() on sources when a stream fails ## What changes were proposed in this pull request? We call stop() on a Structured Streaming Source only when the stream is shut down
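
A hedged sketch of the clean-up pattern described, with a stand-in `Source` trait rather than the real Structured Streaming interfaces: stop every source in a `finally` block so sources are released even when the stream fails, swallowing per-source failures so they do not hide the original error.

```scala
import org.slf4j.LoggerFactory

trait SketchSource { def stop(): Unit }

class SketchStreamExecution(sources: Seq[SketchSource]) {
  private val log = LoggerFactory.getLogger(getClass)

  def runStream(body: => Unit): Unit = {
    try {
      body  // run the micro-batches; may throw
    } finally {
      // Stop sources whether the stream finished normally or failed, so they
      // do not leak connections or threads after a failure.
      sources.foreach { source =>
        try source.stop() catch {
          case e: Exception => log.warn(s"Failed to stop source $source", e)
        }
      }
    }
  }
}
```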

spark git commit: [SPARK-19779][SS] Delete needless tmp file after restart structured streaming job

2017-03-02 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 3a7591ad5 -> 1237aaea2 [SPARK-19779][SS] Delete needless tmp file after restart structured streaming job ## What changes were proposed in this pull request? [SPARK-19779](https://issues.apache.org/jira/browse/SPARK-19779) The PR

spark git commit: [SPARK-19779][SS] Delete needless tmp file after restart structured streaming job

2017-03-02 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 491b47a16 -> 73801880f [SPARK-19779][SS] Delete needless tmp file after restart structured streaming job ## What changes were proposed in this pull request? [SPARK-19779](https://issues.apache.org/jira/browse/SPARK-19779) The PR

spark git commit: [SPARK-19779][SS] Delete needless tmp file after restart structured streaming job

2017-03-02 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master f37bb1430 -> e24f21b5f [SPARK-19779][SS] Delete needless tmp file after restart structured streaming job ## What changes were proposed in this pull request? [SPARK-19779](https://issues.apache.org/jira/browse/SPARK-19779) The PR

spark git commit: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 89cd3845b -> 4913c92c2 [SPARK-19633][SS] FileSource read from FileSink ## What changes were proposed in this pull request? Right now file source always uses `InMemoryFileIndex` to scan files from a given path. But when reading the

spark git commit: [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS

2017-02-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 a6af60f25 -> dcfb05c86 [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS ## What changes were proposed in this pull request? HDFSBackedStateStoreProvider fails to rename files on HDFS but not on

spark git commit: [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS

2017-02-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 4b4c3bf3f -> 947c0cd90 [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS ## What changes were proposed in this pull request? HDFSBackedStateStoreProvider fails to rename files on HDFS but not on

spark git commit: [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS

2017-02-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7c7fc30b4 -> 9734a928a [SPARK-19677][SS] Committing a delta file atop an existing one should not fail on HDFS ## What changes were proposed in this pull request? HDFSBackedStateStoreProvider fails to rename files on HDFS but not on the

spark git commit: [SPARK-19749][SS] Name socket source with a meaningful name

2017-02-27 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 16d8472f7 -> 735303835 [SPARK-19749][SS] Name socket source with a meaningful name ## What changes were proposed in this pull request? Name socket source with a meaningful name ## How was this patch tested? Jenkins Author: uncleGen

spark git commit: [SPARK-19594][STRUCTURED STREAMING] StreamingQueryListener fails to handle QueryTerminatedEvent if more than one listener exists

2017-02-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 20a432951 -> 04fbb9e09 [SPARK-19594][STRUCTURED STREAMING] StreamingQueryListener fails to handle QueryTerminatedEvent if more than one listener exists ## What changes were proposed in this pull request? Currently, if multiple
