spark git commit: [SPARK-23623][SS] Avoid concurrent use of cached consumers in CachedKafkaConsumer

2018-03-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9945b0227 -> bd201bf61 [SPARK-23623][SS] Avoid concurrent use of cached consumers in CachedKafkaConsumer ## What changes were proposed in this pull request? CachedKafkaConsumer in the project `kafka-0-10-sql` is designed to maintain a

spark git commit: [SPARK-23533][SS] Add support for changing ContinuousDataReader's startOffset

2018-03-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 4f5bad615 -> 7c3e8995f [SPARK-23533][SS] Add support for changing ContinuousDataReader's startOffset ## What changes were proposed in this pull request? As discussed in #20675, we need to add a new interface `ContinuousDataReaderFactory`

spark git commit: [SPARK-23481][WEBUI] lastStageAttempt should fail when a stage doesn't exist

2018-02-21 Thread zsxwing
ext available stage in the store when a stage doesn't exist. This PR adds `last(stageId)` to ensure it returns a correct `StageData` ## How was this patch tested? The new unit test. Author: Shixiong Zhu <zsxw...@gmail.com> Closes #20654 from zsxwing/SPARK-23481. (cherry picked fr

spark git commit: [SPARK-23481][WEBUI] lastStageAttempt should fail when a stage doesn't exist

2018-02-21 Thread zsxwing
ble stage in the store when a stage doesn't exist. This PR adds `last(stageId)` to ensure it returns a correct `StageData` ## How was this patch tested? The new unit test. Author: Shixiong Zhu <zsxw...@gmail.com> Closes #20654 from zsxwing/SPARK-23481. Project: http://git-wip-us.apache.

spark git commit: [SPARK-23434][SQL] Spark should not warn `metadata directory` for a HDFS file path

2018-02-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 83c008762 -> 3e48f3b9e [SPARK-23434][SQL] Spark should not warn `metadata directory` for a HDFS file path ## What changes were proposed in this pull request? In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), it

spark git commit: [SPARK-23400][SQL] Add a constructors for ScalaUDF

2018-02-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 320ffb130 -> 4f6a457d4 [SPARK-23400][SQL] Add a constructors for ScalaUDF ## What changes were proposed in this pull request? In this upcoming 2.3 release, we changed the interface of `ScalaUDF`. Unfortunately, some Spark packages

spark git commit: [SPARK-23400][SQL] Add a constructors for ScalaUDF

2018-02-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d58fe2883 -> 2ee76c22b [SPARK-23400][SQL] Add a constructors for ScalaUDF ## What changes were proposed in this pull request? In this upcoming 2.3 release, we changed the interface of `ScalaUDF`. Unfortunately, some Spark packages (e.g.,

spark git commit: [SPARK-23245][SS][TESTS] Don't access `lastExecution.executedPlan` in StreamTest

2018-01-26 Thread zsxwing
hor: Jose Torres <j...@databricks.com> Closes #20413 from zsxwing/SPARK-23245. (cherry picked from commit 6328868e524121bd00595959d6d059f74e038a6b) Signed-off-by: Shixiong Zhu <zsxw...@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apach

spark git commit: [SPARK-23245][SS][TESTS] Don't access `lastExecution.executedPlan` in StreamTest

2018-01-26 Thread zsxwing
ose Torres <j...@databricks.com> Closes #20413 from zsxwing/SPARK-23245. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6328868e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6328868e Diff: http://git-wip-us.a

spark git commit: [SPARK-23242][SS][TESTS] Don't run tests in KafkaSourceSuiteBase twice

2018-01-26 Thread zsxwing
lso run. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #20412 from zsxwing/SPARK-23242. (cherry picked from commit 073744985f439ca90afb9bd0bbc1332c53f7b4bb) Signed-off-by: Shixiong Zhu <zsxw...@gmail.com> Project: http://git-wip-us.apach

spark git commit: [SPARK-23242][SS][TESTS] Don't run tests in KafkaSourceSuiteBase twice

2018-01-26 Thread zsxwing
run. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #20412 from zsxwing/SPARK-23242. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/07374498 Tree: http://git-wip-us.apache.org/

spark git commit: [SPARK-23198][SS][TEST] Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution

2018-01-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 30272c668 -> 500c94434 [SPARK-23198][SS][TEST] Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution ## What changes were proposed in this pull request? Currently,

spark git commit: [SPARK-23198][SS][TEST] Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution

2018-01-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 0e178e152 -> bc9641d90 [SPARK-23198][SS][TEST] Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution ## What changes were proposed in this pull request? Currently,

spark git commit: [SPARK-21996][SQL] read files with space in name for streaming

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1002bd6b2 -> 021947020 [SPARK-21996][SQL] read files with space in name for streaming ## What changes were proposed in this pull request? Structured streaming is now able to read files with space in file name (previously it would skip

spark git commit: [SPARK-23064][DOCS][SS] Added documentation for stream-stream joins

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 9783aea2c -> 050c1e24e [SPARK-23064][DOCS][SS] Added documentation for stream-stream joins ## What changes were proposed in this pull request? Added documentation for stream-stream joins

spark git commit: [SPARK-23064][DOCS][SS] Added documentation for stream-stream joins

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master bac0d661a -> 1002bd6b2 [SPARK-23064][DOCS][SS] Added documentation for stream-stream joins ## What changes were proposed in this pull request? Added documentation for stream-stream joins

spark git commit: [SPARK-23119][SS] Minor fixes to V2 streaming APIs

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7823d43ec -> bac0d661a [SPARK-23119][SS] Minor fixes to V2 streaming APIs ## What changes were proposed in this pull request? - Added `InterfaceStability.Evolving` annotations - Improved docs. ## How was this patch tested? Existing

spark git commit: [SPARK-23119][SS] Minor fixes to V2 streaming APIs

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 b84c2a306 -> 9783aea2c [SPARK-23119][SS] Minor fixes to V2 streaming APIs ## What changes were proposed in this pull request? - Added `InterfaceStability.Evolving` annotations - Improved docs. ## How was this patch tested? Existing

spark git commit: [SPARK-23093][SS] Don't change run id when reconfiguring a continuous processing query.

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 dbd2a5566 -> 79ccd0cad [SPARK-23093][SS] Don't change run id when reconfiguring a continuous processing query. ## What changes were proposed in this pull request? Keep the run ID static, using a different ID for the epoch coordinator

spark git commit: [SPARK-23093][SS] Don't change run id when reconfiguring a continuous processing query.

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 86a845031 -> e946c63dd [SPARK-23093][SS] Don't change run id when reconfiguring a continuous processing query. ## What changes were proposed in this pull request? Keep the run ID static, using a different ID for the epoch coordinator to

spark git commit: Fix merge between 07ae39d0ec and 1667057851

2018-01-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 50345a2aa -> a963980a6 Fix merge between 07ae39d0ec and 1667057851 ## What changes were proposed in this pull request? The first commit added a new test, and the second refactored the class the test was in. The automatic merge put the

spark git commit: [SPARK-22956][SS] Bug fix for 2 streams union failover scenario

2018-01-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 e2ffb9781 -> e58c4a929 [SPARK-22956][SS] Bug fix for 2 streams union failover scenario ## What changes were proposed in this pull request? This problem was reported by yanlin-Lynn, ivoson and LiangchangZ. Thanks! When we union 2 streams

spark git commit: [SPARK-22956][SS] Bug fix for 2 streams union failover scenario

2018-01-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master c7572b79d -> 07ae39d0e [SPARK-22956][SS] Bug fix for 2 streams union failover scenario ## What changes were proposed in this pull request? This problem was reported by yanlin-Lynn, ivoson and LiangchangZ. Thanks! When we union 2 streams from

spark git commit: [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported

2018-01-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 20eea20c7 -> 105ae8680 [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported ## What changes were proposed in this pull request? `MetricsReporter` assumes that there has been some progress

spark git commit: [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported

2018-01-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 db27a9365 -> 02176f4c2 [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported ## What changes were proposed in this pull request? `MetricsReporter` assumes that there has been some progress

spark git commit: [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported

2018-01-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7bd14cfd4 -> 54277398a [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported ## What changes were proposed in this pull request? `MetricsReporter` assumes that there has been some progress for

spark git commit: [SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for external shuffle service

2018-01-04 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 6f68316e9 -> 93f92c0ed [SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for external shuffle service ## What changes were proposed in this pull request? This PR is the second attempt of #18684 , NIO's Files API doesn't

spark git commit: [SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for external shuffle service

2018-01-04 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 bcfeef5a9 -> cd92913f3 [SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for external shuffle service ## What changes were proposed in this pull request? This PR is the second attempt of #18684 , NIO's Files API doesn't

spark git commit: [SPARK-21475][Core]Revert "[SPARK-21475][CORE] Use NIO's Files API to replace FileInputStream/FileOutputStream in some critical paths"

2017-12-29 Thread zsxwing
the default `InputStream.skip` which just consumes and discards data. This causes a huge performance regression when reading shuffle files. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #20119 from zsxwing/revert-SPARK-21475. Project: http://git-wip-

[1/2] spark git commit: [SPARK-22789] Map-only continuous processing execution

2017-12-22 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d23dc5b8e -> 8941a4abc http://git-wip-us.apache.org/repos/asf/spark/blob/8941a4ab/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/memoryV2.scala --

[2/2] spark git commit: [SPARK-22789] Map-only continuous processing execution

2017-12-22 Thread zsxwing
[SPARK-22789] Map-only continuous processing execution ## What changes were proposed in this pull request? Basic continuous execution, supporting map/flatMap/filter, with commits and advancement through RPC. ## How was this patch tested? new unit-ish tests (exercising execution end to end)

spark git commit: [SPARK-22824] Restore old offset for binary compatibility

2017-12-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7570eab6b -> 7798c9e6e [SPARK-22824] Restore old offset for binary compatibility ## What changes were proposed in this pull request? Some users depend on source compatibility with the org.apache.spark.sql.execution.streaming.Offset

spark git commit: [SPARK-22781][SS] Support creating streaming dataset with ORC files

2017-12-19 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 13268a58f -> 9962390af [SPARK-22781][SS] Support creating streaming dataset with ORC files ## What changes were proposed in this pull request? Like `Parquet`, users can use `ORC` with Apache Spark structured streaming. This PR adds

spark git commit: [SPARK-22733] Split StreamExecution into MicroBatchExecution and StreamExecution.

2017-12-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 2fe16333d -> 59daf91b7 [SPARK-22733] Split StreamExecution into MicroBatchExecution and StreamExecution. ## What changes were proposed in this pull request? StreamExecution is now an abstract base class, which MicroBatchExecution (the

[2/2] spark git commit: [SPARK-22732] Add Structured Streaming APIs to DataSourceV2

2017-12-13 Thread zsxwing
[SPARK-22732] Add Structured Streaming APIs to DataSourceV2 ## What changes were proposed in this pull request? This PR provides DataSourceV2 API support for structured streaming, including new pieces needed to support continuous processing [SPARK-20928]. High level summary: - DataSourceV2

[1/2] spark git commit: [SPARK-22732] Add Structured Streaming APIs to DataSourceV2

2017-12-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1e44dd004 -> f8c7c1f21 http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala -- diff

spark git commit: [SPARK-22187][SS][REVERT] Revert change in state row format for mapGroupsWithState

2017-12-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 0ba8f4b21 -> b11869bc3 [SPARK-22187][SS][REVERT] Revert change in state row format for mapGroupsWithState ## What changes were proposed in this pull request? #19416 changed the format in which rows were encoded in the state store.

spark git commit: [SPARK-22638][SS] Use a separate queue for StreamingQueryListenerBus

2017-12-01 Thread zsxwing
non-streaming events, streaming query listeners don't need to wait for other Spark listeners and can catch up. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19838 from zsxwing/SPARK-22638. Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-22544][SS] FileStreamSource should use its own hadoop conf to call globPathIfNecessary

2017-11-17 Thread zsxwing
onf into `globPathIfNecessary` so that it can pick up user's hadoop configurations, such as credentials. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19771 from zsxwing/fix-file-stream-conf. (cherry picked from commit bf0c0ae2dcc7fd1ce92cd0fb4809bb3

spark git commit: [SPARK-22544][SS] FileStreamSource should use its own hadoop conf to call globPathIfNecessary

2017-11-17 Thread zsxwing
onf into `globPathIfNecessary` so that it can pick up user's hadoop configurations, such as credentials. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19771 from zsxwing/fix-file-stream-conf. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-22535][PYSPARK] Sleep before killing the python worker in PythonRunner.MonitorThread (branch-2.2)

2017-11-16 Thread zsxwing
ted? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19768 from zsxwing/SPARK-22535-2.2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/be68f86e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/be68f86e D

spark git commit: [SPARK-21667][STREAMING] ConsoleSink should not fail streaming query with checkpointLocation option

2017-11-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master f2da738c7 -> 808e886b9 [SPARK-21667][STREAMING] ConsoleSink should not fail streaming query with checkpointLocation option ## What changes were proposed in this pull request? Fix to allow recovery on console, avoid checkpoint exception

spark git commit: [SPARK-19644][SQL] Clean up Scala reflection garbage after creating Encoder (branch-2.2)

2017-11-10 Thread zsxwing
nce is `cleanUpReflectionObjects` is protected by `ScalaReflectionLock.synchronized` in this PR for Scala 2.10. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19718 from zsxwing/SPARK-19644-2.2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: h

spark git commit: [SPARK-19644][SQL] Clean up Scala reflection garbage after creating Encoder

2017-11-10 Thread zsxwing
<zsxw...@gmail.com> Closes #19687 from zsxwing/SPARK-19644. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24ea781c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24ea781c Diff: http://git-wip-us.apache.org

spark git commit: [SPARK-22294][DEPLOY] Reset spark.driver.bindAddress when starting a Checkpoint

2017-11-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 eb49c3244 -> 371be22b1 [SPARK-22294][DEPLOY] Reset spark.driver.bindAddress when starting a Checkpoint ## What changes were proposed in this pull request? It seems that recovering from a checkpoint can replace the old driver and

spark git commit: [SPARK-22294][DEPLOY] Reset spark.driver.bindAddress when starting a Checkpoint

2017-11-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master b70aa9e08 -> 5ebdcd185 [SPARK-22294][DEPLOY] Reset spark.driver.bindAddress when starting a Checkpoint ## What changes were proposed in this pull request? It seems that recovering from a checkpoint can replace the old driver and executor

spark git commit: [SPARK-22243][DSTREAM] spark.yarn.jars should reload from config when checkpoint recovery

2017-11-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 0568f289d -> eb49c3244 [SPARK-22243][DSTREAM] spark.yarn.jars should reload from config when checkpoint recovery ## What changes were proposed in this pull request? the previous [PR](https://github.com/apache/spark/pull/19469) is

spark git commit: [SPARK-22403][SS] Add optional checkpointLocation argument to StructuredKafkaWordCount example

2017-11-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 0e97c8eef -> ede0e1a98 [SPARK-22403][SS] Add optional checkpointLocation argument to StructuredKafkaWordCount example ## What changes were proposed in this pull request? When run in YARN cluster mode, the StructuredKafkaWordCount

spark git commit: [SPARK-22403][SS] Add optional checkpointLocation argument to StructuredKafkaWordCount example

2017-11-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9eb7096c4 -> 11c402104 [SPARK-22403][SS] Add optional checkpointLocation argument to StructuredKafkaWordCount example ## What changes were proposed in this pull request? When run in YARN cluster mode, the StructuredKafkaWordCount example

spark git commit: [SPARK-22243][DSTREAM] spark.yarn.jars should reload from config when checkpoint recovery

2017-11-02 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master e3f67a97f -> 882079f5c [SPARK-22243][DSTREAM] spark.yarn.jars should reload from config when checkpoint recovery ## What changes were proposed in this pull request? the previous [PR](https://github.com/apache/spark/pull/19469) is deleted

spark git commit: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap non-recursively

2017-10-31 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7986cc09b -> 73231860b [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap non-recursively ## What changes were proposed in this pull request? Write HDFSBackedStateStoreProvider.loadMap non-recursively. This prevents stack overflow

spark git commit: [SPARK-22366] Support ignoring missing files

2017-10-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 5415963d2 -> 8e9863531 [SPARK-22366] Support ignoring missing files ## What changes were proposed in this pull request? Add a flag "spark.sql.files.ignoreMissingFiles" to parallel the existing flag "spark.sql.files.ignoreCorruptFiles".

spark git commit: [MINOR][SS] keyWithIndexToNumValues" -> "keyWithIndexToValue"

2017-10-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3823dc88d -> 1bb8b7604 [MINOR][SS] keyWithIndexToNumValues" -> "keyWithIndexToValue" ## What changes were proposed in this pull request? This PR changes `keyWithIndexToNumValues` to `keyWithIndexToValue`. There will be directories on

spark git commit: [SPARK-21988][SS] Implement StreamingRelation.computeStats to fix explain

2017-10-11 Thread zsxwing
ted? - unit tests: `StreamingRelation.computeStats` and `StreamingExecutionRelation.computeStats`. - regression tests: `explain join with a normal source` and `explain join with MemoryStream`. Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19465 from zsxwing/SPARK-21988. Project: http:

spark git commit: [SPARK-22230] Swap per-row order in state store restore.

2017-10-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 155ab6347 -> 71c2b81aa [SPARK-22230] Swap per-row order in state store restore. ## What changes were proposed in this pull request? In state store restore, for each row, put the saved state before the row in the iterator instead of after.

spark git commit: [SPARK-21947][SS] Check and report error when monotonically_increasing_id is used in streaming query

2017-10-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 08b204fd2 -> debcbec74 [SPARK-21947][SS] Check and report error when monotonically_increasing_id is used in streaming query ## What changes were proposed in this pull request? `monotonically_increasing_id` doesn't work in Structured

spark git commit: [SPARK-22203][SQL] Add job description for file listing Spark jobs

2017-10-04 Thread zsxwing
Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19432 from zsxwing/SPARK-22203. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c8affec2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c8af

spark git commit: [SPARK-22094][SS] processAllAvailable should check the query state

2017-09-21 Thread zsxwing
Zhu <zsxw...@gmail.com> Closes #19314 from zsxwing/SPARK-22094. (cherry picked from commit fedf6961be4e99139eb7ab08d5e6e29187ea5ccf) Signed-off-by: Shixiong Zhu <zsxw...@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/a

spark git commit: [SPARK-22094][SS] processAllAvailable should check the query state

2017-09-21 Thread zsxwing
uld return. ## How was this patch tested? The new unit test. Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19314 from zsxwing/SPARK-22094. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fedf6961 Tree: http:

spark git commit: [SPARK-21113][CORE] Read ahead input stream to amortize disk IO cost …

2017-09-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7c7266208 -> 1e978b17d [SPARK-21113][CORE] Read ahead input stream to amortize disk IO cost … Profiling some of our big jobs, we see that around 30% of the time is being spent in reading the spill files from disk. In order to amortize

spark git commit: [SPARK-21988] Add default stats to StreamingExecutionRelation.

2017-09-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master ddd7f5e11 -> 054ddb2f5 [SPARK-21988] Add default stats to StreamingExecutionRelation. ## What changes were proposed in this pull request? Add default stats to StreamingExecutionRelation. ## How was this patch tested? existing unit tests

spark git commit: [SPARK-21901][SS] Define toString for StateOperatorProgress

2017-09-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 9afab9a52 -> 342cc2a4c [SPARK-21901][SS] Define toString for StateOperatorProgress ## What changes were proposed in this pull request? Just `StateOperatorProgress.toString` + few formatting fixes ## How was this patch tested? Local

spark git commit: [SPARK-21901][SS] Define toString for StateOperatorProgress

2017-09-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master acdf45fb5 -> fa0092bdd [SPARK-21901][SS] Define toString for StateOperatorProgress ## What changes were proposed in this pull request? Just `StateOperatorProgress.toString` + few formatting fixes ## How was this patch tested? Local

spark git commit: [SPARK-9104][CORE] Expose Netty memory metrics in Spark

2017-09-05 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 6a2325448 -> 445f1790a [SPARK-9104][CORE] Expose Netty memory metrics in Spark ## What changes were proposed in this pull request? This PR exposes Netty memory usage for Spark's `TransportClientFactory` and `TransportServer`, including

spark git commit: [SPARK-21880][WEB UI] In the SQL table page, modify jobs trace information

2017-09-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 0bdbefe9d -> 12f0d2422 [SPARK-21880][WEB UI] In the SQL table page, modify jobs trace information ## What changes were proposed in this pull request? As shown below, for example, when job 5 is running, it was a mistake to think that

spark git commit: [SPARK-21701][CORE] Enable RPC client to use `SO_RCVBUF` and `SO_SNDBUF` in SparkConf.

2017-08-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d3abb3699 -> 763b83ee8 [SPARK-21701][CORE] Enable RPC client to use `SO_RCVBUF` and `SO_SNDBUF` in SparkConf. ## What changes were proposed in this pull request? TCP parameters like SO_RCVBUF and SO_SNDBUF can be set in SparkConf, and

spark git commit: [SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value

2017-08-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 7446be332 -> f6d56d2f1 [SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value Same PR as #18799 but for branch 2.2. Main discussion is in the other PR. When I was investigating a flaky test, I realized

spark git commit: [SPARK-21565][SS] Propagate metadata in attribute replacement.

2017-08-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 43f9c84b6 -> fa92a7be7 [SPARK-21565][SS] Propagate metadata in attribute replacement. ## What changes were proposed in this pull request? Propagate metadata in attribute replacement during streaming execution. This is necessary for

spark git commit: [SPARK-21565][SS] Propagate metadata in attribute replacement.

2017-08-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 4f7ec3a31 -> cce25b360 [SPARK-21565][SS] Propagate metadata in attribute replacement. ## What changes were proposed in this pull request? Propagate metadata in attribute replacement during streaming execution. This is necessary for

spark git commit: [SPARK-21374][CORE] Fix reading globbed paths from S3 into DF with disabled FS cache

2017-08-07 Thread zsxwing
Author: Andrey Taptunov <taptu...@amazon.com> Closes #18848 from zsxwing/review-pr18623. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/43f9c84b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/43f9c84b Diff:

spark git commit: [SPARK-21546][SS] dropDuplicates should ignore watermark when it's not a key

2017-08-02 Thread zsxwing
ash. This PR fixed this issue. ## How was this patch tested? The new unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18822 from zsxwing/SPARK-21546. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0d26

spark git commit: [SPARK-21546][SS] dropDuplicates should ignore watermark when it's not a key

2017-08-02 Thread zsxwing
ash. This PR fixed this issue. ## How was this patch tested? The new unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18822 from zsxwing/SPARK-21546. (cherry picked from commit 0d26b3aa55f9cc75096b0e2b309f64fe3270b9a5) Signed-off-by: Shixiong Zhu <shixi...@databricks.co

spark git commit: [SPARK-21597][SS] Fix a potential overflow issue in EventTimeStats

2017-08-02 Thread zsxwing
ted? The new unit tests Author: Shixiong Zhu <shixi...@databricks.com> Closes #18803 from zsxwing/avg. (cherry picked from commit 7f63e85b47a93434030482160e88fe63bf9cff4e) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-21597][SS] Fix a potential overflow issue in EventTimeStats

2017-08-02 Thread zsxwing
ted? The new unit tests Author: Shixiong Zhu <shixi...@databricks.com> Closes #18803 from zsxwing/avg. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7f63e85b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7f63

spark git commit: [CORE][MINOR] Improve the error message of checkpoint RDD verification

2017-08-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 77cc0d67d -> 4cc704b12 [CORE][MINOR] Improve the error message of checkpoint RDD verification ### What changes were proposed in this pull request? The original error message is pretty confusing. It is unable to tell which number is

spark git commit: [SPARK-21517][CORE] Avoid copying memory when transfer chunks remotely

2017-07-25 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 300807c6e -> 16612638f [SPARK-21517][CORE] Avoid copying memory when transfer chunks remotely ## What changes were proposed in this pull request? In our production cluster, OOM happens when NettyBlockRpcServer receives OpenBlocks

spark git commit: [SPARK-21409][SS] Expose state store memory usage in SQL metrics and progress updates

2017-07-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 53465075c -> 9d8c83179 [SPARK-21409][SS] Expose state store memory usage in SQL metrics and progress updates ## What changes were proposed in this pull request? Currently, there is no tracking of memory usage of state stores. This JIRA

spark git commit: [SPARK-21421][SS] Add the query id as a local property to allow source and sink using it

2017-07-14 Thread zsxwing
ing it. ## How was this patch tested? The new unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18638 from zsxwing/SPARK-21421. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2d968a07 Tree: http:

spark git commit: [SPARK-21146][CORE] Master/Worker should handle and shutdown when any thread gets UncaughtException

2017-07-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 24367f23f -> e16e8c7ad [SPARK-21146][CORE] Master/Worker should handle and shutdown when any thread gets UncaughtException ## What changes were proposed in this pull request? Adding the default UncaughtExceptionHandler to the Worker. ##

spark git commit: [SPARK-21069][SS][DOCS] Add rate source to programming guide.

2017-07-08 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9760c15ac -> d0bfc6733 [SPARK-21069][SS][DOCS] Add rate source to programming guide. ## What changes were proposed in this pull request? SPARK-20979 added a new structured streaming source: Rate source. This patch adds the corresponding

spark git commit: [SPARK-21069][SS][DOCS] Add rate source to programming guide.

2017-07-08 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 576fd4c3a -> ab12848d6 [SPARK-21069][SS][DOCS] Add rate source to programming guide. ## What changes were proposed in this pull request? SPARK-20979 added a new structured streaming source: Rate source. This patch adds the

spark git commit: [SPARK-21329][SS] Make EventTimeWatermarkExec explicitly UnaryExecNode

2017-07-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 40c7add3a -> e5bb26174 [SPARK-21329][SS] Make EventTimeWatermarkExec explicitly UnaryExecNode ## What changes were proposed in this pull request? Making EventTimeWatermarkExec explicitly UnaryExecNode /cc tdas zsxwing ##

spark git commit: [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation

2017-07-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 4e53a4edd -> 576fd4c3a [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation ## What changes were proposed in this pull request? Few changes to the Structured Streaming documentation - Clarify that the entire stream input

spark git commit: [SPARK-21248][SS] The clean up codes in StreamExecution should not be interrupted

2017-07-05 Thread zsxwing
des in StreamExecution is interrupted. It also removes an optimization in `runUninterruptibly` to make sure this method never throws `InterruptedException`. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18461 from zsxwing/SPARK-21248. Project: http:

spark git commit: [SPARK-21253][CORE][HOTFIX] Fix Scala 2.10 build

2017-06-29 Thread zsxwing
ong Zhu <shixi...@databricks.com> Closes #18478 from zsxwing/SPARK-21253-2. (cherry picked from commit cfc696f4a4289acf132cb26baf7c02c5b6305277) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.a

spark git commit: [SPARK-21253][CORE][HOTFIX] Fix Scala 2.10 build

2017-06-29 Thread zsxwing
Zhu <shixi...@databricks.com> Closes #18478 from zsxwing/SPARK-21253-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cfc696f4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cfc696f4 Diff: http://git-wip-us.a

spark git commit: [SPARK-21188][CORE] releaseAllLocksForTask should synchronize the whole method

2017-06-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 18066f2e6 -> f9151bebc [SPARK-21188][CORE] releaseAllLocksForTask should synchronize the whole method ## What changes were proposed in this pull request? Since the objects `readLocksByTask`, `writeLocksByTask` and `info`s are coupled and

spark git commit: [SPARK-21216][SS] Hive strategies missed in Structured Streaming IncrementalExecution

2017-06-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 838effb98 -> e68aed70f [SPARK-21216][SS] Hive strategies missed in Structured Streaming IncrementalExecution ## What changes were proposed in this pull request? If someone creates a HiveSession, the planner in `IncrementalExecution`

spark git commit: [SPARK-21153] Use project instead of expand in tumbling windows

2017-06-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 6b3d02285 -> 5282bae04 [SPARK-21153] Use project instead of expand in tumbling windows ## What changes were proposed in this pull request? Time windowing in Spark currently performs an Expand + Filter, because there is no way to

spark git commit: [SPARK-21192][SS] Preserve State Store provider class configuration across StreamingQuery restarts

2017-06-23 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1ebe7ffe0 -> 2ebd0838d [SPARK-21192][SS] Preserve State Store provider class configuration across StreamingQuery restarts ## What changes were proposed in this pull request? If the SQL conf for StateStore provider class is changed

spark git commit: [SPARK-20599][SS] ConsoleSink should work with (batch)

2017-06-22 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 19331b8e4 -> e55a105ae [SPARK-20599][SS] ConsoleSink should work with (batch) ## What changes were proposed in this pull request? Currently, if we read a batch and want to display it on the console sink, it will lead to a runtime exception.

spark git commit: [SPARK-21167][SS] Decode the path generated by File sink to handle special characters

2017-06-22 Thread zsxwing
ers. ## How was this patch tested? The added unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18381 from zsxwing/SPARK-21167. (cherry picked from commit d66b143eec7f604595089f72d8786edbdcd74282) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project:

spark git commit: [SPARK-21167][SS] Decode the path generated by File sink to handle special characters

2017-06-22 Thread zsxwing
ers. ## How was this patch tested? The added unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18381 from zsxwing/SPARK-21167. (cherry picked from commit d66b143eec7f604595089f72d8786edbdcd74282) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project:

spark git commit: [SPARK-21167][SS] Decode the path generated by File sink to handle special characters

2017-06-22 Thread zsxwing
How was this patch tested? The added unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18381 from zsxwing/SPARK-21167. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d66b143e Tree: http://git-wip-us.a

spark git commit: [SPARK-21147][SS] Throws an analysis exception when a user-specified schema is given in socket/rate sources

2017-06-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master ad459cfb1 -> 7a00c658d [SPARK-21147][SS] Throws an analysis exception when a user-specified schema is given in socket/rate sources ## What changes were proposed in this pull request? This PR proposes to throw an exception if a schema is

spark git commit: [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table - version to fix 2.1

2017-06-20 Thread zsxwing
ted the structured streaming programming guide. zsxwing This is the PR to fix version 2.1 as discussed in PR #18342 Author: assafmendelson <assaf.mendel...@gmail.com> Closes #18363 from assafmendelson/spark-21123-for-spark2.1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table

2017-06-19 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 f7fcdec6c -> 7b50736c4 [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table ## What changes were proposed in this pull request? The description for several options of File Source for

spark git commit: [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table

2017-06-19 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master e92ffe6f1 -> 66a792cd8 [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table ## What changes were proposed in this pull request? The description for several options of File Source for structured

spark git commit: [SPARK-20979][SS] Add RateSource to generate values for tests and benchmark

2017-06-13 Thread zsxwing
e added tests. Author: Shixiong Zhu <shixi...@databricks.com> Author: Michael Armbrust <mich...@databricks.com> Closes #18199 from zsxwing/rate. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/220943d8 Tree: http://git-wip

spark git commit: [SPARK-20979][SS] Add RateSource to generate values for tests and benchmark

2017-06-12 Thread zsxwing
e added tests. Author: Shixiong Zhu <shixi...@databricks.com> Author: Michael Armbrust <mich...@databricks.com> Closes #18199 from zsxwing/rate. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/74a432d3 Tree: http://git-wip
