spark git commit: [SPARK-21788][SS] Handle more exceptions when stopping a streaming query

2017-08-24 Thread tdas
Repository: spark Updated Branches: refs/heads/master 2dd37d827 -> d3abb3699 [SPARK-21788][SS] Handle more exceptions when stopping a streaming query ## What changes were proposed in this pull request? Add more cases we should view as a normal query stop rather than a failure. ## How was

spark git commit: [SPARK-21696][SS] Fix a potential issue that may generate partial snapshot files

2017-08-14 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.2 7b9807754 -> 48bacd36c [SPARK-21696][SS] Fix a potential issue that may generate partial snapshot files ## What changes were proposed in this pull request? Directly writing a snapshot file may generate a partial file. This PR changes

spark git commit: [SPARK-21696][SS] Fix a potential issue that may generate partial snapshot files

2017-08-14 Thread tdas
Repository: spark Updated Branches: refs/heads/master fbc269252 -> 282f00b41 [SPARK-21696][SS] Fix a potential issue that may generate partial snapshot files ## What changes were proposed in this pull request? Directly writing a snapshot file may generate a partial file. This PR changes it
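
The partial-file risk named in the SPARK-21696 message above is commonly avoided by writing to a temporary file and then renaming it into place. The snippet is cut off before it says what the PR actually changes, so the following is only a sketch of that general technique, not Spark's code; `write_snapshot_atomically` is a hypothetical helper name.

```python
import os
import tempfile


def write_snapshot_atomically(path: str, data: bytes) -> None:
    """Write data to path so readers never observe a half-written file.

    Hypothetical helper illustrating temp-file-then-rename; not Spark code.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename stays
    # on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Swap the finished file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader that opens `path` sees either the previous complete file or the new complete file, because the data is fully written and synced before the rename.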

spark git commit: [SPARK-21587][SS] Added filter pushdown through watermarks.

2017-08-09 Thread tdas
Repository: spark Updated Branches: refs/heads/master 2d799d080 -> 0fb73253f [SPARK-21587][SS] Added filter pushdown through watermarks. ## What changes were proposed in this pull request? Push filter predicates through EventTimeWatermark if they're deterministic and do not reference the

spark git commit: [SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value

2017-08-08 Thread tdas
Repository: spark Updated Branches: refs/heads/master fb54a564d -> 6edfff055 [SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value ## What changes were proposed in this pull request? When I was investigating a flaky test, I realized that many places don't check

spark git commit: [SPARK-21464][SS] Minimize deprecation warnings caused by ProcessingTime class

2017-07-19 Thread tdas
ing its uses from tests as much as possible. ## How was this patch tested? Existing tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18678 from tdas/SPARK-21464. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-21464][SS] Minimize deprecation warnings caused by ProcessingTime class

2017-07-19 Thread tdas
ing by removing its uses from tests as much as possible. ## How was this patch tested? Existing tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18678 from tdas/SPARK-21464. (cherry picked from commit 70fe99dc62ef636a99bcb8a580ad4de4dca95181) Signed-off-by: Tathagata Das <ta

spark git commit: [SPARK-21462][SS] Added batchId to StreamingQueryProgress.json

2017-07-18 Thread tdas
lso, removed recently added numPartitions from StatefulOperatorProgress as this value does not change through the query run, and there are other ways to find that. ## How was this patch tested? Updated unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18675 from t

spark git commit: [SPARK-21409][SS] Follow up PR to allow different types of custom metrics to be exposed

2017-07-17 Thread tdas
PR enables that. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18661 from tdas/SPARK-21409-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e9faae13 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree

spark git commit: [SPARK-21370][SS] Add test for state reliability when one read-only state store aborts after read-write state store commits

2017-07-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master e16e8c7ad -> e0af76a36 [SPARK-21370][SS] Add test for state reliability when one read-only state store aborts after read-write state store commits ## What changes were proposed in this pull request? During Streaming Aggregation, we have

spark git commit: [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted

2017-07-06 Thread tdas
row interrupt exception, in which case temporary checkpoint directories will not be deleted, and the test will fail. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18442 from tdas/DatastreamReaderWriterSuite-fix. (cherry picked from commit 60043f22458668ac7ecba94fa78953f23a6bdcec) S

spark git commit: [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted

2017-07-06 Thread tdas
upt exception, in which case temporary checkpoint directories will not be deleted, and the test will fail. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18442 from tdas/DatastreamReaderWriterSuite-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-21145][SS] Added StateStoreProviderId with queryRunId to reload StateStoreProviders when query is restarted

2017-06-23 Thread tdas
il.com> Closes #18355 from tdas/SPARK-21145. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fe24634d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fe24634d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff

spark git commit: [SPARK-20957][SS][TESTS] Fix o.a.s.sql.streaming.StreamingQueryManagerSuite listing

2017-06-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.2 1388fdd70 -> 421d8ecb8 [SPARK-20957][SS][TESTS] Fix o.a.s.sql.streaming.StreamingQueryManagerSuite listing ## What changes were proposed in this pull request? When stopping StreamingQuery, StreamExecution will set `streamDeathCause`

spark git commit: [SPARK-20957][SS][TESTS] Fix o.a.s.sql.streaming.StreamingQueryManagerSuite listing

2017-06-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 06c054411 -> bc537e40a [SPARK-20957][SS][TESTS] Fix o.a.s.sql.streaming.StreamingQueryManagerSuite listing ## What changes were proposed in this pull request? When stopping StreamingQuery, StreamExecution will set `streamDeathCause` then

spark git commit: [SPARK-20452][SS][KAFKA] Fix a potential ConcurrentModificationException for batch Kafka DataFrame

2017-04-27 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.2 753e129f3 -> 3d53d825e [SPARK-20452][SS][KAFKA] Fix a potential ConcurrentModificationException for batch Kafka DataFrame ## What changes were proposed in this pull request? Cancel a batch Kafka query but one of the tasks cannot be

spark git commit: [SPARK-20452][SS][KAFKA] Fix a potential ConcurrentModificationException for batch Kafka DataFrame

2017-04-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master 01c999e7f -> 823baca2c [SPARK-20452][SS][KAFKA] Fix a potential ConcurrentModificationException for batch Kafka DataFrame ## What changes were proposed in this pull request? Cancel a batch Kafka query but one of the tasks cannot be cancelled,

spark git commit: [SPARK-20461][CORE][SS] Use UninterruptibleThread for Executor and fix the potential hang in CachedKafkaConsumer

2017-04-27 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.2 4512e2ae6 -> 753e129f3 [SPARK-20461][CORE][SS] Use UninterruptibleThread for Executor and fix the potential hang in CachedKafkaConsumer ## What changes were proposed in this pull request? This PR changes Executor's threads to

spark git commit: [SPARK-20461][CORE][SS] Use UninterruptibleThread for Executor and fix the potential hang in CachedKafkaConsumer

2017-04-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master 606432a13 -> 01c999e7f [SPARK-20461][CORE][SS] Use UninterruptibleThread for Executor and fix the potential hang in CachedKafkaConsumer ## What changes were proposed in this pull request? This PR changes Executor's threads to

spark git commit: [SPARK-20377][SS] Fix JavaStructuredSessionization example

2017-04-18 Thread tdas
ate when using timeouts. ## How was this patch tested? manually ran the example Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17676 from tdas/SPARK-20377. (cherry picked from commit 74aa0df8f7f132b62754e5159262e4a5b9b641ab) Signed-off-by: Tathagata Das <tathagata.das1.

spark git commit: [SPARK-20377][SS] Fix JavaStructuredSessionization example

2017-04-18 Thread tdas
hen using timeouts. ## How was this patch tested? manually ran the example Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17676 from tdas/SPARK-20377. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/74aa

spark git commit: [SPARK-20301][FLAKY-TEST] Fix Hadoop Shell.runCommand flakiness in Structured Streaming tests

2017-04-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master 99a947312 -> 924c42477 [SPARK-20301][FLAKY-TEST] Fix Hadoop Shell.runCommand flakiness in Structured Streaming tests ## What changes were proposed in this pull request? Some Structured Streaming tests show flakiness such as: ``` [info] -

spark git commit: [SPARK-20224][SS] Updated docs for streaming dropDuplicates and mapGroupsWithState

2017-04-05 Thread tdas
ted markdown docs - Updated scala docs - Added scala and Java example ## How was this patch tested? Manually ran examples. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17539 from tdas/SPARK-20224. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-20209][SS] Execute next trigger immediately if previous batch took longer than trigger interval

2017-04-05 Thread tdas
ple, ProcessingTimeExecutorSuite does not need to create any context for testing, just needs the StreamManualClock. ## How was this patch tested? Added new unit tests to comprehensively test this behavior. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17525 from tdas/SPARK-20209. Project: http:
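
The behaviour in the SPARK-20209 subject line (run the next trigger immediately when the previous batch overran the trigger interval) can be sketched as a small wait-time calculation. This is an illustrative sketch only: `wait_before_next_trigger` and its millisecond parameters are hypothetical, and it assumes the next trigger is due one interval after the previous one started, which may differ from how Spark schedules triggers.

```python
def wait_before_next_trigger(start_ms: int, end_ms: int, interval_ms: int) -> int:
    """Return how many milliseconds to wait before starting the next trigger.

    start_ms and end_ms are when the previous batch started and finished.
    Hypothetical helper; not taken from Spark.
    """
    if interval_ms <= 0:
        # No interval configured: always run the next batch right away.
        return 0
    due_ms = start_ms + interval_ms
    # If the batch already ran past the due time, do not wait at all.
    return max(0, due_ms - end_ms)
```

With a 1000 ms interval, a batch that finishes after 400 ms waits the remaining 600 ms, while a batch that takes 1500 ms is followed by the next trigger with no wait.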

spark git commit: [SPARK-20165][SS] Resolve state encoder's deserializer in driver in FlatMapGroupsWithStateExec

2017-03-31 Thread tdas
il.com> Closes #17488 from tdas/SPARK-20165. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/567a50ac Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/567a50ac Diff: http://git-wip-us.apache.org/repos/asf/spark/diff

spark git commit: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-23 Thread tdas
ean that we should fall back to what's in the offset log. - A OneTime trigger execution that results in an exception being thrown. marmbrus tdas zsxwing Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tyson Condie <tcon...@gmail.com> Author:

[2/2] spark git commit: [SPARK-20057][SS] Renamed KeyedState to GroupState in mapGroupsWithState

2017-03-22 Thread tdas
is would make it more general if you extend this operation to RelationalGroupedDataset and Python APIs. ## How was this patch tested? Existing unit tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17385 from tdas/SPARK-20057. Project: http://git-wip-us.apache.org/repos/a

[1/2] spark git commit: [SPARK-20057][SS] Renamed KeyedState to GroupState in mapGroupsWithState

2017-03-22 Thread tdas
Repository: spark Updated Branches: refs/heads/master 80fd07038 -> 82b598b96 http://git-wip-us.apache.org/repos/asf/spark/blob/82b598b9/sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala

spark git commit: [SPARK-20030][SS] Event-time-based timeout for MapGroupsWithState

2017-03-21 Thread tdas
ing `KeyedState.setTimeoutTimestamp`. The keys time out when the watermark crosses the timeout timestamp. ## How was this patch tested? Unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17361 from tdas/SPARK-20030. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com
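
The timeout rule in the SPARK-20030 message above, where a key times out once the watermark crosses its timeout timestamp, can be sketched in a few lines. This is a hypothetical illustration rather than Spark's implementation: `timed_out_keys` is an invented name, and treating "crosses" as "watermark strictly greater than the timestamp" is an assumption.

```python
from typing import Dict, List


def timed_out_keys(timeouts: Dict[str, int], watermark_ms: int) -> List[str]:
    """Return the keys whose timeout timestamp lies behind the watermark.

    timeouts maps each key to the timestamp set for it (hypothetical layout).
    Assumes a key expires only when the watermark is strictly greater.
    """
    return sorted(key for key, ts in timeouts.items() if ts < watermark_ms)
```

Because the watermark only advances with event time, a key set to time out at 200 stays live while the watermark is at or below 200, however much wall-clock time passes.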

spark git commit: [SPARK-20051][SS] Fix StreamSuite flaky test - recover from v2.1 checkpoint

2017-03-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9281a3d50 -> 2d73fcced [SPARK-20051][SS] Fix StreamSuite flaky test - recover from v2.1 checkpoint ## What changes were proposed in this pull request? There is a race condition between calling stop on a streaming query and deleting

spark git commit: [SPARK-19906][SS][DOCS] Documentation describing how to write queries to Kafka

2017-03-20 Thread tdas
fka. zsxwing tdas Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tyson Condie <tcon...@gmail.com> Closes #17246 from tcondie/kafka-write-docs. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/

[2/2] spark git commit: [SPARK-19067][SS] Processing-time-based timeout in MapGroupsWithState

2017-03-19 Thread tdas
to address. ## How was this patch tested? New unit tests in - MapGroupsWithStateSuite for timeouts. - StateStoreSuite for new APIs in StateStore. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17179 from tdas/mapgroupwithstate-timeout. Project: http://git-wip-us.apache.org

[1/2] spark git commit: [SPARK-19067][SS] Processing-time-based timeout in MapGroupsWithState

2017-03-19 Thread tdas
Repository: spark Updated Branches: refs/heads/master 0ee9fbf51 -> 990af630d http://git-wip-us.apache.org/repos/asf/spark/blob/990af630/sql/core/src/main/scala/org/apache/spark/sql/streaming/KeyedState.scala -- diff --git

spark git commit: [SPARK-19873][SS] Record num shuffle partitions in offset log and enforce in next batch.

2017-03-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7de66bae5 -> 3783539d7 [SPARK-19873][SS] Record num shuffle partitions in offset log and enforce in next batch. ## What changes were proposed in this pull request? If the user changes the shuffle partition number between batches,

spark git commit: [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable

2017-03-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 fd5149aff -> 6ee7d5bf4 [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable ## What changes were proposed in this pull request? Sometimes, CheckpointTests will hang on a busy machine because the streaming

spark git commit: [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable

2017-03-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7b5d873ae -> 376d78216 [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable ## What changes were proposed in this pull request? Sometimes, CheckpointTests will hang on a busy machine because the streaming jobs

spark git commit: [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable

2017-03-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 710b5554e -> 5fb70831b [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable ## What changes were proposed in this pull request? Sometimes, CheckpointTests will hang on a busy machine because the streaming

spark git commit: [SPARK-19858][SS] Add output mode to flatMapGroupsWithState and disallow invalid cases

2017-03-08 Thread tdas
Repository: spark Updated Branches: refs/heads/master e9e2c612d -> 1bf901238 [SPARK-19858][SS] Add output mode to flatMapGroupsWithState and disallow invalid cases ## What changes were proposed in this pull request? Add an output mode parameter to `flatMapGroupsWithState` and just define

spark git commit: [SPARK-19719][SS] Kafka writer for both structured streaming and batch queries

2017-03-06 Thread tdas
end and ErrorIfExist supported under identical semantics. Other save modes result in an AnalysisException tdas zsxwing ## How was this patch tested? ### The following unit tests will be included - write to stream with topic field: valid stream write with data that includes an existing topic in the sch

spark git commit: [SPARK-19719][SS] Kafka writer for both structured streaming and batch queries

2017-03-06 Thread tdas
end and ErrorIfExist supported under identical semantics. Other save modes result in an AnalysisException tdas zsxwing ## How was this patch tested? ### The following unit tests will be included - write to stream with topic field: valid stream write with data that includes an existing topic in the schema - wr

spark git commit: [SPARK-19497][SS] Implement streaming deduplication

2017-02-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7bf09433f -> 9bf4e2baa [SPARK-19497][SS] Implement streaming deduplication ## What changes were proposed in this pull request? This PR adds a special streaming deduplication operator to support `dropDuplicates` with `aggregation` and
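
Streaming deduplication, as named in the SPARK-19497 subject above, has to remember keys across batches rather than only within one batch. The sketch below shows that general idea in plain Python; it is not the operator the PR adds, and `StreamingDeduplicator` and `process_batch` are hypothetical names.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Row = Tuple[Hashable, object]  # (deduplication key, payload)


class StreamingDeduplicator:
    """Keep only the first row seen for each key across all batches.

    Hypothetical illustration; state grows without bound in this sketch.
    """

    def __init__(self) -> None:
        self._seen: Set[Hashable] = set()

    def process_batch(self, rows: Iterable[Row]) -> List[Row]:
        output: List[Row] = []
        for key, payload in rows:
            if key in self._seen:
                # Duplicate of a row from this batch or an earlier one.
                continue
            self._seen.add(key)
            output.append((key, payload))
        return output
```

Since the set of seen keys is never pruned here, a long-running stream would need some rule for dropping old keys; this sketch leaves that out.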

spark git commit: [SPARK-19584][SS][DOCS] update structured streaming documentation around batch mode

2017-02-14 Thread tdas
tch query specification and options. zsxwing tdas Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tyson Condie <tcon...@gmail.com> Closes #16918 from tcondie/kafka-docs. (cherry picked from commit 447b2b5309251f3ae37857de73c157e59a0d76df) S

spark git commit: [SPARK-19584][SS][DOCS] update structured streaming documentation around batch mode

2017-02-14 Thread tdas
tch query specification and options. zsxwing tdas Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tyson Condie <tcon...@gmail.com> Closes #16918 from tcondie/kafka-docs. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics even if there is no new data in trigger

2017-01-31 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 e43f161bb -> d35a1268d [SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics even if there is no new data in trigger In StructuredStreaming, if a new trigger was skipped because no new data arrived, we suddenly

spark git commit: [SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics even if there is no new data in trigger

2017-01-31 Thread tdas
Repository: spark Updated Branches: refs/heads/master 57d70d26c -> 081b7adda [SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics even if there is no new data in trigger ## What changes were proposed in this pull request? In StructuredStreaming, if a new trigger was

spark git commit: [SPARK-14804][SPARK][GRAPHX] Fix checkpointing of VertexRDD/EdgeRDD

2017-01-25 Thread tdas
gic. ## How was this patch tested? New unit tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #15396 from tdas/SPARK-14804. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47d5d0dd Tree: http://git-wip-us.a

spark git commit: [SPARK-14804][SPARK][GRAPHX] Fix checkpointing of VertexRDD/EdgeRDD

2017-01-25 Thread tdas
gic. ## How was this patch tested? New unit tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #15396 from tdas/SPARK-14804. (cherry picked from commit 47d5d0ddb06c7d2c86515d9556c41dc80081f560) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project:

spark git commit: [SPARK-14804][SPARK][GRAPHX] Fix checkpointing of VertexRDD/EdgeRDD

2017-01-25 Thread tdas
gic. ## How was this patch tested? New unit tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #15396 from tdas/SPARK-14804. (cherry picked from commit 47d5d0ddb06c7d2c86515d9556c41dc80081f560) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project:

spark git commit: [SPARK-16101][HOTFIX] Fix the build with Scala 2.10 by explicit typed argument

2017-01-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master 60bd91a34 -> ec9493b44 [SPARK-16101][HOTFIX] Fix the build with Scala 2.10 by explicit typed argument ## What changes were proposed in this pull request? I goofed in https://github.com/apache/spark/pull/16669 which introduces the break

spark git commit: [SPARK-19267][SS] Fix a race condition when stopping StateStore

2017-01-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 4d286c903 -> 6f0ad575d [SPARK-19267][SS] Fix a race condition when stopping StateStore ## What changes were proposed in this pull request? There is a race condition when stopping StateStore which makes `StateStoreSuite.maintenance`

spark git commit: [SPARK-19267][SS] Fix a race condition when stopping StateStore

2017-01-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9b7a03f15 -> ea31f92bb [SPARK-19267][SS] Fix a race condition when stopping StateStore ## What changes were proposed in this pull request? There is a race condition when stopping StateStore which makes `StateStoreSuite.maintenance`

spark git commit: [SPARK-19314][SS][CATALYST] Do not allow sort before aggregation in Structured Streaming plan

2017-01-20 Thread tdas
ter an aggregation in complete mode. Currently it is incorrectly allowed when present anywhere in the plan. It gives unpredictable, potentially incorrect results. ## How was this patch tested? New test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16662 from tdas/SPARK-19314. (cherry pi

spark git commit: [SPARK-19314][SS][CATALYST] Do not allow sort before aggregation in Structured Streaming plan

2017-01-20 Thread tdas
ter an aggregation in complete mode. Currently it is incorrectly allowed when present anywhere in the plan. It gives unpredictable, potentially incorrect results. ## How was this patch tested? New test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16662 from tdas/SPARK-19314. (cherry pi

spark git commit: [SPARK-19314][SS][CATALYST] Do not allow sort before aggregation in Structured Streaming plan

2017-01-20 Thread tdas
ion in complete mode. Currently it is incorrectly allowed when present anywhere in the plan. It gives unpredictable, potentially incorrect results. ## How was this patch tested? New test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16662 from tdas/SPARK-19314. Project: h

spark git commit: [SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guide for update mode and source/sink options

2017-01-06 Thread tdas
ntent.com/assets/663212/21665148/cfe414fa-d29f-11e6-9baa-4124ccbab093.png) ![image](https://cloud.githubusercontent.com/assets/663212/21665226/2e8f39e4-d2a0-11e6-85b1-7657e2df5491.png) Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16468 from tdas/SPARK-19074. (cherry pic

spark git commit: [SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guide for update mode and source/sink options

2017-01-06 Thread tdas
om/assets/663212/21665148/cfe414fa-d29f-11e6-9baa-4124ccbab093.png) ![image](https://cloud.githubusercontent.com/assets/663212/21665226/2e8f39e4-d2a0-11e6-85b1-7657e2df5491.png) Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16468 from tdas/SPARK-19074. Project: http://git-wi

spark git commit: [FLAKY-TEST] InputStreamsSuite.socket input stream

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 021952d58 -> 9a3c5bd70 [FLAKY-TEST] InputStreamsSuite.socket input stream ## What changes were proposed in this pull request?

spark git commit: [FLAKY-TEST] InputStreamsSuite.socket input stream

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7e8994ffd -> afe36516e [FLAKY-TEST] InputStreamsSuite.socket input stream ## What changes were proposed in this pull request?

spark git commit: [SPARK-18234][SS] Made update mode public

2016-12-21 Thread tdas
" - Changed package of InternalOutputModes from o.a.s.sql to o.a.s.sql.catalyst - Added update mode state removing with watermark to StateStoreSaveExec ## How was this patch tested? Added new tests in changed modules Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16360 fr

spark git commit: [SPARK-18234][SS] Made update mode public

2016-12-21 Thread tdas
" - Changed package of InternalOutputModes from o.a.s.sql to o.a.s.sql.catalyst - Added update mode state removing with watermark to StateStoreSaveExec ## How was this patch tested? Added new tests in changed modules Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16360 fr

spark git commit: [SPARK-18588][SS][KAFKA] Create a new KafkaConsumer when error happens to fix the flaky test

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 0e51bb085 -> 17ef57fe8 [SPARK-18588][SS][KAFKA] Create a new KafkaConsumer when error happens to fix the flaky test ## What changes were proposed in this pull request? When KafkaSource fails on Kafka errors, we should create a new

spark git commit: [SPARK-18588][SS][KAFKA] Create a new KafkaConsumer when error happens to fix the flaky test

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 354e93618 -> 95efc895e [SPARK-18588][SS][KAFKA] Create a new KafkaConsumer when error happens to fix the flaky test ## What changes were proposed in this pull request? When KafkaSource fails on Kafka errors, we should create a new

spark git commit: [SPARK-18031][TESTS] Fix flaky test ExecutorAllocationManagerSuite.basic functionality

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 3c8861d92 -> 162bdb910 [SPARK-18031][TESTS] Fix flaky test ExecutorAllocationManagerSuite.basic functionality ## What changes were proposed in this pull request? The failure is because in `test("basic functionality")`, it doesn't

spark git commit: [SPARK-18031][TESTS] Fix flaky test ExecutorAllocationManagerSuite.basic functionality

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 607a1e63d -> ccfe60a83 [SPARK-18031][TESTS] Fix flaky test ExecutorAllocationManagerSuite.basic functionality ## What changes were proposed in this pull request? The failure is because in `test("basic functionality")`, it doesn't block

spark git commit: [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance

2016-12-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master 047a9d92c -> b2dd8ec6b [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance ## What changes were proposed in this pull request? It was pretty flaky before 10 days ago.

spark git commit: [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance

2016-12-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 3857d5ba8 -> 063a98e52 [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance ## What changes were proposed in this pull request? It was pretty flaky before 10 days ago.

spark git commit: [SPARK-18850][SS] Make StreamExecution and progress classes serializable

2016-12-16 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 d8548c8a7 -> a73201daf [SPARK-18850][SS] Make StreamExecution and progress classes serializable ## What changes were proposed in this pull request? This PR adds StreamingQueryWrapper to make StreamExecution and progress classes

spark git commit: [SPARK-18850][SS] Make StreamExecution and progress classes serializable

2016-12-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master 78062b852 -> d7f3058e1 [SPARK-18850][SS] Make StreamExecution and progress classes serializable ## What changes were proposed in this pull request? This PR adds StreamingQueryWrapper to make StreamExecution and progress classes

spark git commit: [SPARK-18888] partitionBy in DataStreamWriter in Python throws _to_seq not defined

2016-12-15 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 900ce558a -> b6a81f472 [SPARK-18888] partitionBy in DataStreamWriter in Python throws _to_seq not defined ## What changes were proposed in this pull request? `_to_seq` wasn't imported. ## How was this patch tested? Added

spark git commit: [SPARK-18888] partitionBy in DataStreamWriter in Python throws _to_seq not defined

2016-12-15 Thread tdas
Repository: spark Updated Branches: refs/heads/master 68a6dc974 -> 0917c8ee0 [SPARK-18888] partitionBy in DataStreamWriter in Python throws _to_seq not defined ## What changes were proposed in this pull request? `_to_seq` wasn't imported. ## How was this patch tested? Added partitionBy to

spark git commit: [SPARK-18826][SS] Add 'latestFirst' option to FileStreamSource

2016-12-15 Thread tdas
Repository: spark Updated Branches: refs/heads/master 4f7292c87 -> 68a6dc974 [SPARK-18826][SS] Add 'latestFirst' option to FileStreamSource ## What changes were proposed in this pull request? When starting a stream with a lot of backfill and maxFilesPerTrigger, the user could often want to

spark git commit: [SPARK-18826][SS] Add 'latestFirst' option to FileStreamSource

2016-12-15 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 e430915fa -> 900ce558a [SPARK-18826][SS] Add 'latestFirst' option to FileStreamSource ## What changes were proposed in this pull request? When starting a stream with a lot of backfill and maxFilesPerTrigger, the user could often want

spark git commit: [SPARK-18870] Disallowed Distinct Aggregations on Streaming Datasets

2016-12-15 Thread tdas
ons with isDistinct = true. ## How was this patch tested? Added unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16289 from tdas/SPARK-18870. (cherry picked from commit 4f7292c87512a7da3542998d0e5aa21c27a511e9) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.co

spark git commit: [SPARK-18870] Disallowed Distinct Aggregations on Streaming Datasets

2016-12-15 Thread tdas
ons with isDistinct = true. ## How was this patch tested? Added unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16289 from tdas/SPARK-18870. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4f7292c8 T

spark git commit: [SPARK-18834][SS] Expose event time stats through StreamingQueryProgress

2016-12-13 Thread tdas
QueryProgress.triggerTimestamp` to differentiate from `queryTimestamps`. It has the timestamp of when the trigger was started. ## How was this patch tested? Updated tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16258 from tdas/SPARK-18834. (cherry picked from commit c68fb42

spark git commit: [SPARK-18834][SS] Expose event time stats through StreamingQueryProgress

2016-12-13 Thread tdas
QueryProgress.triggerTimestamp` to differentiate from `queryTimestamps`. It has the timestamp of when the trigger was started. ## How was this patch tested? Updated tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16258 from tdas/SPARK-18834. Project: http://git-wip-us.apache.or

spark git commit: [SPARK-18796][SS] StreamingQueryManager should not block when starting a query

2016-12-12 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 1aeb7f427 -> 9dc5fa5f7 [SPARK-18796][SS] StreamingQueryManager should not block when starting a query ## What changes were proposed in this pull request? Major change in this PR: - Add `pendingQueryNames` and `pendingQueryIds` to

spark git commit: [SPARK-18796][SS] StreamingQueryManager should not block when starting a query

2016-12-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master bc59951ba -> 417e45c58 [SPARK-18796][SS] StreamingQueryManager should not block when starting a query ## What changes were proposed in this pull request? Major change in this PR: - Add `pendingQueryNames` and `pendingQueryIds` to track

spark git commit: [SPARK-18776][SS] Make Offset for FileStreamSource correctly formatted in json

2016-12-08 Thread tdas
ted unit test. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16205 from tdas/SPARK-18776. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/458fa332 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree

spark git commit: [SPARK-18776][SS] Make Offset for FileStreamSource corrected formatted in json

2016-12-08 Thread tdas
ted? Updated unit test. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16205 from tdas/SPARK-18776. (cherry picked from commit 458fa3325e5f8c21c50e406ac8059d6236f93a9c) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apache.org/repos

spark git commit: [SPARK-18758][SS] StreamingQueryListener events from a StreamingQuery should be sent only to the listeners in the same session as the query

2016-12-07 Thread tdas
ted? Updated test harness code to use the correct session, and added new unit test. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16186 from tdas/SPARK-18758. (cherry picked from commit 9ab725eabbb4ad515a663b395bd2f91bb5853a23) Signed-off-by: Tathagata Das <tathagata.das1.

spark git commit: [SPARK-18758][SS] StreamingQueryListener events from a StreamingQuery should be sent only to the listeners in the same session as the query

2016-12-07 Thread tdas
ted? Updated test harness code to use the correct session, and added new unit test. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16186 from tdas/SPARK-18758. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9ab7

spark git commit: [SPARK-18754][SS] Rename recentProgresses to recentProgress

2016-12-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 e9b3afac9 -> 1c6419718 [SPARK-18754][SS] Rename recentProgresses to recentProgress Based on an informal survey, users find this option easier to understand / remember. Author: Michael Armbrust Closes #16182

spark git commit: [SPARK-18754][SS] Rename recentProgresses to recentProgress

2016-12-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master edc87e189 -> 70b2bf717 [SPARK-18754][SS] Rename recentProgresses to recentProgress Based on an informal survey, users find this option easier to understand / remember. Author: Michael Armbrust Closes #16182 from

spark git commit: [SPARK-18588][TESTS] Fix flaky test: KafkaSourceStressForDontFailOnDataLossSuite

2016-12-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master bb94f61a7 -> edc87e189 [SPARK-18588][TESTS] Fix flaky test: KafkaSourceStressForDontFailOnDataLossSuite ## What changes were proposed in this pull request? Fixed the following failures: ```

spark git commit: [SPARK-18588][TESTS] Fix flaky test: KafkaSourceStressForDontFailOnDataLossSuite

2016-12-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 76e1f1651 -> e9b3afac9 [SPARK-18588][TESTS] Fix flaky test: KafkaSourceStressForDontFailOnDataLossSuite ## What changes were proposed in this pull request? Fixed the following failures: ```

spark git commit: [SPARK-18671][SS][TEST-MAVEN] Follow up PR to fix test for Maven

2016-12-06 Thread tdas
a-0-10-sql tests. So moved the kafka-source-offset-version-2.1.0 from sql test resources to kafka-0-10-sql test resources. ## How was this patch tested? Manually ran maven test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16183 from tdas/SPARK-18671-1. (cherry picked fr

spark git commit: [SPARK-18721][SS] Fix ForeachSink with watermark + append

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master b8c7b8d31 -> 7863c6237 [SPARK-18721][SS] Fix ForeachSink with watermark + append ## What changes were proposed in this pull request? Right now ForeachSink creates a new physical plan, so StreamExecution cannot retrieve metrics and

spark git commit: [SPARK-18722][SS] Move no data rate limit from StreamExecution to ProgressReporter

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 1946854ab -> d4588165e [SPARK-18722][SS] Move no data rate limit from StreamExecution to ProgressReporter ## What changes were proposed in this pull request? Move no data rate limit from StreamExecution to ProgressReporter to make

spark git commit: [SPARK-18722][SS] Move no data rate limit from StreamExecution to ProgressReporter

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 508de38c9 -> 4af142f55 [SPARK-18722][SS] Move no data rate limit from StreamExecution to ProgressReporter ## What changes were proposed in this pull request? Move no data rate limit from StreamExecution to ProgressReporter to make

spark git commit: [SPARK-18657][SPARK-18668] Make StreamingQuery.id persists across restart and not auto-generate StreamingQuery.name

2016-12-05 Thread tdas
Test handling of name=null in json generation of StreamingQueryProgress - [x] Test handling of name=null in json generation of StreamingQueryListener events - [x] Test python API of runId Updated unit tests and new unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16113 fr

spark git commit: [SPARK-18657][SPARK-18668] Make StreamingQuery.id persists across restart and not auto-generate StreamingQuery.name

2016-12-05 Thread tdas
tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16113 from tdas/SPARK-18657. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bb57bfe9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bb57bfe9 D
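
The commit title above distinguishes a query `id` that persists across restarts from a per-run `runId`, and says the name is no longer auto-generated. A minimal sketch of that distinction follows. It assumes the id is stored in a small JSON file inside the checkpoint directory; the file name `metadata`, the function `start_query`, and the returned dictionary are hypothetical and are not taken from the Spark source.

```python
import json
import os
import uuid


def start_query(checkpoint_dir, name=None):
    """Return identifiers for one run of a query (illustrative only)."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    # Hypothetical location for the persisted id.
    metadata_path = os.path.join(checkpoint_dir, "metadata")

    if os.path.exists(metadata_path):
        # Restart: reuse the id written by the first run.
        with open(metadata_path) as f:
            query_id = json.load(f)["id"]
    else:
        # First start: generate the id once and persist it.
        query_id = str(uuid.uuid4())
        with open(metadata_path, "w") as f:
            json.dump({"id": query_id}, f)

    # A fresh runId is generated on every start, including restarts.
    run_id = str(uuid.uuid4())
    # The name stays None unless the caller supplies one.
    return {"id": query_id, "runId": run_id, "name": name}
```

Starting twice against the same directory yields the same `id` but different `runId` values, and `name` remains `None` unless one is supplied.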

spark git commit: [SPARK-18729][SS] Move DataFrame.collect out of synchronized block in MemorySink

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 fecd23d2c -> 6c4c33684 [SPARK-18729][SS] Move DataFrame.collect out of synchronized block in MemorySink ## What changes were proposed in this pull request? Move DataFrame.collect out of synchronized block so that we can query content

spark git commit: [SPARK-18729][SS] Move DataFrame.collect out of synchronized block in MemorySink

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 3ba69b648 -> 1b2785c3d [SPARK-18729][SS] Move DataFrame.collect out of synchronized block in MemorySink ## What changes were proposed in this pull request? Move DataFrame.collect out of synchronized block so that we can query content in
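
The change described above, moving a potentially slow `collect` out of a synchronized block, can be illustrated with a toy sink. This is a sketch of the general locking pattern only, not Spark's `MemorySink`; the class `MemorySinkSketch`, its methods, and the replayed-batch check are assumptions made for illustration.

```python
import threading


class MemorySinkSketch:
    """Toy in-memory sink (hypothetical; not Spark's MemorySink)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._batches = []  # list of (batch_id, rows)

    def add_batch(self, batch_id, collect):
        # Run the potentially slow collect() before taking the lock, so
        # readers calling all_rows() are not blocked while it runs.
        rows = list(collect())
        with self._lock:
            # Only the cheap bookkeeping happens under the lock.
            # Ignore a batch id that was already stored (a replayed batch).
            if not self._batches or batch_id > self._batches[-1][0]:
                self._batches.append((batch_id, rows))

    def all_rows(self):
        with self._lock:
            return [row for _, rows in self._batches for row in rows]
```

The lock is held only while the batch list is updated or read, so a long-running `collect` delays the writer but not concurrent readers.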

spark git commit: [SPARK-18694][SS] Add StreamingQuery.explain and exception to Python and fix StreamingQueryException

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 410b78986 -> 246012859 [SPARK-18694][SS] Add StreamingQuery.explain and exception to Python and fix StreamingQueryException ## What changes were proposed in this pull request? - Add StreamingQuery.explain and exception to Python. - Fix

spark git commit: [SPARK-18670][SS] Limit the number of StreamingQueryListener.StreamProgressEvent when there is no data

2016-12-02 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 f915f8128 -> f53763275 [SPARK-18670][SS] Limit the number of StreamingQueryListener.StreamProgressEvent when there is no data ## What changes were proposed in this pull request? This PR adds a sql conf

spark git commit: [SPARK-18670][SS] Limit the number of StreamingQueryListener.StreamProgressEvent when there is no data

2016-12-02 Thread tdas
Repository: spark Updated Branches: refs/heads/master a985dd8e9 -> 56a503df5 [SPARK-18670][SS] Limit the number of StreamingQueryListener.StreamProgressEvent when there is no data ## What changes were proposed in this pull request? This PR adds a sql conf
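
Limiting progress events while no data arrives can be sketched as a simple time-based throttle. The snippet above is cut off before it names the SQL conf, so the parameter `no_data_interval_s`, the class `ProgressReporterSketch`, and the event tuples below are hypothetical stand-ins, not the Spark implementation.

```python
import time


class ProgressReporterSketch:
    """Emits every progress event with data, but throttles empty ones."""

    def __init__(self, no_data_interval_s, clock=time.monotonic):
        self._interval = no_data_interval_s  # hypothetical setting
        self._clock = clock
        self._last_no_data_event = None
        self.events = []

    def finish_trigger(self, num_input_rows):
        """Record a progress event if allowed; return whether one was emitted."""
        now = self._clock()
        if num_input_rows > 0:
            # Triggers that processed data always report progress.
            self.events.append(("progress", num_input_rows))
            return True
        if (self._last_no_data_event is None
                or now - self._last_no_data_event >= self._interval):
            # Empty triggers report at most once per interval.
            self._last_no_data_event = now
            self.events.append(("progress", 0))
            return True
        return False
```

Passing the clock as a parameter keeps the throttle testable without sleeping.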

spark git commit: [SPARK-18516][STRUCTURED STREAMING] Follow up PR to add StreamingQuery.status to Python

2016-11-29 Thread tdas
oid unnecessarily exposing implicit object StreamingQueryStatus, consistent with StreamingQueryProgress) - Add StreamingQuery.status to Python - Fix post-termination status ## How was this patch tested? New unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16075 from tdas/SPAR

spark git commit: [SPARK-18516][STRUCTURED STREAMING] Follow up PR to add StreamingQuery.status to Python

2016-11-29 Thread tdas
oid unnecessarily exposing implicit object StreamingQueryStatus, consistent with StreamingQueryProgress) - Add StreamingQuery.status to Python - Fix post-termination status ## How was this patch tested? New unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16075 from t

spark git commit: [SPARK-18339][SPARK-18513][SQL] Don't push down current_timestamp for filters in StructuredStreaming and persist batch and watermark timestamps to offset log.

2016-11-28 Thread tdas
o the offset log so that these values are consistent across multiple executions of the same batch. brkyvz zsxwing tdas ## How was this patch tested? A test was added to StreamingAggregationSuite ensuring the above use case is handled. The test injects a stream of time values (in seconds) to a
