[1/2] spark git commit: [SPARK-15933][SQL][STREAMING] Refactored DF reader-writer to use readStream and writeStream for streaming DFs

2016-06-14 Thread tdas
Repository: spark Updated Branches: refs/heads/master 5d50d4f0f -> 214adb14b http://git-wip-us.apache.org/repos/asf/spark/blob/214adb14/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala -- diff --git

[1/2] spark git commit: [SPARK-15933][SQL][STREAMING] Refactored DF reader-writer to use readStream and writeStream for streaming DFs

2016-06-14 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 9adba414c -> 96274d73e http://git-wip-us.apache.org/repos/asf/spark/blob/96274d73/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala -- diff -

[2/2] spark git commit: [SPARK-15933][SQL][STREAMING] Refactored DF reader-writer to use readStream and writeStream for streaming DFs

2016-06-14 Thread tdas
for DataFrameReader/Writer and DataStreamReader/Writer. Author: Tathagata Das Closes #13653 from tdas/SPARK-15933. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/214adb14 Tree: http://git-wip-us.apache.org/repos/asf/spark

spark git commit: [SPARK-15935][PYSPARK] Enable test for sql/streaming.py and fix these tests

2016-06-14 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 cf52375b9 -> d59859d38 [SPARK-15935][PYSPARK] Enable test for sql/streaming.py and fix these tests ## What changes were proposed in this pull request? This PR just enables tests for sql/streaming.py and also fixes the failures. ## How

spark git commit: [SPARK-15935][PYSPARK] Enable test for sql/streaming.py and fix these tests

2016-06-14 Thread tdas
Repository: spark Updated Branches: refs/heads/master a87a56f5c -> 96c3500c6 [SPARK-15935][PYSPARK] Enable test for sql/streaming.py and fix these tests ## What changes were proposed in this pull request? This PR just enables tests for sql/streaming.py and also fixes the failures. ## How was

spark git commit: [HOTFIX][MINOR][SQL] Revert " Standardize 'continuous queries' to 'streaming D…

2016-06-13 Thread tdas
.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.0-compile-maven-hadoop-2.3/326/console Author: Tathagata Das Closes #13645 from tdas/build-break. (cherry picked from commit a6a18a4573515e76d78534f1a19fcc2c3819f6c5) Signed-off-by: Tathagata Das Project: http://git-wip-us.apache.org/repo

spark git commit: [HOTFIX][MINOR][SQL] Revert " Standardize 'continuous queries' to 'streaming D…

2016-06-13 Thread tdas
jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.0-compile-maven-hadoop-2.3/326/console Author: Tathagata Das Closes #13645 from tdas/build-break. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6a18a45 Tree: http://git-

spark git commit: [MINOR][SQL] Standardize 'continuous queries' to 'streaming Datasets/DataFrames'

2016-06-13 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 413826d40 -> bd27dc489 [MINOR][SQL] Standardize 'continuous queries' to 'streaming Datasets/DataFrames' ## What changes were proposed in this pull request? This patch does some replacing (as `streaming Datasets/DataFrames` is the term

spark git commit: [MINOR][SQL] Standardize 'continuous queries' to 'streaming Datasets/DataFrames'

2016-06-13 Thread tdas
Repository: spark Updated Branches: refs/heads/master 4134653e5 -> d32e22778 [MINOR][SQL] Standardize 'continuous queries' to 'streaming Datasets/DataFrames' ## What changes were proposed in this pull request? This patch does some replacing (as `streaming Datasets/DataFrames` is the term we'

spark git commit: [SPARK-15812][SQL][STREAMING] Added support for sorting after streaming aggregation with complete mode

2016-06-10 Thread tdas
mon useful functionality. Support for other operations will come later. ## How was this patch tested? Additional unit tests. Author: Tathagata Das Closes #13549 from tdas/SPARK-15812. (cherry picked from commit abdb5d42c5802c8f60876aa1285c803d02881258) Signed-off-by: Tathagata Das Proj

spark git commit: [SPARK-15812][SQL][STREAMING] Added support for sorting after streaming aggregation with complete mode

2016-06-10 Thread tdas
mon useful functionality. Support for other operations will come later. ## How was this patch tested? Additional unit tests. Author: Tathagata Das Closes #13549 from tdas/SPARK-15812. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/sp

spark git commit: [SPARK-15593][SQL] Add DataFrameWriter.foreach to allow the user consuming data in ContinuousQuery

2016-06-10 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 1371d5ece -> 02ed7b536 [SPARK-15593][SQL] Add DataFrameWriter.foreach to allow the user consuming data in ContinuousQuery ## What changes were proposed in this pull request? * Add DataFrameWriter.foreach to allow the user consuming da

spark git commit: [SPARK-15593][SQL] Add DataFrameWriter.foreach to allow the user consuming data in ContinuousQuery

2016-06-10 Thread tdas
Repository: spark Updated Branches: refs/heads/master 5a3533e77 -> 00c310133 [SPARK-15593][SQL] Add DataFrameWriter.foreach to allow the user consuming data in ContinuousQuery ## What changes were proposed in this pull request? * Add DataFrameWriter.foreach to allow the user consuming data i

spark git commit: [SPARK-15853][SQL] HDFSMetadataLog.get should close the input stream

2016-06-09 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 00bbf7873 -> ca0801120 [SPARK-15853][SQL] HDFSMetadataLog.get should close the input stream ## What changes were proposed in this pull request? This PR closes the input stream created in `HDFSMetadataLog.get` ## How was this patch tes

spark git commit: [SPARK-15853][SQL] HDFSMetadataLog.get should close the input stream

2016-06-09 Thread tdas
Repository: spark Updated Branches: refs/heads/master b914e1930 -> 4d9d9cc58 [SPARK-15853][SQL] HDFSMetadataLog.get should close the input stream ## What changes were proposed in this pull request? This PR closes the input stream created in `HDFSMetadataLog.get` ## How was this patch tested?
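The fix in SPARK-15853 is a general resource-management pattern: close the stream in all code paths, including when deserialization throws. A minimal Python sketch of the pattern (the `MetadataLog` class and `_deserialize` helper are illustrative stand-ins, not Spark's actual `HDFSMetadataLog` API):

```python
import io

class MetadataLog:
    """Toy stand-in for a batch-id -> serialized-bytes metadata log."""

    def __init__(self, store):
        self._store = store  # dict: batch id -> bytes

    def get(self, batch_id):
        data = self._store.get(batch_id)
        if data is None:
            return None
        stream = io.BytesIO(data)
        try:
            # The fix pattern: the stream is closed in every path,
            # including when _deserialize raises.
            return self._deserialize(stream)
        finally:
            stream.close()

    @staticmethod
    def _deserialize(stream):
        return stream.read().decode("utf-8")

log = MetadataLog({0: b"offset-42"})
```

In the JVM the same shape is a `try`/`finally` (or try-with-resources) around the deserialization of the opened input stream.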

spark git commit: [SPARK-15580][SQL] Add ContinuousQueryInfo to make ContinuousQueryListener events serializable

2016-06-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 ec556fec0 -> 003c44792 [SPARK-15580][SQL] Add ContinuousQueryInfo to make ContinuousQueryListener events serializable ## What changes were proposed in this pull request? This PR adds ContinuousQueryInfo to make ContinuousQueryListener

spark git commit: [SPARK-15580][SQL] Add ContinuousQueryInfo to make ContinuousQueryListener events serializable

2016-06-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master 695dbc816 -> 0cfd6192f [SPARK-15580][SQL] Add ContinuousQueryInfo to make ContinuousQueryListener events serializable ## What changes were proposed in this pull request? This PR adds ContinuousQueryInfo to make ContinuousQueryListener eve

spark git commit: [SPARK-15458][SQL][STREAMING] Disable schema inference for streaming datasets on file streams

2016-05-24 Thread tdas
e a SQLConf that determines whether schema inference for file streams is allowed or not. It is disabled by default. ## How was this patch tested? Updated unit tests that test error behavior with and without schema inference enabled. Author: Tathagata Das Closes #13238 from tdas/SPARK-15458. (che

spark git commit: [SPARK-15458][SQL][STREAMING] Disable schema inference for streaming datasets on file streams

2016-05-24 Thread tdas
onf that determines whether schema inference for file streams is allowed or not. It is disabled by default. ## How was this patch tested? Updated unit tests that test error behavior with and without schema inference enabled. Author: Tathagata Das Closes #13238 from tdas/SPARK-15458. Proj

spark git commit: [SPARK-15428][SQL] Disable multiple streaming aggregations

2016-05-22 Thread tdas
the necessary support for "delta" to implement correctly. So disabling the support for multiple streaming aggregations. ## How was this patch tested? Additional unit tests Author: Tathagata Das Closes #13210 from tdas/SPARK-15428. (cherry picked from commit 1ffa608ba5a849739a56047bda8

spark git commit: [SPARK-15428][SQL] Disable multiple streaming aggregations

2016-05-22 Thread tdas
ary support for "delta" to implement correctly. So disabling the support for multiple streaming aggregations. ## How was this patch tested? Additional unit tests Author: Tathagata Das Closes #13210 from tdas/SPARK-15428. Project: http://git-wip-us.apache.org/repos/asf/spark/repo

spark git commit: [SPARK-15103][SQL] Refactored FileCatalog class to allow StreamFileCatalog to infer partitioning

2016-05-04 Thread tdas
ing gets inferred, and on reading whether the partitions get pruned correctly based on the query. - Other unit tests are unchanged and pass as expected. Author: Tathagata Das Closes #12879 from tdas/SPARK-15103. (cherry picked from commit 0fd3a4748416233f034ec137d95f0a4c8712d396) Signed-off

spark git commit: [SPARK-15103][SQL] Refactored FileCatalog class to allow StreamFileCatalog to infer partitioning

2016-05-04 Thread tdas
ets inferred, and on reading whether the partitions get pruned correctly based on the query. - Other unit tests are unchanged and pass as expected. Author: Tathagata Das Closes #12879 from tdas/SPARK-15103. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-

spark git commit: [SPARK-14860][TESTS] Create a new Waiter in reset to bypass an issue of ScalaTest's Waiter.wait

2016-05-03 Thread tdas
Repository: spark Updated Branches: refs/heads/master 4ad492c40 -> b545d7521 [SPARK-14860][TESTS] Create a new Waiter in reset to bypass an issue of ScalaTest's Waiter.wait ## What changes were proposed in this pull request? This PR updates `QueryStatusCollector.reset` to create Waiter inste

spark git commit: [SPARK-14860][TESTS] Create a new Waiter in reset to bypass an issue of ScalaTest's Waiter.wait

2016-05-03 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 c5b7e1f70 -> 31e5a2a76 [SPARK-14860][TESTS] Create a new Waiter in reset to bypass an issue of ScalaTest's Waiter.wait ## What changes were proposed in this pull request? This PR updates `QueryStatusCollector.reset` to create Waiter i

spark git commit: [SPARK-14716][SQL] Added support for partitioning in FileStreamSink

2016-05-03 Thread tdas
e PR). - Updated FileStressSuite to test number of records read from partitioned output files. Author: Tathagata Das Closes #12409 from tdas/streaming-partitioned-parquet. (cherry picked from commit 4ad492c40358d0104db508db98ce0971114b6817) Signed-off-by: Tathagata Das Project: http://git-wip-us.apac

spark git commit: [SPARK-14716][SQL] Added support for partitioning in FileStreamSink

2016-05-03 Thread tdas
FileStressSuite to test number of records read from partitioned output files. Author: Tathagata Das Closes #12409 from tdas/streaming-partitioned-parquet. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4ad492c4 Tree

spark git commit: [SPARK-14555] Second cut of Python API for Structured Streaming

2016-04-28 Thread tdas
Repository: spark Updated Branches: refs/heads/master d584a2b8a -> 78c8aaf84 [SPARK-14555] Second cut of Python API for Structured Streaming ## What changes were proposed in this pull request? This PR adds Python APIs for: - `ContinuousQueryManager` - `ContinuousQueryException` The `Contin

spark git commit: [SPARK-14930][SPARK-13693] Fix race condition in CheckpointWriter.stop()

2016-04-27 Thread tdas
project's tests as a whole, they now run in ~220 seconds vs. ~354 before. /cc zsxwing and tdas for review. Author: Josh Rosen Closes #12712 from JoshRosen/fix-checkpoint-writer-race. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/re
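The class of bug in SPARK-14930/SPARK-13693 is a shutdown path racing with asynchronous writes. A hedged, Spark-free sketch of the fix pattern (this is not Spark's `CheckpointWriter` code; names are illustrative): `stop()` waits for queued work to drain instead of returning while writes are still in flight.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class CheckpointWriter:
    """Toy writer: checkpoints are written asynchronously on one thread."""

    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._lock = threading.Lock()
        self.written = []

    def write(self, checkpoint):
        def task():
            with self._lock:
                self.written.append(checkpoint)
        self._executor.submit(task)

    def stop(self):
        # The fix pattern: block until all queued writes complete, so no
        # write races with (or is silently dropped after) stop().
        self._executor.shutdown(wait=True)

writer = CheckpointWriter()
for i in range(5):
    writer.write(i)
writer.stop()
```

After `stop()` returns, every submitted checkpoint is guaranteed to have been written; a `stop()` that merely set a flag could return mid-write.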

spark git commit: [SPARK-14719] WriteAheadLogBasedBlockHandler should ignore BlockManager put errors

2016-04-18 Thread tdas
AND_DISK` storage level, and 2. typically, individual blocks may be small enough relative to the total storage memory such that they're able to evict blocks from previous batches, so `put()` failures here may be rare in practice. This patch fixes the faulty test and fixes the bug. /cc tdas Autho
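The reasoning in SPARK-14719 is that once a block is in the write-ahead log it is already durable, so a failed `BlockManager` put (the best-effort in-memory cache) should be logged and ignored rather than failing the whole store. A simplified sketch of that shape, with hypothetical names (not Spark's actual handler):

```python
import logging

def store_block(block, write_ahead_log, block_manager):
    # Durability comes from the WAL append; this must succeed.
    handle = write_ahead_log.append(block)
    try:
        # Best-effort cache; eviction pressure can make this fail.
        block_manager.put(block)
    except Exception:
        logging.warning("BlockManager put failed; block is still in the WAL")
    return handle

class Wal:
    def __init__(self):
        self.records = []

    def append(self, block):
        self.records.append(block)
        return len(self.records) - 1

class FailingBlockManager:
    def put(self, block):
        raise MemoryError("not enough storage memory")

handle = store_block("b0", Wal(), FailingBlockManager())
```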

spark git commit: [SPARK-14382][SQL] QueryProgress should be post after committedOffsets is updated

2016-04-06 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9c6556c5f -> a4ead6d38 [SPARK-14382][SQL] QueryProgress should be post after committedOffsets is updated ## What changes were proposed in this pull request? Make sure QueryProgress is post after committedOffsets is updated. If QueryProgr

spark git commit: [SPARK-14109][SQL] Fix HDFSMetadataLog to fallback from FileContext to FileSystem API

2016-03-25 Thread tdas
log directory is concurrently modified. In addition I have also added more tests to increase the code coverage. ## How was this patch tested? Unit test. Tested on cluster with custom file system. Author: Tathagata Das Closes #11925 from tdas/SPARK-14109. Project: http://git-wip-us.apache.
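The SPARK-14109 fallback is a try-preferred-then-alternate API pattern: Spark prefers the FileContext API but falls back to FileSystem when a custom file system does not support the former. A language-neutral sketch in Python (factory names are illustrative):

```python
def open_file_manager(path, factories):
    """Try each filesystem-API factory in preference order; return the
    first one that works."""
    errors = []
    for factory in factories:
        try:
            return factory(path)
        except NotImplementedError as e:  # API unsupported by this FS
            errors.append(e)
    raise RuntimeError(f"no usable filesystem API for {path}: {errors}")

def file_context(path):
    # Stand-in for a custom FS with no FileContext/AbstractFileSystem support.
    raise NotImplementedError("custom FS has no AbstractFileSystem")

def file_system(path):
    # Stand-in for the fallback FileSystem-based manager.
    return ("FileSystemManager", path)

mgr = open_file_manager("/metadata", [file_context, file_system])
```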

spark git commit: [SPARK-13809][SQL] State store for streaming aggregations

2016-03-23 Thread tdas
cations are set correctly - [ ] Whether recovery works correctly with distributed storage - [x] Basic performance tests - [x] Docs Author: Tathagata Das Closes #11645 from tdas/state-store. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repo

spark git commit: [HOTFIX][SQL] Don't stop ContinuousQuery in quietly

2016-03-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master 926a93e54 -> abacf5f25 [HOTFIX][SQL] Don't stop ContinuousQuery in quietly ## What changes were proposed in this pull request? Try to fix a flaky hang ## How was this patch tested? Existing Jenkins test Author: Shixiong Zhu Closes #11

spark git commit: [SPARK-13149][SQL] Add FileStreamSource

2016-02-09 Thread tdas
Repository: spark Updated Branches: refs/heads/master 6f710f9fd -> b385ce388 [SPARK-13149][SQL] Add FileStreamSource `FileStreamSource` is an implementation of `org.apache.spark.sql.execution.streaming.Source`. It takes advantage of the existing `HadoopFsRelationProvider` to support various

spark git commit: [SPARK-7799][STREAMING][DOCUMENT] Add the linking and deploying instructions for streaming-akka project

2016-01-26 Thread tdas
Repository: spark Updated Branches: refs/heads/master 08c781ca6 -> cbd507d69 [SPARK-7799][STREAMING][DOCUMENT] Add the linking and deploying instructions for streaming-akka project Since `actorStream` is an external project, we should add the linking and deploying instructions for it. A fol

spark git commit: [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project

2016-01-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master 944fdadf7 -> b7d74a602 [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project Include the following changes: 1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream 2. Remove "S

spark git commit: [SPARK-12847][CORE][STREAMING] Remove StreamingListenerBus and post all Streaming events to the same thread as Spark events

2016-01-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master e3727c409 -> 944fdadf7 [SPARK-12847][CORE][STREAMING] Remove StreamingListenerBus and post all Streaming events to the same thread as Spark events Including the following changes: 1. Add StreamingListenerForwardingBus to WrappedStreamingL

spark git commit: [SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc

2016-01-18 Thread tdas
Repository: spark Updated Branches: refs/heads/master 4bcea1b85 -> 721845c1b [SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc. Author:

spark git commit: [SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc

2016-01-18 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 7482c7b5a -> d43704d7f [SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc. Auth

spark git commit: [SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc

2016-01-18 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 8c2b67f55 -> 7482c7b5a [SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc This PR added instructions to get flume assembly jar for Python users in the flume integration page like Kafka doc. Author: Sh

spark git commit: [SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc

2016-01-18 Thread tdas
Repository: spark Updated Branches: refs/heads/master 404190221 -> a973f483f [SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc This PR added instructions to get flume assembly jar for Python users in the flume integration page like Kafka doc. Author: Shixio

spark git commit: [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (branch 1.6)

2016-01-08 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 a7c36362f -> 0d96c5453 [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (branch 1.6) backport #10609 to branch 1.6 Author: Shixiong Zhu Closes #10656 from zsxwing/SPARK-12591-branch-1.6. Project: http://git-wip-u

spark git commit: [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo

2016-01-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master c94199e97 -> 28e0e500a [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo The default serializer in Kryo is FieldSerializer and it ignores transient fields and never calls `writeObject` or `readObject`. So we should regist

spark git commit: [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming

2016-01-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 6ef823544 -> a7c36362f [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming /cc tdas brkyvz Author: Shixiong Zhu Closes #10453 from zsxwing/streaming-conf. (cherry pic

spark git commit: [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming

2016-01-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master 5a4021998 -> c94199e97 [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming /cc tdas brkyvz Author: Shixiong Zhu Closes #10453 from zsxwing/streaming-conf. Project: http://

spark git commit: [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming

2015-12-22 Thread tdas
Repository: spark Updated Branches: refs/heads/master 93db50d1c -> 20591afd7 [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing

spark git commit: [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming

2015-12-22 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 94fb5e870 -> 942c0577b [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpoin

spark git commit: [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler

2015-12-22 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 0f905d7df -> 94fb5e870 [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler Author: Shixiong Zhu Closes #10439 from zsxwing/kafka-message-handler-doc. (cherry picked from commit 93db50d1c2ff97e6eb9200a995e4601f752968

spark git commit: [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler

2015-12-22 Thread tdas
Repository: spark Updated Branches: refs/heads/master b374a2583 -> 93db50d1c [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler Author: Shixiong Zhu Closes #10439 from zsxwing/kafka-message-handler-doc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: h

spark git commit: [SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner not present

2015-12-07 Thread tdas
ough RDD checkpoints, there may be a non-zero chance that the saving and recovery fails. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected. Author: Tathagata Das Closes #9988 from tdas/SPARK-11932. (cherry picked from commit 5d80d8c6a54b2113022eff31187e6d97

spark git commit: [SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner not present

2015-12-07 Thread tdas
ckpoints, there may be a non-zero chance that the saving and recovery fails. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected. Author: Tathagata Das Closes #9988 from tdas/SPARK-11932. Project: http://git-wip-us.apache.org/repos/asf/spark/rep

spark git commit: [SPARK-12106][STREAMING][FLAKY-TEST] BatchedWAL test transiently flaky when Jenkins load is high

2015-12-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 82a71aba0 -> c54b698ec [SPARK-12106][STREAMING][FLAKY-TEST] BatchedWAL test transiently flaky when Jenkins load is high We need to make sure that the last entry is indeed the last entry in the queue. Author: Burak Yavuz Closes #1011

spark git commit: [SPARK-12106][STREAMING][FLAKY-TEST] BatchedWAL test transiently flaky when Jenkins load is high

2015-12-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master 80a824d36 -> 6fd9e70e3 [SPARK-12106][STREAMING][FLAKY-TEST] BatchedWAL test transiently flaky when Jenkins load is high We need to make sure that the last entry is indeed the last entry in the queue. Author: Burak Yavuz Closes #10110 fr

spark git commit: [SPARK-12058][STREAMING][KINESIS][TESTS] fix Kinesis python tests

2015-12-04 Thread tdas
the KPL. cc zsxwing tdas Author: Burak Yavuz Closes #10050 from brkyvz/kinesis-py. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/39d5cc8a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/39d5cc8a Diff: http:/

spark git commit: [SPARK-12058][STREAMING][KINESIS][TESTS] fix Kinesis python tests

2015-12-04 Thread tdas
KPL. cc zsxwing tdas Author: Burak Yavuz Closes #10050 from brkyvz/kinesis-py. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/302d68de Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/302d68de Diff: http:/

spark git commit: [SPARK-12122][STREAMING] Prevent batches from being submitted twice after recovering StreamingContext from checkpoint

2015-12-04 Thread tdas
Repository: spark Updated Branches: refs/heads/master 5011f264f -> 4106d80fb [SPARK-12122][STREAMING] Prevent batches from being submitted twice after recovering StreamingContext from checkpoint Author: Tathagata Das Closes #10127 from tdas/SPARK-12122. Project: http://git-

spark git commit: [SPARK-12122][STREAMING] Prevent batches from being submitted twice after recovering StreamingContext from checkpoint

2015-12-04 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 2d7c4f6af -> 8f784b864 [SPARK-12122][STREAMING] Prevent batches from being submitted twice after recovering StreamingContext from checkpoint Author: Tathagata Das Closes #10127 from tdas/SPARK-12122. (cherry picked from com

spark git commit: [FLAKY-TEST-FIX][STREAMING][TEST] Make sure StreamingContexts are shutdown after test

2015-12-03 Thread tdas
Repository: spark Updated Branches: refs/heads/master ad7cea6f7 -> a02d47277 [FLAKY-TEST-FIX][STREAMING][TEST] Make sure StreamingContexts are shutdown after test Author: Tathagata Das Closes #10124 from tdas/InputStreamSuite-flaky-test. Project: http://git-wip-us.apache.org/repos/

spark git commit: [FLAKY-TEST-FIX][STREAMING][TEST] Make sure StreamingContexts are shutdown after test

2015-12-03 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 9d698fc57 -> b1a27d616 [FLAKY-TEST-FIX][STREAMING][TEST] Make sure StreamingContexts are shutdown after test Author: Tathagata Das Closes #10124 from tdas/InputStreamSuite-flaky-test. (cherry picked from com

spark git commit: [SPARK-11935][PYSPARK] Send the Python exceptions in TransformFunction and TransformFunctionSerializer to Java

2015-11-25 Thread tdas
Repository: spark Updated Branches: refs/heads/master 88875d941 -> d29e2ef4c [SPARK-11935][PYSPARK] Send the Python exceptions in TransformFunction and TransformFunctionSerializer to Java The Python exception track in TransformFunction and TransformFunctionSerializer is not sent back to Java

spark git commit: [SPARK-11935][PYSPARK] Send the Python exceptions in TransformFunction and TransformFunctionSerializer to Java

2015-11-25 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 b4cf318ab -> 849ddb6ae [SPARK-11935][PYSPARK] Send the Python exceptions in TransformFunction and TransformFunctionSerializer to Java The Python exception track in TransformFunction and TransformFunctionSerializer is not sent back to

spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

2015-11-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 5118abb4e -> 94789f374 [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception

spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

2015-11-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9ed4ad426 -> be7a2cfd9 [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception happ

spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

2015-11-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 9c8e17984 -> 0c23dd52d [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception

spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

2015-11-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.5 9a906c1c3 -> e9ae1fda9 [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception

spark git commit: [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow

2015-11-19 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.3 5278ef0f1 -> 387d81891 [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for true

spark git commit: [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow

2015-11-19 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 eda1ff4ee -> 5118abb4e [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for true

spark git commit: [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow

2015-11-19 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 fdffc400c -> abe393024 [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for true

spark git commit: [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow

2015-11-19 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.5 9957925e4 -> 001c44667 [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for true

spark git commit: [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow

2015-11-19 Thread tdas
Repository: spark Updated Branches: refs/heads/master 470007453 -> 599a8c6e2 [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for trueness
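The SPARK-11812 bug is easy to reproduce outside Spark: the code tested a local wrapper function for truthiness instead of the `invFunc` parameter, and a function object is always truthy. A simplified sketch of the buggy vs. fixed check (shapes are illustrative, not PySpark's actual internals):

```python
def reduce_by_key_and_window(reduce_func, inv_func=None):
    def inv_reduce_func(a, b):
        # Local wrapper around the optional inverse function.
        return inv_func(a, b)

    # Buggy check: a def'd function is always truthy, so this took the
    # incremental (inverse-function) path even when inv_func was None.
    use_incremental_buggy = bool(inv_reduce_func)

    # Fixed check: test the parameter the caller actually passed.
    use_incremental_fixed = inv_func is not None

    return use_incremental_buggy, use_incremental_fixed

buggy, fixed = reduce_by_key_and_window(lambda a, b: a + b, inv_func=None)
```

With `inv_func=None` the buggy branch still chose the incremental path and later crashed calling `None`; the fix checks the parameter itself.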

spark git commit: [SPARK-11791] Fix flaky test in BatchedWriteAheadLogSuite

2015-11-18 Thread tdas
Another solution would be to implement a custom mockito matcher that sorts and then compares the results, but that kind of sounds like overkill to me. Let me know what you think tdas zsxwing Author: Burak Yavuz Closes #9790 from brkyvz/fix-flaky-2. (cherry picked fr

spark git commit: [SPARK-11791] Fix flaky test in BatchedWriteAheadLogSuite

2015-11-18 Thread tdas
solution would be to implement a custom mockito matcher that sorts and then compares the results, but that kind of sounds like overkill to me. Let me know what you think tdas zsxwing Author: Burak Yavuz Closes #9790 from brkyvz/fix-flaky-2. Project: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-11814][STREAMING] Add better default checkpoint duration

2015-11-18 Thread tdas
interval = batch interval, and RDDs get checkpointed every batch. This PR is to set the checkpoint interval of trackStateByKey to 10 * batch duration. Author: Tathagata Das Closes #9805 from tdas/SPARK-11814. (cherry picked from commit a402c92c92b2e1c85d264f6077aec8f6d6a08270) Signed-off-by: T

spark git commit: [SPARK-11814][STREAMING] Add better default checkpoint duration

2015-11-18 Thread tdas
interval = batch interval, and RDDs get checkpointed every batch. This PR is to set the checkpoint interval of trackStateByKey to 10 * batch duration. Author: Tathagata Das Closes #9805 from tdas/SPARK-11814. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.ap
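The SPARK-11814 change is a simple default: checkpoint the `trackStateByKey` state RDDs every 10 batch intervals rather than every batch, trading a longer lineage for far less checkpointing I/O. A sketch of the arithmetic (the function name is illustrative; the factor 10 comes from the commit summary):

```python
def default_checkpoint_interval_secs(batch_interval_secs, multiplier=10):
    # Checkpointing every batch is wasteful; every 10 batches keeps the
    # lineage bounded while amortizing the checkpoint cost.
    return batch_interval_secs * multiplier

interval = default_checkpoint_interval_secs(2)
```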

spark git commit: [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should accept a VoidFunction<...>

2015-11-18 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 899106cc6 -> c130b8626 [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should accept a VoidFunction<...> Currently streaming foreachRDD Java API uses a function prototype requiring a return value of null. This PR d

spark git commit: [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should accept a VoidFunction<...>

2015-11-18 Thread tdas
Repository: spark Updated Branches: refs/heads/master 94624eacb -> 31921e0f0 [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should accept a VoidFunction<...> Currently streaming foreachRDD Java API uses a function prototype requiring a return value of null. This PR depre

spark git commit: [SPARK-11761] Prevent the call to StreamingContext#stop() in the listener bus's thread

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 c13f72316 -> 737f07172 [SPARK-11761] Prevent the call to StreamingContext#stop() in the listener bus's thread See discussion toward the tail of https://github.com/apache/spark/pull/9723 From zsxwing: ``` The user should not call stop

spark git commit: [SPARK-11761] Prevent the call to StreamingContext#stop() in the listener bus's thread

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 8fb775ba8 -> 446738e51 [SPARK-11761] Prevent the call to StreamingContext#stop() in the listener bus's thread See discussion toward the tail of https://github.com/apache/spark/pull/9723 From zsxwing: ``` The user should not call stop or
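The deadlock being prevented above is re-entrant: stop() waits for the listener bus thread to finish, so calling it from a listener callback can never return. A toy Python model of the guard (hypothetical class, not Spark's implementation):

```python
import threading

class StreamingContextSketch:
    """Toy model of the SPARK-11761 guard: reject stop() when it is
    invoked on the listener bus's own thread."""

    def __init__(self):
        self._bus_thread = None

    def post(self, listener_fn):
        # Listener callbacks run on a dedicated bus thread; joined here
        # only to keep the sketch deterministic.
        t = threading.Thread(target=self._invoke, args=(listener_fn,))
        t.start()
        t.join()

    def _invoke(self, fn):
        self._bus_thread = threading.current_thread()
        fn(self)

    def stop(self):
        if threading.current_thread() is self._bus_thread:
            raise RuntimeError(
                "Cannot stop a StreamingContext inside the listener bus thread")
        return "stopped"
```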

spark git commit: [SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 3133d8bd1 -> a7fcc3117 [SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API Fixed the merge conflicts in #7410 Closes #7410 Author: Shixiong Zhu Author: jerryshao Author: jerryshao Closes #9742 from zsxwing/pr7

spark git commit: [SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master b362d50fc -> 75a292291 [SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API Fixed the merge conflicts in #7410 Closes #7410 Author: Shixiong Zhu Author: jerryshao Author: jerryshao Closes #9742 from zsxwing/pr7410.

spark git commit: [HOTFIX][STREAMING] Add mockito to fix the compilation error

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.5 e26dc9642 -> f33e277f9 [HOTFIX][STREAMING] Add mockito to fix the compilation error Added mockito to the test scope to fix the compilation error in branch 1.5 Author: Shixiong Zhu Closes #9782 from zsxwing/1.5-hotfix. Project: htt

spark git commit: [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.5 bdcbbdac6 -> e26dc9642 [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch We will do checkpoint when generating a batch and completing a batch. When the processing time of a batch is greater than the batch in

spark git commit: [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 1a5dfb706 -> fa9d56f9e [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch We will do checkpoint when generating a batch and completing a batch. When the processing time of a batch is greater than the batch in

spark git commit: [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 936bc0bcb -> 928d63162 [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch We will do checkpoint when generating a batch and completing a batch. When the processing time of a batch is greater than the batch interv
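The race above arises because a checkpoint is written both when a batch is generated and when it completes; when processing time exceeds the batch interval, the two writes can overlap. One way to model the fix's intent is to serialize writes and drop any checkpoint older than the latest one persisted, so a stale checkpoint never clobbers a newer one. Hypothetical class, not Spark's implementation:

```python
import threading

class CheckpointWriter:
    """Toy model of the SPARK-11740 intent: serialized, latest-wins
    checkpoint writes keyed by batch time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest_batch_time = -1
        self.written = []  # batch times actually persisted, in order

    def write(self, batch_time):
        with self._lock:
            if batch_time < self._latest_batch_time:
                return False  # stale: a newer checkpoint already exists
            self._latest_batch_time = batch_time
            self.written.append(batch_time)
            return True
```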

spark git commit: [SPARK-11742][STREAMING] Add the failure info to the batch lists

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master 3c025087b -> bcea0bfda [SPARK-11742][STREAMING] Add the failure info to the batch lists https://cloud.githubusercontent.com/assets/1000778/11162322/9b88e204-8a51-11e5-8c57-a44889cab713.png (screenshot) Author: Shixiong Zhu Closes #9711 from zsxwin

spark git commit: [SPARK-11742][STREAMING] Add the failure info to the batch lists

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 64439f7d6 -> 3bd72eafc [SPARK-11742][STREAMING] Add the failure info to the batch lists https://cloud.githubusercontent.com/assets/1000778/11162322/9b88e204-8a51-11e5-8c57-a44889cab713.png (screenshot) Author: Shixiong Zhu Closes #9711 from zs

spark git commit: [SPARK-6328][PYTHON] Python API for StreamingListener

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 38673d7e6 -> c83177d30 [SPARK-6328][PYTHON] Python API for StreamingListener Author: Daniel Jalova Closes #9186 from djalova/SPARK-6328. (cherry picked from commit ace0db47141ffd457c2091751038fc291f6d5a8b) Signed-off-by: Tathagata Da

spark git commit: [SPARK-6328][PYTHON] Python API for StreamingListener

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master de5e531d3 -> ace0db471 [SPARK-6328][PYTHON] Python API for StreamingListener Author: Daniel Jalova Closes #9186 from djalova/SPARK-6328. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/rep

spark git commit: [SPARK-11731][STREAMING] Enable batching on Driver WriteAheadLog by default

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master b0c3fd34e -> de5e531d3 [SPARK-11731][STREAMING] Enable batching on Driver WriteAheadLog by default Using batching on the driver for the WriteAheadLog should be an improvement for all environments and use cases. Users will be able to scale

spark git commit: [SPARK-11731][STREAMING] Enable batching on Driver WriteAheadLog by default

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 f14fb291d -> 38673d7e6 [SPARK-11731][STREAMING] Enable batching on Driver WriteAheadLog by default Using batching on the driver for the WriteAheadLog should be an improvement for all environments and use cases. Users will be able to sc

spark git commit: [SPARK-11706][STREAMING] Fix the bug that Streaming Python tests cannot report failures

2015-11-13 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 aff44f9a8 -> c3da2bd46 [SPARK-11706][STREAMING] Fix the bug that Streaming Python tests cannot report failures This PR just checks the test results and returns 1 if the test fails, so that `run-tests.py` can mark it fail. Author: Shi

spark git commit: [SPARK-11706][STREAMING] Fix the bug that Streaming Python tests cannot report failures

2015-11-13 Thread tdas
Repository: spark Updated Branches: refs/heads/master ad960885b -> ec80c0c2f [SPARK-11706][STREAMING] Fix the bug that Streaming Python tests cannot report failures This PR just checks the test results and returns 1 if the test fails, so that `run-tests.py` can mark it fail. Author: Shixion
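The fix above boils down to exit-code plumbing: the Python test runner must return a nonzero status on failure so the outer run-tests.py notices. A sketch with a hypothetical runner function:

```python
import sys

def run_suite(tests):
    """Run each zero-argument test callable; return 1 if any failed,
    0 otherwise, so a wrapper script can detect failures."""
    failed = 0
    for test in tests:
        try:
            test()
        except AssertionError:
            failed += 1
    return 1 if failed else 0

# A wrapper script would call: sys.exit(run_suite(all_tests))
```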

spark git commit: [SPARK-11681][STREAMING] Correctly update state timestamp even when state is not updated

2015-11-12 Thread tdas
is defined as "no data for a while", not "no state update for a while". Fix: Update the timestamp when a timeout is specified; otherwise there is no need. Also refactored the code for better testability and added unit tests. Author: Tathagata Das Closes #9648 from tda

spark git commit: [SPARK-11681][STREAMING] Correctly update state timestamp even when state is not updated

2015-11-12 Thread tdas
is defined as "no data for a while", not "no state update for a while". Fix: Update the timestamp when a timeout is specified; otherwise there is no need. Also refactored the code for better testability and added unit tests. Author: Tathagata Das Closes #9648 from tdas/SPARK
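The semantics above (timeout means "no data for a while") can be sketched with a toy tracker: the per-key timestamp refreshes on every record seen, even when the user function leaves the state value unchanged. Hypothetical class, not Spark's TrackStateRDD:

```python
class StateTracker:
    """Toy model of the SPARK-11681 fix for keyed state with a timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.state = {}      # key -> value
        self.last_seen = {}  # key -> time of last record for the key

    def update(self, key, value, now):
        # Refresh the timestamp whenever data arrives for the key, even
        # if the state value itself is not being changed.
        self.last_seen[key] = now
        if value is not None:
            self.state[key] = value

    def expire(self, now):
        dead = [k for k, t in self.last_seen.items()
                if now - t > self.timeout_s]
        for k in dead:
            self.state.pop(k, None)
            del self.last_seen[k]
        return dead
```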

spark git commit: [SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks

2015-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 05666e09b -> 199e4cb21 [SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks The support for closing WriteAheadLog files after writes was just merged in. Closing every file after a write is a ve

spark git commit: [SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks

2015-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master 0f1d00a90 -> 7786f9cc0 [SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks The support for closing WriteAheadLog files after writes was just merged in. Closing every file after a write is a very e
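Since write-ahead-log segments are independent files, the recovery described above can read them with a thread pool instead of one at a time. A sketch of the idea; `read_segment` is a hypothetical callback returning the records of one segment, not Spark's API:

```python
from concurrent.futures import ThreadPoolExecutor

def recover_segments(segment_files, read_segment, parallelism=8):
    """Read all WAL segments concurrently; pool.map keeps the results in
    segment order even though the I/O-bound reads overlap."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return [rec
                for records in pool.map(read_segment, segment_files)
                for rec in records]
```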

spark git commit: [SPARK-11663][STREAMING] Add Java API for trackStateByKey

2015-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 6c1bf19e8 -> 05666e09b [SPARK-11663][STREAMING] Add Java API for trackStateByKey TODO - [x] Add Java API - [x] Add API tests - [x] Add a function test Author: Shixiong Zhu Closes #9636 from zsxwing/java-track. (cherry picked from co

spark git commit: [SPARK-11663][STREAMING] Add Java API for trackStateByKey

2015-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master 41bbd2300 -> 0f1d00a90 [SPARK-11663][STREAMING] Add Java API for trackStateByKey TODO - [x] Add Java API - [x] Add API tests - [x] Add a function test Author: Shixiong Zhu Closes #9636 from zsxwing/java-track. Project: http://git-wip-u

spark git commit: [SPARK-11290][STREAMING][TEST-MAVEN] Fix the test for maven build

2015-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 f5c66d163 -> 340ca9e76 [SPARK-11290][STREAMING][TEST-MAVEN] Fix the test for maven build Should not create SparkContext in the constructor of `TrackStateRDDSuite`. This is a follow up PR for #9256 to fix the test for maven build. Auth
