[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r180933380 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --- @@ -53,32 +53,24 @@ class ContinuousSuiteBase extends

[GitHub] spark pull request #21048: [SPARK-23966][SS] Refactoring all checkpoint file...

2018-04-11 Thread tdas
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/21048 [SPARK-23966][SS] Refactoring all checkpoint file writing logic in a common CheckpointFileManager interface ## What changes were proposed in this pull request? Checkpoint files (offset log

[GitHub] spark pull request #20983: [SPARK-23747][Structured Streaming] Add EpochCoor...

2018-04-10 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20983#discussion_r180553748 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/EpochCoordinatorSuite.scala --- @@ -0,0 +1,225 @@ +/* + * Licensed

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179596647 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212 @@ +/* --- End

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179603916 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179594450 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala --- @@ -43,8 +45,39 @@ object MemoryStream { protected val

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179603105 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179604298 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179602477 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179601495 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212 @@ +/* --- End

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179617487 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179617590 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179599245 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --- @@ -53,32 +53,24 @@ class ContinuousSuiteBase extends

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179617149 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179605302 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179618466 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212

[GitHub] spark pull request #20828: [SPARK-23687][SS] Add a memory source for continu...

2018-04-05 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20828#discussion_r179606525 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousMemoryStream.scala --- @@ -0,0 +1,212

spark git commit: [SPARK-23099][SS] Migrate foreach sink to DataSourceV2

2018-04-03 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7cf9fab33 -> 66a3a5a2d [SPARK-23099][SS] Migrate foreach sink to DataSourceV2 ## What changes were proposed in this pull request? Migrate foreach sink to DataSourceV2. Since the previous attempt at this PR #20552, we've changed and

[GitHub] spark issue #20951: [SPARK-23099][SS] Migrate foreach sink to DataSourceV2

2018-04-02 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20951 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20951: [SPARK-23099][SS] Migrate foreach sink to DataSou...

2018-04-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20951#discussion_r178648761 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ForeachSinkSuite.scala --- @@ -141,7 +141,7 @@ class ForeachSinkSuite extends

[GitHub] spark pull request #20951: [SPARK-23099][SS] Migrate foreach sink to DataSou...

2018-04-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20951#discussion_r178648616 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ForeachSinkSuite.scala --- @@ -131,7 +131,7 @@ class ForeachSinkSuite extends

[GitHub] spark pull request #20951: [SPARK-23099][SS] Migrate foreach sink to DataSou...

2018-04-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20951#discussion_r178648434 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ForeachSinkSuite.scala --- @@ -131,7 +131,7 @@ class ForeachSinkSuite extends

[GitHub] spark issue #20958: [SPARK-23844][SS] Fix socket source honors recovered off...

2018-04-02 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20958 We have made it clear that sockets is ONLY for testing and will not recover data from checkpoints. So I see no problem that it throws errors when attempting to recover. May we can improve the error

[GitHub] spark pull request #20958: [SPARK-23844][SS] Fix socket source honors recove...

2018-04-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20958#discussion_r178646725 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -238,6 +238,10 @@ final class DataStreamWriter[T] private[sql

[GitHub] spark pull request #20951: [SPARK-23099][SS] Migrate foreach sink to DataSou...

2018-04-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20951#discussion_r178645775 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala --- @@ -17,52 +17,81 @@ package

[GitHub] spark pull request #20951: [SPARK-23099][SS] Migrate foreach sink to DataSou...

2018-04-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20951#discussion_r178645279 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala --- @@ -17,52 +17,81 @@ package

spark git commit: [SPARK-23827][SS] StreamingJoinExec should ensure that input data is partitioned into specific number of partitions

2018-03-30 Thread tdas
dant. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20941 from tdas/SPARK-23827. (cherry picked from commit 15298b99ac8944e781328423289586176cf824d7) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/re

[GitHub] spark issue #20941: [SPARK-23827] [SS] StreamingJoinExec should ensure that ...

2018-03-30 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20941 Started more tests to test for flakiness. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20941: [SPARK-23827] [SS] StreamingJoinExec should ensure that ...

2018-03-29 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20941 @brkyvz @zsxwing can one of you take a look? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #20941: Spark 23827

2018-03-29 Thread tdas
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/20941 Spark 23827 ## What changes were proposed in this pull request? Currently, the requiredChildDistribution does not specify the partitions. This can cause the weird corner cases where

spark git commit: [SPARK-23096][SS] Migrate rate source to V2

2018-03-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master 35997b59f -> c68ec4e6a [SPARK-23096][SS] Migrate rate source to V2 ## What changes were proposed in this pull request? This PR migrate micro batch rate source to V2 API and rewrite UTs to suite V2 test. ## How was this patch tested?

[GitHub] spark pull request #20906: [SPARK-23561][SS] Pull continuous processing out ...

2018-03-26 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20906#discussion_r177244722 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousWriteExec.scala --- @@ -0,0 +1,122

[GitHub] spark pull request #20906: [SPARK-23561][SS] Pull continuous processing out ...

2018-03-26 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20906#discussion_r177245022 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousWriteExec.scala --- @@ -0,0 +1,122

[GitHub] spark pull request #20906: [SPARK-23561][SS] Pull continuous processing out ...

2018-03-26 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20906#discussion_r177275489 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousWriteExec.scala --- @@ -0,0 +1,122

[GitHub] spark pull request #20906: [SPARK-23561][SS] Pull continuous processing out ...

2018-03-26 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20906#discussion_r177243246 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousWriteExec.scala --- @@ -0,0 +1,122

[GitHub] spark issue #20906: [SPARK-23561][SS] Pull continuous processing out of Writ...

2018-03-26 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20906 jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20859: [SPARK-23702][SS] Forbid watermarks on both sides...

2018-03-19 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20859#discussion_r175580404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -160,6 +160,19 @@ object

[GitHub] spark pull request #20859: [SPARK-23702][SS] Forbid watermarks on both sides...

2018-03-19 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20859#discussion_r175580451 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala --- @@ -140,6 +140,21 @@ class

[GitHub] spark pull request #20859: [SPARK-23702][SS] Forbid watermarks on both sides...

2018-03-19 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20859#discussion_r175579879 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -160,6 +160,19 @@ object

[GitHub] spark pull request #20848: [SPARK-23623][SS] Avoid concurrent use of cached ...

2018-03-19 Thread tdas
Github user tdas closed the pull request at: https://github.com/apache/spark/pull/20848 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-16 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20767 Just to be clear, I am not saying that we *have to* move to this pool stuff. I am just saying that if we want to make this more robust, then we should try to use existing tools (after careful

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-16 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20767 @tedyu It was indeed hard to find :) But apache commons pool does expose metrics on idle/active counts. See https://commons.apache.org/proper/commons-pool/apidocs/org/apache/commons/pool2/impl

[GitHub] spark pull request #20848: [SPARK-23623][SS] Avoid concurrent use of cached ...

2018-03-16 Thread tdas
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/20848 [SPARK-23623][SS] Avoid concurrent use of cached consumers in CachedKafkaConsumer (branch-2.3) This is a backport of #20767 to branch 2.3 ## What changes were proposed in this pull request

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-16 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20767 @tedyu @zsxwing My thoughts on this is that we should consider migrating to something like Apache Common Pool (assuming it does not require additional maven libraries), which might be less maintenance

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-15 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20767 The idea is good. But how do you propose exposing that information? Periodic print out in the log? From a different angle, I would rather not do feature creep in this PR

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-15 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20767 @tedyu @zsxwing thank you very much for catching the bugs. I have simplified the logic quite a bit. Note that I removed the invariant that I had introduced earlier. Additionally, I locally ran

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-15 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r174973494 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +415,103 @@ private[kafka010] object

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-15 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r174968594 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +415,103 @@ private[kafka010] object

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-15 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r174968294 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +415,103 @@ private[kafka010] object

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-15 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20767 @koeninger good question Cody! I think we should fix this limitation eventually. The only reason I am not doing that in this PR is to keep the changes minimum for backporting to 2.3.x. Eventually, we

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-09 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r173602790 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +415,103 @@ private[kafka010] object

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-09 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r173602735 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +415,103 @@ private[kafka010] object

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-09 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r173602480 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +415,103 @@ private[kafka010] object

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-09 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r173602442 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -27,30 +27,73 @@ import

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-08 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r173351789 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +401,65 @@ private[kafka010] object

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-08 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r173341089 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +401,65 @@ private[kafka010] object

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-08 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r173340037 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +401,65 @@ private[kafka010] object

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-08 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20767#discussion_r173338056 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -342,80 +401,65 @@ private[kafka010] object

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-08 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20767 jenkins retest this --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20765: [SPARK-23406][SS] Enable stream-stream self-joins...

2018-03-08 Thread tdas
Github user tdas closed the pull request at: https://github.com/apache/spark/pull/20765 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

spark git commit: [SPARK-23406][SS] Enable stream-stream self-joins for branch-2.3

2018-03-07 Thread tdas
Relation [value#66] +- Project [(value#12 % 5) AS key#9, value#12] +- Project [value#66 AS value#12]// solution: project with aliases +- LocalRelation [value#66] ``` ## How was this patch tested? New unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #207

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-07 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20767 @zsxwing @brkyvz PTAL. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark pull request #20767: Fixed

2018-03-07 Thread tdas
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/20767 Fixed ## What changes were proposed in this pull request? CacheKafkaConsumer in the project `kafka-0-10-sql` is designed to maintain a pool of KafkaConsumers that can be reused. However

[GitHub] spark pull request #20765: [SPARK-23406][SS] Enable stream-stream self-joins...

2018-03-07 Thread tdas
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/20765 [SPARK-23406][SS] Enable stream-stream self-joins for branch-2.3 This is a backport of #20598. ## What changes were proposed in this pull request? Solved two bugs to enable stream

[GitHub] spark pull request #20688: [SPARK-23096][SS] Migrate rate source to V2

2018-03-06 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20688#discussion_r172730994 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousRateStreamSource.scala --- @@ -24,8 +24,8 @@ import

[GitHub] spark pull request #20688: [SPARK-23096][SS] Migrate rate source to V2

2018-03-06 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20688#discussion_r172730333 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/RateSourceSuite.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed

[GitHub] spark pull request #20688: [SPARK-23096][SS] Migrate rate source to V2

2018-03-06 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20688#discussion_r172729894 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RateSourceProvider.scala --- @@ -0,0 +1,291 @@ +/* + * Licensed

[GitHub] spark pull request #20755: [SPARK-23406][SS] Enable stream-stream self-joins...

2018-03-06 Thread tdas
Github user tdas closed the pull request at: https://github.com/apache/spark/pull/20755 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20755: [SPARK-23406][SS] Enable stream-stream self-joins...

2018-03-06 Thread tdas
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/20755 [SPARK-23406][SS] Enable stream-stream self-joins for branch-2.3 ## What changes were proposed in this pull request? This is limited but safe-to-backport version of self-join-fix made in #20598

[GitHub] spark issue #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20710 @rdblue @jose-torres arrgh... i didnt notice that you guys were still commenting before i merged it. feel free to continue discussion and if any change is needed we will deal

spark git commit: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master ba622f45c -> b0f422c38 [SPARK-23559][SS] Add epoch ID to DataWriterFactory. ## What changes were proposed in this pull request? Add an epoch ID argument to DataWriterFactory for use in streaming. As a side effect of passing in this

[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-04 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20710#discussion_r172081660 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -31,13 +31,17 @@ * the {@link #write(Object)}, {@link

[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-03-02 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20698 Thank you. Merging to master only as this is a new feature touching production code paths. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20710#discussion_r171993983 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/streaming/StreamWriter.java --- @@ -39,21 +36,21 @@ * If this method fails

[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20710#discussion_r171993716 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java --- @@ -48,6 +48,9 @@ * same task

[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20710#discussion_r171993622 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java --- @@ -48,6 +48,9 @@ * same task

[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20710#discussion_r171993559 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -36,8 +36,9 @@ * {@link DataSourceWriter#commit

[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20710#discussion_r171993519 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -36,8 +36,9 @@ * {@link DataSourceWriter#commit

[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20710#discussion_r171993467 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -36,8 +36,9 @@ * {@link DataSourceWriter#commit

[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20710#discussion_r171993276 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -172,17 +173,19 @@ object

[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20710#discussion_r171993101 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -198,7 +201,7 @@ object DataWritingSparkTask

[GitHub] spark pull request #20710: [SPARK-23559][SS] Add epoch ID to DataWriterFacto...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20710#discussion_r171993066 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java --- @@ -48,6 +48,9 @@ * same task

[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20698#discussion_r171989658 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala --- @@ -0,0 +1,106

spark git commit: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-03-02 Thread tdas
Repository: spark Updated Branches: refs/heads/master 3a4d15e5d -> 707e6506d [SPARK-23097][SQL][SS] Migrate text socket source to V2 ## What changes were proposed in this pull request? This PR moves structured streaming text socket source to V2. Questions: do we need to remove old "socket"

[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-03-02 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20382 LGTM. Merging to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20382: [SPARK-23097][SQL][SS] Migrate text socket source...

2018-03-02 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20382#discussion_r171954250 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala --- @@ -0,0 +1,300 @@ +/* + * Licensed

[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-03-01 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20698#discussion_r171750758 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala --- @@ -0,0 +1,105

[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-03-01 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20698#discussion_r171750580 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala --- @@ -0,0 +1,105

[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-03-01 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20698#discussion_r171750437 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala --- @@ -0,0 +1,105

[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-03-01 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20698#discussion_r171741765 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala --- @@ -0,0 +1,105

[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-03-01 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20698#discussion_r171732015 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala --- @@ -199,10 +179,10 @@ private[kafka010] class

[GitHub] spark issue #20703: [SPARK-19185][SS] Make Kafka consumer cache configurable

2018-03-01 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20703 I completely agree with @zsxwing, let understand what the issue is rather than covering it up with a workaround. We should not run into such issue at all

[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-03-01 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20382 relevant test failed. please make sure that there is no flakiness in the tests. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #20382: [SPARK-23097][SQL][SS] Migrate text socket source...

2018-03-01 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20382#discussion_r171510614 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala --- @@ -0,0 +1,300 @@ +/* + * Licensed

[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-02-28 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20382 @jerryshao please address the above comment, then we are good to merge! --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #20698: [SPARK-23541][SS] Allow Kafka source to read data with g...

2018-02-28 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20698 @zsxwing @jose-torres --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-02-28 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20698#discussion_r171441133 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala --- @@ -0,0 +1,105

[GitHub] spark pull request #20698: [SPARK-23541][SS] Allow Kafka source to read data...

2018-02-28 Thread tdas
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/20698 [SPARK-23541][SS] Allow Kafka source to read data with greater parallelism than the number of topic-partitions ## What changes were proposed in this pull request? Currently, when the Kafka

[GitHub] spark pull request #20382: [SPARK-23097][SQL][SS] Migrate text socket source...

2018-02-28 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20382#discussion_r171226477 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/socket.scala --- @@ -61,13 +68,13 @@ class TextSocketSource(host: String

[GitHub] spark pull request #20382: [SPARK-23097][SQL][SS] Migrate text socket source...

2018-02-28 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20382#discussion_r171225853 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala --- @@ -0,0 +1,300 @@ +/* + * Licensed

<    1   2   3   4   5   6   7   8   9   10   >