[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation
Github user ConcurrencyPractitioner commented on the issue: https://github.com/apache/spark/pull/21651 Retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation
Github user ConcurrencyPractitioner commented on the issue: https://github.com/apache/spark/pull/21651 I am uncertain about how we should transfer the data stored in OffsetSeqs to external storage (e.g. KafkaSink, which I mentioned before).
[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation
Github user ConcurrencyPractitioner commented on the issue: https://github.com/apache/spark/pull/21651 cc @koeninger
[GitHub] spark pull request #21651: [SPARK-18258] Sink need access to offset represen...
GitHub user ConcurrencyPractitioner opened a pull request: https://github.com/apache/spark/pull/21651

[SPARK-18258] Sink need access to offset representation

## What changes were proposed in this pull request?
Currently, sinks only have access to the batchId and the data, not the actual offset representation. The goal of this PR is to expose this representation to sinks via ```addBatch```.

## How was this patch tested?
Existing unit tests (these need to be changed to also test for offsetSeqs).

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ConcurrencyPractitioner/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21651.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21651

commit f27be2d323fb287876032b59dd078fc65e9b180d
Author: Richard Yu
Date: 2018-06-28T00:44:34Z
[SPARK-18258] Init Commit
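To make the proposal above concrete, here is a minimal sketch of a sink that receives the offsets alongside the batch id and the data. It is a plain-Python model and not Spark's API: the class name `OffsetAwareSink`, the method `add_batch`, the `offsets` parameter, and the dict-of-partition-offsets shape are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OffsetAwareSink:
    """Hypothetical sink; illustrates the idea only, not Spark's Sink trait."""

    rows_by_batch: Dict[int, List[str]] = field(default_factory=dict)
    offsets_by_batch: Dict[int, Dict[str, int]] = field(default_factory=dict)

    def add_batch(
        self,
        batch_id: int,
        rows: List[str],
        offsets: Optional[Dict[str, int]] = None,  # assumed shape: partition -> offset
    ) -> bool:
        """Store a batch once; return False if the batch id was already seen."""
        if batch_id in self.rows_by_batch:
            # A replayed batch is skipped so that writes stay idempotent.
            return False
        self.rows_by_batch[batch_id] = list(rows)
        if offsets is not None:
            # With offsets available, the sink can persist them with the data.
            self.offsets_by_batch[batch_id] = dict(offsets)
        return True


sink = OffsetAwareSink()
sink.add_batch(0, ["a", "b"], offsets={"topic-0": 42})
sink.add_batch(0, ["a", "b"], offsets={"topic-0": 42})  # replay, ignored
```

Keeping `offsets` optional in this sketch means callers that only pass a batch id and data keep working, which is one possible way to introduce the extra argument without breaking existing sinks.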
[GitHub] spark issue #21124: [SPARK-23004][SS] Ensure StateStore.commit is called onl...
Github user ConcurrencyPractitioner commented on the issue: https://github.com/apache/spark/pull/21124 +1
[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...
Github user ConcurrencyPractitioner commented on the issue: https://github.com/apache/spark/pull/20683 @jerryshao In Spark Streaming, I think ```.tmp``` is used as a suffix to indicate that the object was a file, although I do not know if this is universal.
[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...
Github user ConcurrencyPractitioner commented on the issue: https://github.com/apache/spark/pull/20683 Jenkins test this please
[GitHub] spark pull request #20683: [SPARK-8605] Exclude files in StreamingContext. t...
GitHub user ConcurrencyPractitioner opened a pull request: https://github.com/apache/spark/pull/20683

[SPARK-8605] Exclude files in StreamingContext. textFileStream(directory)

## What changes were proposed in this pull request?
In this PR, an extra boolean expression was added to test whether a regex is present. If it returns true, we exclude the file.

## How was this patch tested?
No tests were added.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ConcurrencyPractitioner/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20683.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20683

commit 4ad0fe1dc1513a5feba963f902742fecc714e4d0
Author: Richard Yu
Date: 2018-02-27T04:05:37Z
[SPARK-8605] Exclude files in StreamingContext. textFileStream(directory)
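The exclusion check described in this PR can be sketched roughly as follows. This is a standalone Python illustration, not the patch itself: the function name `select_files`, the `exclude_pattern` parameter, and the choice to match the regex against the file name rather than the full path are all assumptions.

```python
import re
from typing import Iterable, List, Optional


def select_files(paths: Iterable[str], exclude_pattern: Optional[str] = None) -> List[str]:
    """Return the paths whose file name does not match exclude_pattern.

    Hypothetical helper: when no pattern is given, nothing is excluded.
    """
    if exclude_pattern is None:
        return list(paths)
    regex = re.compile(exclude_pattern)
    kept = []
    for path in paths:
        file_name = path.rsplit("/", 1)[-1]  # last path component
        if regex.search(file_name):
            continue  # the boolean test is true, so the file is excluded
        kept.append(path)
    return kept


# Example: skip in-progress files that carry a .tmp suffix.
visible = select_files(["/in/part-0.txt", "/in/part-1.txt.tmp"], r"\.tmp$")
```

Since the PR notes that no tests were added, a small pure function along these lines would be one way to make the filter testable in isolation.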