[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation

2018-06-28 Thread ConcurrencyPractitioner
Github user ConcurrencyPractitioner commented on the issue:

https://github.com/apache/spark/pull/21651
  
Retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation

2018-06-27 Thread ConcurrencyPractitioner
Github user ConcurrencyPractitioner commented on the issue:

https://github.com/apache/spark/pull/21651
  
I am uncertain how we should transfer the data stored in OffsetSeqs to 
external storage (e.g. KafkaSink, which I mentioned before).





[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation

2018-06-27 Thread ConcurrencyPractitioner
Github user ConcurrencyPractitioner commented on the issue:

https://github.com/apache/spark/pull/21651
  
cc @koeninger 





[GitHub] spark pull request #21651: [SPARK-18258] Sink need access to offset represen...

2018-06-27 Thread ConcurrencyPractitioner
GitHub user ConcurrencyPractitioner opened a pull request:

https://github.com/apache/spark/pull/21651

[SPARK-18258] Sink need access to offset representation

## What changes were proposed in this pull request?

Currently, sinks only have access to the batchId and the data, not the 
actual offset representation.
The goal of this PR is to expose this representation to sinks via 
```addBatch```. 
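
As a rough illustration of the idea only, here is a minimal Python sketch of a sink whose batch method also receives the offsets covered by the batch. Everything in it (`OffsetRange`, `RecordingSink`, `add_batch` and its parameters) is a hypothetical stand-in invented for this example; it is not Spark's `Sink` trait and not the signature proposed in this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical offset span that one source contributed to a batch."""
    source: str
    start: int
    end: int


class RecordingSink:
    """Toy sink (an assumption, not Spark's API) that stores rows with offsets."""

    def __init__(self) -> None:
        # batch_id -> (offsets covered by the batch, rows written)
        self.committed: Dict[int, Tuple[List[OffsetRange], List[str]]] = {}

    def add_batch(
        self,
        batch_id: int,
        rows: List[str],
        offsets: Optional[List[OffsetRange]] = None,
    ) -> bool:
        """Store a batch once; return False if this batch_id was already stored."""
        if batch_id in self.committed:
            # A replayed batch is ignored so that writes stay idempotent.
            return False
        self.committed[batch_id] = (list(offsets or []), list(rows))
        return True


sink = RecordingSink()
sink.add_batch(0, ["a", "b"], [OffsetRange("topic-0", 0, 2)])
```

With the offsets stored next to the rows, a sink like this could in principle persist both together in external storage, which is the use case the PR description points at.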

## How was this patch tested?

Existing unit tests (these need to be changed to also test for offsetSeqs)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ConcurrencyPractitioner/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21651.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21651


commit f27be2d323fb287876032b59dd078fc65e9b180d
Author: Richard Yu 
Date:   2018-06-28T00:44:34Z

[SPARK-18258] Init Commit







[GitHub] spark issue #21124: [SPARK-23004][SS] Ensure StateStore.commit is called onl...

2018-04-22 Thread ConcurrencyPractitioner
Github user ConcurrencyPractitioner commented on the issue:

https://github.com/apache/spark/pull/21124
  
+1





[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...

2018-03-11 Thread ConcurrencyPractitioner
Github user ConcurrencyPractitioner commented on the issue:

https://github.com/apache/spark/pull/20683
  
@jerryshao In Spark Streaming, I think ```.tmp``` is used as a suffix to 
indicate that the object is a file, although I do not know if this is 
universal.





[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...

2018-02-26 Thread ConcurrencyPractitioner
Github user ConcurrencyPractitioner commented on the issue:

https://github.com/apache/spark/pull/20683
  
Jenkins test this please






[GitHub] spark pull request #20683: [SPARK-8605] Exclude files in StreamingContext. t...

2018-02-26 Thread ConcurrencyPractitioner
GitHub user ConcurrencyPractitioner opened a pull request:

https://github.com/apache/spark/pull/20683

[SPARK-8605] Exclude files in StreamingContext. textFileStream(directory)

## What changes were proposed in this pull request?

In this PR, an extra boolean expression was added to test whether a regex is 
present. If it evaluates to true, the file is excluded.
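
One possible reading of this check, sketched in Python under stated assumptions: file paths that match an exclusion regex are dropped before processing. The helper name `exclude_matching` and the example pattern are hypothetical and are not taken from the patch, which may apply the test differently.

```python
import re
from typing import Iterable, List


def exclude_matching(paths: Iterable[str], pattern: str) -> List[str]:
    """Return the paths that do NOT match the (hypothetical) exclusion regex."""
    regex = re.compile(pattern)
    # re.search matches anywhere in the path, so anchor the pattern if needed.
    return [path for path in paths if not regex.search(path)]


# Example: skip files that end in ".tmp" (the pattern is an assumption).
kept = exclude_matching(["a.txt", "b.tmp", "c.log"], r"\.tmp$")
```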

## How was this patch tested?

No tests were added. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ConcurrencyPractitioner/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20683.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20683


commit 4ad0fe1dc1513a5feba963f902742fecc714e4d0
Author: Richard Yu <richardyu@...>
Date:   2018-02-27T04:05:37Z

[SPARK-8605] Exclude files in StreamingContext. textFileStream(directory)



