[GitHub] spark issue #19435: [MINOR][SS] "keyWithIndexToNumValues" -> "keyWithIndexTo...

2017-10-11 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/19435 @tdas @zsxwing would you take a look, thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19435: [WIP][SS][MINOR] "keyWithIndexToNumValues" -> "keyWithIn...

2017-10-05 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/19435 @tdas would you take a look, thanks

[GitHub] spark pull request #19435: [WIP][SS][MINOR] "keyWithIndexToNumValues" -> "ke...

2017-10-04 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/19435#discussion_r142836474 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala --- @@ -291,7 +291,7 @@ class

[GitHub] spark pull request #19435: [WIP][SS][MINOR] "keyWithIndexToNumValues" -> "ke...

2017-10-04 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/19435 [WIP][SS][MINOR] "keyWithIndexToNumValues" -> "keyWithIndexToValue" ## What changes were proposed in this pull request? This PR changes `keyWithIndexToNumValues`
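For context on the two names: as far as I understand `SymmetricHashJoinStateManager`, each side of a stream-stream join keeps one store from a key to how many values are buffered for it (`keyToNumValues`) and a second store from `(key, index)` to the actual value (`keyWithIndexToValue`), so the new name says what the second store maps to. The sketch below models that layout with plain Scala maps under that assumption; `JoinStateSketch` and `SymmetricJoinState` are hypothetical, and Spark keeps this state in `StateStore` instances rather than in-memory maps.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of the two state stores whose names appear in
// this PR. Spark itself keeps this state in StateStore instances, not maps.
object JoinStateSketch {
  final class SymmetricJoinState[K, V] {
    // keyToNumValues: key -> number of values currently buffered for the key
    private val keyToNumValues = mutable.Map.empty[K, Long]
    // keyWithIndexToValue: (key, index) -> value stored at that index
    private val keyWithIndexToValue = mutable.Map.empty[(K, Long), V]

    // Appends a value under the next free index and bumps the count.
    def append(key: K, value: V): Unit = {
      val n = keyToNumValues.getOrElse(key, 0L)
      keyWithIndexToValue((key, n)) = value
      keyToNumValues(key) = n + 1
    }

    // Reads back all buffered values for a key, in insertion order.
    def get(key: K): Seq[V] = {
      val n = keyToNumValues.getOrElse(key, 0L)
      (0L until n).map(i => keyWithIndexToValue((key, i)))
    }
  }
}
```

Splitting the count from the values lets a lookup find every value for a key by walking indices `0` to `n - 1`, without scanning the whole value store.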

[GitHub] spark issue #19206: [SPARK-19206][yarn]Client and ApplicationMaster resolveP...

2017-09-14 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/19206 It says `Spark-19206` in your PR title but Spark-19206 is actually about `Update outdated parameter descriptions in external-kafka module`; so maybe you should reference a different JIRA. That's all

[GitHub] spark issue #18342: [Spark-21123][Docs][Structured Streaming] Options for fi...

2017-06-19 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/18342 This lgtm; @zsxwing please also take a look --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

2017-05-03 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17346 thank you @zsxwing

[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

2017-05-03 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17346 Comments have been addressed -- @zsxwing it'd be great if you could take another look

[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

2017-05-02 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17346#discussion_r114468906 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala --- @@ -36,20 +37,27 @@ import

[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

2017-05-02 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17346#discussion_r114468833 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -145,6 +147,41 @@ class FileStreamSinkSuite extends

[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

2017-05-02 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17346#discussion_r114468801 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala --- @@ -53,6 +53,29 @@ object FileStreamSink extends

[GitHub] spark pull request #17346: [SPARK-19965][SS] DataFrame batch reader may fail...

2017-05-02 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17346#discussion_r114468821 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala --- @@ -53,6 +53,29 @@ object FileStreamSink extends

[GitHub] spark pull request #17735: [SPARK-20441][SPARK-20432][SS] Within the same st...

2017-05-02 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17735#discussion_r114453379 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala --- @@ -120,6 +141,32 @@ class StreamSuite extends StreamTest

[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...

2017-05-01 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17735 Jenkins retest this please

[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

2017-05-01 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17346 @zsxwing would you take a look at your convenience? Thanks!

[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

2017-05-01 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17346 Jenkins retest this please

[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...

2017-05-01 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17735 @zsxwing @brkyvz would you take a look at this? thanks!

[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...

2017-05-01 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17735 Jenkins retest this please

[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...

2017-05-01 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17735 @zsxwing @brkyvz would you take a look at this? thanks!

[GitHub] spark pull request #17812: [WIP][SPARK-][SS] Batch queries with 'Dataset/Dat...

2017-04-30 Thread lw-lin
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/17812

[GitHub] spark pull request #17812: [WIP][SPARK-][SS] Batch queries with 'Dataset/Dat...

2017-04-30 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/17812 [WIP][SPARK-][SS] Batch queries with 'Dataset/DataFrame.withWatermark()` does not execute ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix

[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

2017-04-29 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17346 Jenkins retest this please

[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

2017-04-28 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17346 Jenkins retest this please

[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...

2017-04-27 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17735 @brkyvz please take another look

[GitHub] spark issue #17735: [SPARK-20441][SPARK-20432][SS] Within the same streaming...

2017-04-25 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17735 @zsxwing @brkyvz would you take a look at this? thanks!

[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

2017-04-23 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17346 Rebased to master to resolve conflicts

[GitHub] spark pull request #17735: [WIP][SS] Within the same streaming query, one St...

2017-04-23 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/17735 [WIP][SS] Within the same streaming query, one StreamingRelation should only be transformed to one StreamingExecutionRelation ## What changes were proposed in this pull request? (Please
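One way to hit this is, for example, a self-join of one streaming DataFrame, whose plan contains the same `StreamingRelation` twice; if each occurrence were turned into its own execution relation, one input would be tracked as two sources. A minimal sketch of a memoized rewrite, assuming plain classes in place of Catalyst plan nodes and a plan flattened to a `Seq` (`RelationMappingSketch` and both classes are hypothetical stand-ins, not Spark's types):

```scala
import scala.collection.mutable

// Hypothetical stand-ins for the Catalyst nodes named in the PR title.
object RelationMappingSketch {
  final case class StreamingRelation(sourceName: String)
  final class StreamingExecutionRelation(val relation: StreamingRelation, val sourceId: Int)

  // Rewrites each StreamingRelation in a (flattened) plan into an execution
  // relation, reusing the instance already created for an equal relation.
  def toExecutionRelations(plan: Seq[StreamingRelation]): Seq[StreamingExecutionRelation] = {
    val cache = mutable.Map.empty[StreamingRelation, StreamingExecutionRelation]
    var nextSourceId = 0
    plan.map { rel =>
      cache.getOrElseUpdate(rel, {
        val exec = new StreamingExecutionRelation(rel, nextSourceId)
        nextSourceId += 1
        exec
      })
    }
  }
}
```

`getOrElseUpdate` only evaluates its second argument when the relation is not cached yet, so a repeated relation gets the same execution relation (and source id) every time it appears.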

[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

2017-04-06 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17346 Jenkins retest this please

[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

2017-03-28 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17346 Jenkins retest this please

[GitHub] spark pull request #17268: [SPARK-19932][SS] Disallow a case that might caus...

2017-03-22 Thread lw-lin
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/17268

[GitHub] spark issue #17268: [SPARK-19932][SS] Disallow a case that might cause OOM f...

2017-03-22 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17268 Thanks for the comments! Closing this.

[GitHub] spark issue #17346: [SPARK-19965][SS] DataFrame batch reader may fail to inf...

2017-03-19 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17346 @zsxwing would you take a look at this? Thanks!

[GitHub] spark pull request #17346: [WIP] DataFrame batch reader may fail to infer pa...

2017-03-19 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/17346 [WIP] DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output WIP of SPARK-19965 ## What changes were proposed in this pull request? (Please fill
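The preview does not spell out the cause. Assuming the failure comes from the files the sink keeps in its `_spark_metadata` log directory being picked up next to the data files during partition discovery, the idea of a fix can be sketched as dropping anything under that directory before inferring `name=value` partition columns. This is a hypothetical, string-based model; `MetadataDirSketch` and its helpers are not Spark APIs.

```scala
// Hypothetical, string-based model; not Spark's partition-discovery code.
object MetadataDirSketch {
  // Directory in which FileStreamSink keeps its metadata log.
  val MetadataDirName = "_spark_metadata"

  // True if the path is, or lies under, a metadata log directory.
  def isInMetadataDir(path: String): Boolean =
    path.split("/").contains(MetadataDirName)

  // Partition column names implied by `name=value` directories above a file.
  def partitionColumns(path: String): Seq[String] =
    path.split("/").dropRight(1).toSeq
      .filter(_.contains("="))
      .map(seg => seg.takeWhile(_ != '='))

  // Files that should take part in partition inference.
  def filesForInference(paths: Seq[String]): Seq[String] =
    paths.filterNot(isInMetadataDir)
}
```

Under this model, a metadata log file such as `/out/_spark_metadata/0` implies no partition columns while the data files imply `date`, which is the kind of mismatch that can make inference give up; filtering first leaves only consistent paths.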

[GitHub] spark pull request #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-19 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17219#discussion_r106800936 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/BatchCommitLog.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #17327: [SPARK-19721][SS][BRANCH-2.1] Good error message for ver...

2017-03-17 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17327 Thanks! Closed this.

[GitHub] spark pull request #17327: [SPARK-19721][SS][BRANCH-2.1] Good error message ...

2017-03-17 Thread lw-lin
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/17327

[GitHub] spark issue #17070: [SPARK-19721][SS] Good error message for version mismatc...

2017-03-16 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17070 @zsxwing sure, please see https://github.com/apache/spark/pull/17327

[GitHub] spark pull request #17327: [SPARK-19721][SS][BRANCH-2.1] Good error message ...

2017-03-16 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/17327 [SPARK-19721][SS][BRANCH-2.1] Good error message for version mismatch in log files ## Problem There are several places where we write out version identifiers in various logs
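The point of a good error message here is to let a reader tell a log written by a newer format version apart from a corrupted one. A minimal sketch of such a check, assuming a version line of the form `v<N>` and a known maximum version the reader supports; `LogVersionSketch.parseVersion` and its message wording are illustrative, not quoted from `HDFSMetadataLog`:

```scala
// Illustrative sketch; the message wording is not copied from Spark.
object LogVersionSketch {
  // Parses a version line such as "v1" and fails with a message that
  // distinguishes a corrupted log from one written by a newer format version.
  def parseVersion(text: String, maxSupportedVersion: Int): Int = {
    def malformed(): IllegalStateException = new IllegalStateException(
      s"Log file was malformed: failed to read correct log version from $text.")

    if (!text.startsWith("v")) throw malformed()
    val version =
      try text.substring(1).toInt
      catch { case _: NumberFormatException => throw malformed() }

    if (version <= 0) throw malformed()
    if (version > maxSupportedVersion) {
      throw new IllegalStateException(
        s"UnsupportedLogVersion: maximum supported log version is v$maxSupportedVersion, " +
          s"but encountered v$version. The log file was produced by a newer version of " +
          "Spark and cannot be read by this version. Please upgrade.")
    }
    version
  }
}
```

Keeping the "too new" case separate from the "malformed" case is what makes the message actionable: the first calls for an upgrade, the second for inspecting the file.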

[GitHub] spark issue #17268: [SPARK-19932][SS] Disallow a case that might case OOM fo...

2017-03-16 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17268 Sure; sorry I didn't say it out but I meant the same thing :-) @marmbrus now that I've updated this as well as the JIRA, would you mind taking another look? Thanks!

[GitHub] spark pull request #17070: [SPARK-19721][SS] Good error message for version ...

2017-03-16 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17070#discussion_r106352628 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -195,6 +195,11 @@ class HDFSMetadataLog[T <: Any

[GitHub] spark issue #17268: [SPARK-19932][SS] Also save event time into StateStore f...

2017-03-14 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17268 Thank you @marmbrus for the detailed explanation! > For that reason, I think its safest to require the user to explicitly include the timestamp. Yea, let me upd

[GitHub] spark issue #17299: [SPARK-19817][SS] Make it clear that `timeZone` is a gen...

2017-03-14 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17299 This is the fix for the streaming counterpart (i.e. Structured Streaming), @ueshin @gatorsmile would you take a look? Thanks!

[GitHub] spark pull request #17299: [SPARK-19817][SS] Make it clear that `timeZone` i...

2017-03-14 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/17299 [SPARK-19817][SS] Make it clear that `timeZone` is a general option in DataStreamReader/Writer ## What changes were proposed in this pull request? As timezone setting can also affect

[GitHub] spark issue #17268: [SPARK-19932][SS] Also save event time into StateStore f...

2017-03-13 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17268 @marmbrus thanks for the comments. > In the worst case... it is possible that the result actually ends up with duplicates in it. Ah, could you elaborate? I'm not sure

[GitHub] spark issue #17268: [SPARK-19932][SS] Also save event time into StateStore f...

2017-03-13 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17268 @marmbrus @zsxwing would you take a look at this, thanks!

[GitHub] spark pull request #17268: [WIP][SPARK][SS] Also save event time into StateS...

2017-03-12 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/17268 [WIP][SPARK][SS] Also save event time into StateStore for certain cases ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How

[GitHub] spark issue #17070: [SPARK-19721][SS] Good error message for version mismatc...

2017-03-09 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17070 @zsxwing would you take a look when you've got a minute? Thanks!

[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...

2017-03-09 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17120 Jenkins retest this please

[GitHub] spark pull request #17120: [SPARK-19715][Structured Streaming] Option to Str...

2017-03-08 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17120#discussion_r105097597 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -79,9 +81,16 @@ class FileStreamSource

[GitHub] spark pull request #17216: [SPARK-19873][SS] Record num shuffle partitions i...

2017-03-08 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17216#discussion_r105081023 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala --- @@ -71,7 +71,10 @@ object OffsetSeq { * @param

[GitHub] spark pull request #17120: [SPARK-19715][Structured Streaming] Option to Str...

2017-03-08 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17120#discussion_r105080626 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -309,6 +315,10 @@ object FileStreamSource

[GitHub] spark pull request #17120: [SPARK-19715][Structured Streaming] Option to Str...

2017-03-08 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17120#discussion_r105080572 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -75,7 +77,7 @@ class FileStreamSource

[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...

2017-03-06 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17120 Jenkins retest this please

[GitHub] spark issue #17070: [SPARK-19721][SS] Good error message for version mismatc...

2017-03-06 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17070 Jenkins retest this please

[GitHub] spark pull request #17120: [SPARK-19715][Structured Streaming] Option to Str...

2017-03-04 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17120#discussion_r104287612 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -1253,8 +1253,26 @@ class FileStreamSourceSuite extends

[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...

2017-03-03 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17120 Thank you @marmbrus @steveloughran for the feedback. Added some explicit docs. Here's a screenshot of the affected section from the programming guide: ![snip20170304_5](https

[GitHub] spark pull request #17120: [SPARK-19715][Structured Streaming] Option to Str...

2017-03-03 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17120#discussion_r104276335 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -1052,10 +1052,18 @@ Here are the details of all the sinks in Spark. Append

[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...

2017-03-03 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17120 Thank you @marmbrus @steveloughran for the feedback. I've added some explicit docs. Here's a screenshot of the affected section from the programming guide: ![snip20170304_4](https

[GitHub] spark pull request #17120: [SPARK-19715][Structured Streaming] Option to Str...

2017-03-03 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17120#discussion_r104276118 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -1052,10 +1052,18 @@ Here are the details of all the sinks in Spark. Append

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-03-02 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16970 @uncleGen I think `requiredChildDistribution = ClusteredDistribution(keyExpressions) :: Nil` (please see [here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark
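What `ClusteredDistribution(keyExpressions)` buys here is that every row with a given key lands in the same partition, so each partition's state can deduplicate on its own without missing a copy held elsewhere. The sketch illustrates that property with plain collections instead of Spark operators; `ClusteredDedupSketch`, `hashPartition`, and `dedupPerPartition` are hypothetical names.

```scala
import scala.collection.mutable

// Plain-collections illustration; not Spark's physical operators.
object ClusteredDedupSketch {
  // Hash-partitions rows by key: equal keys always map to the same partition,
  // which is the guarantee a clustered distribution on the key provides.
  def hashPartition[R](rows: Seq[R], numPartitions: Int)(key: R => Any): Seq[Seq[R]] = {
    val byPartition = rows.groupBy(r => java.lang.Math.floorMod(key(r).hashCode, numPartitions))
    (0 until numPartitions).map(p => byPartition.getOrElse(p, Seq.empty[R]))
  }

  // Deduplicates each partition independently, keeping the first row per key.
  def dedupPerPartition[R](partitions: Seq[Seq[R]])(key: R => Any): Seq[R] =
    partitions.flatMap { part =>
      val seen = mutable.HashSet.empty[Any]
      part.filter(r => seen.add(key(r)))
    }
}
```

If rows were instead spread round-robin across partitions, two copies of one key could sit in different partitions and both would survive the per-partition pass.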

[GitHub] spark issue #17070: [SPARK-19721][SS] Good error message for version mismatc...

2017-03-02 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17070 @zsxwing would you take a look when you've got a minute? Thanks!

[GitHub] spark issue #17120: [SPARK-19715][Structured Streaming] Option to Strip Path...

2017-03-01 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17120 @steveloughran thanks for the comments. @marmbrus @zsxwing it'd be great if you could share some thoughts!

[GitHub] spark pull request #17120: [SPARK-19715][Structured Streaming] Option to Str...

2017-03-01 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/17120 [SPARK-19715][Structured Streaming] Option to Strip Paths in FileSource ## What changes were proposed in this pull request? Today, we compare the whole path when deciding if a file is new
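A minimal sketch of the difference between comparing whole paths and comparing only file names when deciding whether a file is new. The `stripPaths` flag and `SeenFilesSketch` are hypothetical, and a real file source would keep its record of seen files in its own metadata rather than in a mutable set.

```scala
import java.net.URI
import scala.collection.mutable

// Hypothetical model of a file source's "have I seen this file?" check.
object SeenFilesSketch {
  // Identity used for a file: the whole path, or only its final name
  // component when the hypothetical `stripPaths` flag is set.
  def fileKey(path: String, stripPaths: Boolean): String =
    if (stripPaths) new URI(path).getPath.split("/").last else path

  // Returns the candidates not seen before and records them as seen.
  def newFiles(candidates: Seq[String], seen: mutable.Set[String], stripPaths: Boolean): Seq[String] =
    candidates.filter(p => seen.add(fileKey(p, stripPaths)))
}
```

With name-only comparison, `s3://a/dataset.txt` and `s3n://a/b/dataset.txt` count as the same file; with whole-path comparison they are two different files. The trade-off is that unrelated files which merely share a name are also treated as already seen.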

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103604050 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -190,32 +190,31 @@ class FileStreamSource

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103603945 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -662,6 +665,154 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103603935 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -662,6 +665,154 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103603942 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -662,6 +665,154 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103603922 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -662,6 +665,154 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103603898 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -662,6 +665,154 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103603705 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -662,6 +665,154 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103603671 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala --- @@ -208,6 +208,11 @@ trait StreamTest extends QueryTest

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103603693 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -662,6 +665,154 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103603687 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -662,6 +665,154 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103405331 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -52,10 +52,7 @@ abstract class FileStreamSourceTest

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103404962 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -159,28 +161,64 @@ class FileStreamSource

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103404486 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -159,28 +161,64 @@ class FileStreamSource

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-28 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103403986 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -159,28 +161,64 @@ class FileStreamSource

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-27 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103366158 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -158,12 +158,28 @@ class FileStreamSource

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-27 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103366082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -158,12 +158,28 @@ class FileStreamSource

[GitHub] spark issue #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-26 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16987 Rebased to master and tests updated. @zsxwing would you take another look when you've got a minute?

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-26 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103104036 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -662,6 +663,101 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-26 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103104024 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -662,6 +663,101 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-26 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r103104003 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -243,13 +243,20 @@ case class DataSource

[GitHub] spark issue #17070: [WIP][SS] Good error message for version mismatch in log...

2017-02-26 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/17070 @srowen thank you for the comments! I was trying to tackle SPARK-19721, sorry the summary just said "WIP" without a JIRA number; adding JIRA number back.

[GitHub] spark pull request #17070: [WIP][SS] Good error message for version mismatch...

2017-02-26 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17070#discussion_r103102481 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -195,6 +196,11 @@ class HDFSMetadataLog[T <: Any

[GitHub] spark pull request #17070: [WIP][SS] Good error message for version mismatch...

2017-02-26 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17070#discussion_r103102422 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -268,6 +274,37 @@ class HDFSMetadataLog[T <: Any

[GitHub] spark pull request #17070: [WIP][SS] Good error message for version mismatch...

2017-02-26 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17070#discussion_r103102416 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -18,6 +18,7 @@ package

[GitHub] spark pull request #17070: [WIP][SS] Good error message for version mismatch...

2017-02-26 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17070#discussion_r103102420 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -268,6 +274,37 @@ class HDFSMetadataLog[T <: Any

[GitHub] spark pull request #17070: [WIP][SS] Good error message for version mismatch...

2017-02-26 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17070#discussion_r103102394 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -100,7 +100,8 @@ private[kafka010] class

[GitHub] spark pull request #17070: [WIP][SS] Good error message for version mismatch...

2017-02-26 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/17070#discussion_r103102389 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala --- @@ -226,7 +226,15 @@ class KafkaSourceSuite

[GitHub] spark pull request #17070: [WIP][SS] Good error message for version mismatch...

2017-02-25 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/17070 [WIP][SS] Good error message for version mismatch in log files ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How

[GitHub] spark issue #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-23 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16987 Reopening :-)

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-23 Thread lw-lin
GitHub user lw-lin reopened a pull request: https://github.com/apache/spark/pull/16987 [SPARK-19633][SS] FileSource read from FileSink ## What changes were proposed in this pull request? Right now file source always uses `InMemoryFileIndex` to scan files from a given path

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-22 Thread lw-lin
Github user lw-lin closed the pull request at: https://github.com/apache/spark/pull/16987

[GitHub] spark issue #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-22 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16987 Using deterministic file names sounds great. Thanks! I'm closing this.

[GitHub] spark issue #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-20 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16987 @marmbrus @zsxwing would you take a look at this? thanks!

[GitHub] spark pull request #16987: [SPARK-19633][SS] FileSource read from FileSink

2017-02-19 Thread lw-lin
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16987#discussion_r101917807 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -76,12 +76,13 @@ abstract class FileStreamSourceTest

[GitHub] spark issue #16987: [WIP][SPARK-][SS] FileSource read from FileSink

2017-02-19 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16987 Jenkins retest this please

[GitHub] spark pull request #16987: [WIP][SPARK-][SS] FileSource read from FileSink

2017-02-18 Thread lw-lin
GitHub user lw-lin opened a pull request: https://github.com/apache/spark/pull/16987 [WIP][SPARK-][SS] FileSource read from FileSink ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested

[GitHub] spark issue #16912: [SPARK-19576] [Core] Task attempt paths exist in output ...

2017-02-13 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16912 To me this PR aims to also use driver to coordinate Hadoop output committing for `saveAsNewAPIHadoopFile` -- actually the same was added for `saveAsHadoopFile` back in https://github.com/apache
