Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/19435
@tdas @zsxwing would you take a look, thanks
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/19435
@tdas would you take a look, thanks
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19435#discussion_r142836474
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
---
@@ -291,7 +291,7 @@ class
GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/19435
[WIP][SS][MINOR] "keyWithIndexToNumValues" -> "keyWithIndexToValue"
## What changes were proposed in this pull request?
This PR changes `keyWithIndexToNumValues`
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/19206
It says `Spark-19206` in your PR title but Spark-19206 is actually about
`Update outdated parameter descriptions in external-kafka module`; so maybe you
should reference a different JIRA. That's all
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/18342
This lgtm; @zsxwing please also take a look
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17346
thank you @zsxwing
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17346
Comments have been addressed -- @zsxwing it'd be great if you could take
another look
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17346#discussion_r114468906
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
---
@@ -36,20 +37,27 @@ import
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17346#discussion_r114468833
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala
---
@@ -145,6 +147,41 @@ class FileStreamSinkSuite extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17346#discussion_r114468801
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
---
@@ -53,6 +53,29 @@ object FileStreamSink extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17346#discussion_r114468821
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
---
@@ -53,6 +53,29 @@ object FileStreamSink extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17735#discussion_r114453379
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -120,6 +141,32 @@ class StreamSuite extends StreamTest
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17735
Jenkins retest this please
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17346
@zsxwing would you take a look at your convenience? Thanks!
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17346
Jenkins retest this please
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17735
@zsxwing @brkyvz would you take a look at this? thanks!
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17735
Jenkins retest this please
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17735
@zsxwing @brkyvz would you take a look at this? thanks!
Github user lw-lin closed the pull request at:
https://github.com/apache/spark/pull/17812
GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/17812
[WIP][SPARK-][SS] Batch queries with `Dataset/DataFrame.withWatermark()`
does not execute
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17346
Jenkins retest this please
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17346
Jenkins retest this please
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17735
@brkyvz please take another look
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17735
@zsxwing @brkyvz would you take a look at this? thanks!
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17346
Rebased to master to resolve conflicts
GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/17735
[WIP][SS] Within the same streaming query, one StreamingRelation should
only be transformed to one StreamingExecutionRelation
## What changes were proposed in this pull request?
(Please
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17346
Jenkins retest this please
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17346
Jenkins retest this please
Github user lw-lin closed the pull request at:
https://github.com/apache/spark/pull/17268
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17268
Thanks for the comments! Closing this.
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17346
@zsxwing would you take a look at this? Thanks!
GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/17346
[WIP] DataFrame batch reader may fail to infer partitions when reading
FileStreamSink's output
WIP of SPARK-19965
## What changes were proposed in this pull request?
(Please fill
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17219#discussion_r106800936
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/BatchCommitLog.scala
---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17327
Thanks! Closed this.
Github user lw-lin closed the pull request at:
https://github.com/apache/spark/pull/17327
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17070
@zsxwing sure, please see https://github.com/apache/spark/pull/17327
GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/17327
[SPARK-19721][SS][BRANCH-2.1] Good error message for version mismatch in
log files
## Problem
There are several places where we write out version identifiers in various
logs
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17268
Sure; sorry I didn't spell it out, but I meant the same thing :-)
@marmbrus now that I've updated this as well as the JIRA, would you mind
taking another look? Thanks!
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17070#discussion_r106352628
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
---
@@ -195,6 +195,11 @@ class HDFSMetadataLog[T <: Any
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17268
Thank you @marmbrus for the detailed explanation!
> For that reason, I think its safest to require the user to explicitly
include the timestamp.
Yea, let me upd
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17299
This is the fix for the streaming counter-part (i.e. Structured Streaming),
@ueshin @gatorsmile would you take a look? Thanks!
GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/17299
[SPARK-19817][SS] Make it clear that `timeZone` is a general option in
DataStreamReader/Writer
## What changes were proposed in this pull request?
As timezone setting can also affect
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17268
@marmbrus thanks for the comments.
> In the worst case... it is possible that the result actually ends up with
duplicates in it.
Ah, could you elaborate? I'm not sure
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17268
@marmbrus @zsxwing would you take a look at this, thanks!
GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/17268
[WIP][SPARK][SS] Also save event time into StateStore for certain cases
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17070
@zsxwing would you take a look when you've got a minute? Thanks!
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17120
Jenkins retest this please
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17120#discussion_r105097597
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
---
@@ -79,9 +81,16 @@ class FileStreamSource
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17216#discussion_r105081023
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala
---
@@ -71,7 +71,10 @@ object OffsetSeq {
* @param
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17120#discussion_r105080626
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
---
@@ -309,6 +315,10 @@ object FileStreamSource
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17120#discussion_r105080572
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
---
@@ -75,7 +77,7 @@ class FileStreamSource
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17120
Jenkins retest this please
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17070
Jenkins retest this please
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17120#discussion_r104287612
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -1253,8 +1253,26 @@ class FileStreamSourceSuite extends
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17120
Thank you @marmbrus @steveloughran for the feedback. Added some explicit
docs. Here's a screenshot of the affected section from the programming guide:
![snip20170304_5](https
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17120#discussion_r104276335
--- Diff: docs/structured-streaming-programming-guide.md ---
@@ -1052,10 +1052,18 @@ Here are the details of all the sinks in Spark.
Append
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17120
Thank you @marmbrus @steveloughran for the feedback. I've added some
explicit docs. Here's a screenshot of the affected section from the programming
guide:
![snip20170304_4](https
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17120#discussion_r104276118
--- Diff: docs/structured-streaming-programming-guide.md ---
@@ -1052,10 +1052,18 @@ Here are the details of all the sinks in Spark.
Append
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/16970
@uncleGen I think `requiredChildDistribution =
ClusteredDistribution(keyExpressions) :: Nil` (please see
[here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17070
@zsxwing would you take a look when you've got a minute? Thanks!
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17120
@steveloughran thanks for the comments.
@marmbrus @zsxwing it'd be great if you could share some thoughts!
GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/17120
[SPARK-19715][Structured Streaming] Option to Strip Paths in FileSource
## What changes were proposed in this pull request?
Today, we compare the whole path when deciding if a file is new
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103604050
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
---
@@ -190,32 +190,31 @@ class FileStreamSource
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103603945
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -662,6 +665,154 @@ class FileStreamSourceSuite extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103603935
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -662,6 +665,154 @@ class FileStreamSourceSuite extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103603942
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -662,6 +665,154 @@ class FileStreamSourceSuite extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103603922
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -662,6 +665,154 @@ class FileStreamSourceSuite extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103603898
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -662,6 +665,154 @@ class FileStreamSourceSuite extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103603705
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -662,6 +665,154 @@ class FileStreamSourceSuite extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103603671
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala ---
@@ -208,6 +208,11 @@ trait StreamTest extends QueryTest
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103603693
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -662,6 +665,154 @@ class FileStreamSourceSuite extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103603687
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -662,6 +665,154 @@ class FileStreamSourceSuite extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103405331
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -52,10 +52,7 @@ abstract class FileStreamSourceTest
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103404962
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
---
@@ -159,28 +161,64 @@ class FileStreamSource
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103404486
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
---
@@ -159,28 +161,64 @@ class FileStreamSource
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103403986
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
---
@@ -159,28 +161,64 @@ class FileStreamSource
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103366158
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
---
@@ -158,12 +158,28 @@ class FileStreamSource
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103366082
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
---
@@ -158,12 +158,28 @@ class FileStreamSource
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/16987
Rebased to master and tests updated. @zsxwing would you take another look
when you've got a minute?
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103104036
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -662,6 +663,101 @@ class FileStreamSourceSuite extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103104024
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -662,6 +663,101 @@ class FileStreamSourceSuite extends
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r103104003
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -243,13 +243,20 @@ case class DataSource
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/17070
@srowen thank you for the comments! I was trying to tackle SPARK-19721;
sorry the summary just said "WIP" without a JIRA number. Adding the JIRA number
back.
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17070#discussion_r103102481
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
---
@@ -195,6 +196,11 @@ class HDFSMetadataLog[T <: Any
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17070#discussion_r103102422
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
---
@@ -268,6 +274,37 @@ class HDFSMetadataLog[T <: Any
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17070#discussion_r103102416
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
---
@@ -18,6 +18,7 @@
package
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17070#discussion_r103102420
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
---
@@ -268,6 +274,37 @@ class HDFSMetadataLog[T <: Any
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17070#discussion_r103102394
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -100,7 +100,8 @@ private[kafka010] class
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17070#discussion_r103102389
--- Diff:
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala
---
@@ -226,7 +226,15 @@ class KafkaSourceSuite
GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/17070
[WIP][SS] Good error message for version mismatch in log files
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/16987
Reopening :-)
GitHub user lw-lin reopened a pull request:
https://github.com/apache/spark/pull/16987
[SPARK-19633][SS] FileSource read from FileSink
## What changes were proposed in this pull request?
Right now file source always uses `InMemoryFileIndex` to scan files from a
given path
Github user lw-lin closed the pull request at:
https://github.com/apache/spark/pull/16987
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/16987
Using deterministic file names sounds great. Thanks! I'm closing this.
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/16987
@marmbrus @zsxwing would you take a look at this? thanks!
Github user lw-lin commented on a diff in the pull request:
https://github.com/apache/spark/pull/16987#discussion_r101917807
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
---
@@ -76,12 +76,13 @@ abstract class FileStreamSourceTest
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/16987
Jenkins retest this please
GitHub user lw-lin opened a pull request:
https://github.com/apache/spark/pull/16987
[WIP][SPARK-][SS] FileSource read from FileSink
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested
Github user lw-lin commented on the issue:
https://github.com/apache/spark/pull/16912
To me this PR aims to also use the driver to coordinate Hadoop output
committing for `saveAsNewAPIHadoopFile` -- actually the same was added for
`saveAsHadoopFile` back in https://github.com/apache