[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-08-01 Thread KevinZwx
Github user KevinZwx commented on the issue:

https://github.com/apache/spark/pull/16970
  
I'm a little confused by the behavior of dropDuplicates with a watermark.  
From my reading of the guide documentation, I expected the following code to 
still deduplicate on uuid only, and to use the timestamp column and the 
watermark just to expire state. 

`.withWatermark("timestamp", "1 day")
.dropDuplicates("uuid", "timestamp")`

But in fact the program seems to use uuid and timestamp as a combined key 
for deduplication: the result count is much larger than with 
dropDuplicates("uuid") and closer to the count with no deduplication at all. 
Is this the expected behavior? If so, how can I achieve what I want?
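
The difference in counts described above can be reproduced with a small model. This is a plain-Python sketch, not Spark code: the `drop_duplicates` helper and the sample `events` are hypothetical, and it assumes that every column passed to the call becomes part of the deduplication key. Watermark-based state expiry is not modeled.

```python
from datetime import datetime


def drop_duplicates(rows, key_columns):
    """Keep the first row seen for each distinct combination of key_columns."""
    seen = set()
    kept = []
    for row in rows:
        # All listed columns together form the key (assumption of this sketch).
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample data: uuid "a" arrives twice with different timestamps,
# uuid "b" arrives twice with the same timestamp.
events = [
    {"uuid": "a", "timestamp": datetime(2017, 8, 1, 10, 0)},
    {"uuid": "a", "timestamp": datetime(2017, 8, 1, 10, 5)},
    {"uuid": "b", "timestamp": datetime(2017, 8, 1, 10, 0)},
    {"uuid": "b", "timestamp": datetime(2017, 8, 1, 10, 0)},
]

by_uuid = drop_duplicates(events, ["uuid"])
by_uuid_and_timestamp = drop_duplicates(events, ["uuid", "timestamp"])

print(len(by_uuid), len(by_uuid_and_timestamp))
```

In this model, a repeated uuid with a different timestamp survives when both columns are listed, which matches the larger count reported above.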


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-03-02 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16970
  
@zsxwing Thanks, I missed it.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-03-02 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/16970
  
@uncleGen I think `requiredChildDistribution = 
ClusteredDistribution(keyExpressions) :: Nil` (please see 
[here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L344-L345))
 takes care of it.
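
A rough illustration of why co-locating rows by key is enough: the sketch below is plain Python, not Spark internals. The helpers `partition_by_key` and `dedup_partition`, the use of `zlib.crc32` as the hash, and the sample rows are all assumptions made for illustration. The idea is that once every row with the same key lands in the same partition, each partition can drop duplicates locally without any aggregation step.

```python
import zlib


def partition_by_key(rows, key_columns, num_partitions):
    """Route each row to a partition chosen by hashing its key columns."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        key = repr(tuple(row[column] for column in key_columns))
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def dedup_partition(rows, key_columns):
    """Drop duplicates within one partition, keeping the first occurrence."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical input with duplicates spread through the stream.
rows = [{"uuid": u, "value": i} for i, u in enumerate(["a", "b", "a", "c", "b", "a"])]

partitions = partition_by_key(rows, ["uuid"], num_partitions=3)
result = [row for part in partitions for row in dedup_partition(part, ["uuid"])]

print(sorted(row["uuid"] for row in result))
```

Because equal keys always hash to the same partition, no uuid can survive in two different partitions, so the per-partition results can simply be concatenated.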





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-28 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16970
  
One question: without aggregation, how are duplicates dropped across 
partitions?





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73297/
Test PASSed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73297 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73297/testReport)**
 for PR 16970 at commit 
[`d0b7b77`](https://github.com/apache/spark/commit/d0b7b77e345b275d58ba5582f6acde86a80cb3da).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73297 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73297/testReport)**
 for PR 16970 at commit 
[`d0b7b77`](https://github.com/apache/spark/commit/d0b7b77e345b275d58ba5582f6acde86a80cb3da).





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73285/
Test PASSed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73285 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73285/testReport)**
 for PR 16970 at commit 
[`7a7c0c7`](https://github.com/apache/spark/commit/7a7c0c781c236f8421304ab17403f7347eededcb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Deduplicate(`
  * `case class StreamingDeduplicateExec(`





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73285 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73285/testReport)**
 for PR 16970 at commit 
[`7a7c0c7`](https://github.com/apache/spark/commit/7a7c0c781c236f8421304ab17403f7347eededcb).





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16970
  
retest this please





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73265/
Test FAILed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73265 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73265/testReport)**
 for PR 16970 at commit 
[`7a7c0c7`](https://github.com/apache/spark/commit/7a7c0c781c236f8421304ab17403f7347eededcb).





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73247/
Test PASSed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73247 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73247/testReport)**
 for PR 16970 at commit 
[`78dfdfe`](https://github.com/apache/spark/commit/78dfdfe20b6c7f788e5d289ecc63c325679ccd44).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Deduplication(`





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16970
  
@tdas I created https://issues.apache.org/jira/browse/SPARK-19690 to track 
the issue when joining a batch DataFrame with a streaming DataFrame. I will fix 
it in a separate PR to unblock this one as it touches many files.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73247 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73247/testReport)**
 for PR 16970 at commit 
[`78dfdfe`](https://github.com/apache/spark/commit/78dfdfe20b6c7f788e5d289ecc63c325679ccd44).





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73236/
Test FAILed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73236 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73236/testReport)**
 for PR 16970 at commit 
[`b2e9cb0`](https://github.com/apache/spark/commit/b2e9cb03f9f5dd9467fdebfd7a6f69639ae36f2b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73236 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73236/testReport)**
 for PR 16970 at commit 
[`b2e9cb0`](https://github.com/apache/spark/commit/b2e9cb03f9f5dd9467fdebfd7a6f69639ae36f2b).





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73225/
Test FAILed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73225 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73225/testReport)**
 for PR 16970 at commit 
[`0e72217`](https://github.com/apache/spark/commit/0e7221718ea825f70de594c68081db75b5f841ea).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73225 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73225/testReport)**
 for PR 16970 at commit 
[`0e72217`](https://github.com/apache/spark/commit/0e7221718ea825f70de594c68081db75b5f841ea).





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/16970
  
overall looks good. just a bunch of nits.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73076/
Test FAILed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73076 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73076/testReport)**
 for PR 16970 at commit 
[`ba58e2a`](https://github.com/apache/spark/commit/ba58e2a6315260abe16bdb09bc81efa20afffab2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class StreamingDeduplicationExec(`





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73076 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73076/testReport)**
 for PR 16970 at commit 
[`ba58e2a`](https://github.com/apache/spark/commit/ba58e2a6315260abe16bdb09bc81efa20afffab2).





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73064/
Test FAILed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73064 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73064/testReport)**
 for PR 16970 at commit 
[`5a6af8b`](https://github.com/apache/spark/commit/5a6af8b5fb452f6878c6446f074a049c79c95623).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread brkyvz
Github user brkyvz commented on the issue:

https://github.com/apache/spark/pull/16970
  
aw man. I should always refresh before starting a review





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16970
  
@brkyvz looks like you were looking at my old changes. I pushed a new 
commit and updated the PR description to reflect the latest supported queries.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73064/testReport)** for PR 16970 at commit [`5a6af8b`](https://github.com/apache/spark/commit/5a6af8b5fb452f6878c6446f074a049c79c95623).





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73028/





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16970
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73028/testReport)** for PR 16970 at commit [`63a7f4c`](https://github.com/apache/spark/commit/63a7f4c62b2da32351d008f9719d513e14562e56).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Deduplication(`
  * `trait WatermarkSupport extends SparkPlan `
  * `case class DeduplicationExec(`





[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16970
  
**[Test build #73028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73028/testReport)** for PR 16970 at commit [`63a7f4c`](https://github.com/apache/spark/commit/63a7f4c62b2da32351d008f9719d513e14562e56).

