[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-08-01 Thread KevinZwx
Github user KevinZwx commented on the issue: https://github.com/apache/spark/pull/16970 I'm a little confused by the behavior of `dropDuplicates` with a watermark. According to my understanding of the guide documentation, if I have the following code, I expect to deduplicate still

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-03-02 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16970 @zsxwing Thanks, I am missing it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-03-02 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16970 @uncleGen I think `requiredChildDistribution = ClusteredDistribution(keyExpressions) :: Nil` (please see

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-28 Thread uncleGen
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16970 One question: without aggregation, how do we drop duplicates across partitions?
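
lw-lin's reply in this thread points at `requiredChildDistribution = ClusteredDistribution(keyExpressions) :: Nil` as the answer to this question. The following is a minimal sketch of that idea in plain Python, not Spark's implementation: if rows are first redistributed so that equal keys always land in the same partition, each partition can deduplicate against its own local state and no cross-partition comparison is needed. The helper names (`partition_for`, `shuffle_by_key`, `dedup_partition`, `dedup_stream`), the CRC32 hash, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from zlib import crc32


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic hash partitioner: equal keys always map to the same partition."""
    return crc32(key.encode("utf-8")) % num_partitions


def shuffle_by_key(records, num_partitions):
    """Redistribute (key, value) records so each key lives in exactly one partition."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def dedup_partition(rows, seen):
    """Keep the first row per key, using only state local to this partition."""
    out = []
    for key, value in rows:
        if key not in seen:
            seen.add(key)
            out.append((key, value))
    return out


def dedup_stream(batches, num_partitions=4):
    """Deduplicate across batches with one 'seen' set per partition."""
    state = defaultdict(set)  # partition id -> keys already emitted
    emitted = []
    for batch in batches:
        shuffled = shuffle_by_key(batch, num_partitions)
        for pid, rows in sorted(shuffled.items()):
            emitted.extend(dedup_partition(rows, state[pid]))
    return emitted


# Hypothetical input: two micro-batches with duplicate keys within and across batches.
batches = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]
result = dedup_stream(batches)
```

Because the partitioner is deterministic, a duplicate arriving in a later batch is routed to the partition that already holds that key's state, so the first row per key is the only one emitted. The per-partition `seen` sets in this sketch grow without bound; it makes no attempt to model watermark-based state cleanup.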

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Merged build finished. Test PASSed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73297/ Test PASSed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73297 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73297/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73297/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73285/ Test PASSed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Merged build finished. Test PASSed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73285/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73285/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16970 retest this please

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73265/ Test FAILed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Merged build finished. Test FAILed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73265/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Merged build finished. Test PASSed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73247/ Test PASSed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73247/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16970 @tdas I created https://issues.apache.org/jira/browse/SPARK-19690 to track the issue when joining a batch DataFrame with a streaming DataFrame. I will fix it in a separate PR to unblock this one as

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73247/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Merged build finished. Test FAILed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73236/ Test FAILed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73236/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73236/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Merged build finished. Test FAILed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73225/ Test FAILed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73225/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73225/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/16970 overall looks good. just a bunch of nits.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73076/ Test FAILed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Merged build finished. Test FAILed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73076/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73076/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Merged build finished. Test FAILed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73064/ Test FAILed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73064/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/16970 aw man. I should always refresh before starting a review

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16970 @brkyvz looks like you were looking at my old changes. I pushed a new commit and updated the PR description to reflect the latest supported queries.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73064/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73028/ Test PASSed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16970 Merged build finished. Test PASSed.

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73028/testReport)** for PR 16970 at commit

[GitHub] spark issue #16970: [SPARK-19497][SS]Implement streaming deduplication

2017-02-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16970 **[Test build #73028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73028/testReport)** for PR 16970 at commit