[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15102
  
> Merged this to master. @zsxwing, do you have another PR for 2.0?

See #15367


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged this to master. @zsxwing, do you have another PR for 2.0?





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66398/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66398/consoleFull)** for PR 15102 at commit [`4754125`](https://github.com/apache/spark/commit/4754125d041ebdf2286bb015770d58351779622d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66397/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66397/consoleFull)** for PR 15102 at commit [`7d658f1`](https://github.com/apache/spark/commit/7d658f1004375bbc49c60ac5091b56da34f04da2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66396/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66396/consoleFull)** for PR 15102 at commit [`d9d848c`](https://github.com/apache/spark/commit/d9d848ca455d659e39ceadf8c2b8d50867b85962).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66398/consoleFull)** for PR 15102 at commit [`4754125`](https://github.com/apache/spark/commit/4754125d041ebdf2286bb015770d58351779622d).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66397/consoleFull)** for PR 15102 at commit [`7d658f1`](https://github.com/apache/spark/commit/7d658f1004375bbc49c60ac5091b56da34f04da2).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66396/consoleFull)** for PR 15102 at commit [`d9d848c`](https://github.com/apache/spark/commit/d9d848ca455d659e39ceadf8c2b8d50867b85962).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66345/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66345/consoleFull)** for PR 15102 at commit [`4316906`](https://github.com/apache/spark/commit/4316906ab19556d297ded2bc38af85bb81a2e91b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66338/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66338/consoleFull)** for PR 15102 at commit [`d50a05e`](https://github.com/apache/spark/commit/d50a05eb703acb841b85b04e2b52e958778e6ab1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66338/consoleFull)** for PR 15102 at commit [`d50a05e`](https://github.com/apache/spark/commit/d50a05eb703acb841b85b04e2b52e958778e6ab1).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #3294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3294/consoleFull)** for PR 15102 at commit [`a6c4970`](https://github.com/apache/spark/commit/a6c4970ace1df46e2d65c2cc8a606f3736454d35).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #3294 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3294/consoleFull)** for PR 15102 at commit [`a6c4970`](https://github.com/apache/spark/commit/a6c4970ace1df46e2d65c2cc8a606f3736454d35).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66289/
Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66289/consoleFull)** for PR 15102 at commit [`a6c4970`](https://github.com/apache/spark/commit/a6c4970ace1df46e2d65c2cc8a606f3736454d35).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66288/
Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66288/consoleFull)** for PR 15102 at commit [`7ff1059`](https://github.com/apache/spark/commit/7ff10599fdadcbdd2515b3216d35307e906de184).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66285/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66285/consoleFull)** for PR 15102 at commit [`ccadd81`](https://github.com/apache/spark/commit/ccadd81d08f56e45bbe5970656960578ee291bb5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66289/consoleFull)** for PR 15102 at commit [`a6c4970`](https://github.com/apache/spark/commit/a6c4970ace1df46e2d65c2cc8a606f3736454d35).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66283/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66283/consoleFull)** for PR 15102 at commit [`77208d1`](https://github.com/apache/spark/commit/77208d1611810f5c6afb5ba63911cadb1794c863).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66288/consoleFull)** for PR 15102 at commit [`7ff1059`](https://github.com/apache/spark/commit/7ff10599fdadcbdd2515b3216d35307e906de184).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66287 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66287/consoleFull)**
 for PR 15102 at commit 
[`f78c990`](https://github.com/apache/spark/commit/f78c9901a1edaa5a1d4a969195671626a63e1ec6).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66287/
Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66287 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66287/consoleFull)**
 for PR 15102 at commit 
[`f78c990`](https://github.com/apache/spark/commit/f78c9901a1edaa5a1d4a969195671626a63e1ec6).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66285 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66285/consoleFull)**
 for PR 15102 at commit 
[`ccadd81`](https://github.com/apache/spark/commit/ccadd81d08f56e45bbe5970656960578ee291bb5).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66283 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66283/consoleFull)**
 for PR 15102 at commit 
[`77208d1`](https://github.com/apache/spark/commit/77208d1611810f5c6afb5ba63911cadb1794c863).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66274/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66274 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66274/consoleFull)**
 for PR 15102 at commit 
[`d154532`](https://github.com/apache/spark/commit/d1545329673e889e93b0e4859248cebff8b488c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66265/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66265 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66265/consoleFull)**
 for PR 15102 at commit 
[`e883062`](https://github.com/apache/spark/commit/e88306267e372101a685492afcfa652408a83109).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-10-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66274 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66274/consoleFull)**
 for PR 15102 at commit 
[`d154532`](https://github.com/apache/spark/commit/d1545329673e889e93b0e4859248cebff8b488c1).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-29 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  

> It would be nice to be able to do something other than earliest/latest.

That's what Assign and the starting offset arguments to the Subscribe 
strategies are for.  The implementation was already there.

> When specifying earliest, you end up with really big partitions. 

Again, spark.streaming.kafka.maxRatePerPartition and the associated 
implementation was already there.  If you don't want the coupling to time, it's 
pretty straightforward.  The bigger question is when / if / how you're going to 
do backpressure.
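
As a rough illustration of a per-partition cap that is not tied to time, the sketch below clamps each partition's end offset to a fixed number of offsets past its current position. The names `RateLimit`, `clamp`, and `maxOffsetsPerPartition` are hypothetical, and plain `Int` partition ids stand in for Kafka's `TopicPartition`; this is not the DStream implementation.

```scala
object RateLimit {
  /** Caps each partition's end offset at `maxOffsetsPerPartition` past its current position. */
  def clamp(
      current: Map[Int, Long],
      latest: Map[Int, Long],
      maxOffsetsPerPartition: Long): Map[Int, Long] = {
    require(maxOffsetsPerPartition > 0, "maxOffsetsPerPartition must be positive")
    latest.map { case (partition, end) =>
      // Partitions with no known current position are left uncapped in this sketch.
      val capped = current.get(partition) match {
        case Some(start) => math.min(end, start + maxOffsetsPerPartition)
        case None => end
      }
      partition -> capped
    }
  }
}
```

A cap like this bounds the size of each batch but does nothing about backpressure, which remains the open question.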

>  One question, is it a problem if two tasks are pulling from the same 
topic partition in parallel? Does this break the assumptions of our caching?

This breaks fundamental assumptions of Kafka (per-topicpartition ordering) 
and really shouldn't be done.

> I can do a final pass over the code, but do we think we are getting close 
to something that we can merge and iterate on?

I think we're in much better shape than when we started, but I still 
honestly think this implementation made a bunch of user-visible behavioral and 
configuration changes from the DStream that really have nothing to do with the 
inherent differences between it and structured streaming.  This isn't just me 
whining about "you changed my code",  it really is going to make it harder to 
explain to people and harder to maintain.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-29 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/15102
  
I spent a while playing around with this today on a real cluster, and 
overall it is pretty cool!  I have a few suggestions we should implement in the 
long run, but these can probably be done in follow-up PRs:
 - It would be nice to be able to do something other than earliest/latest.  
In particular, I have some low volume streams where I want to go back a bit as 
I'm working interactively.  Going back all the way results in a really long 
job.  If there was a time index, I'd love to say `-10 minutes` but even just 
`-100 offsets` would be great.
 - When specifying `earliest`, you end up with really big partitions.  As a 
result you get virtually no progress reporting until it's done.  I'd consider 
having a max number of offsets in a task.  It should be easy to split them up 
in `getBatch`.  One question, is it a problem if two tasks are pulling from the 
same topic partition in parallel?  Does this break the assumptions of our 
caching?
 - Similarly, you get no results until the end.  We might also want to have 
a `maxOffsetsInBatch` similar to what we have in the `FileStreamSource`.
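
A minimal sketch of the "max number of offsets in a task" idea, assuming a local `OffsetRange` case class (not Spark's own class) and a hypothetical `maxOffsetsPerTask` parameter. It only shows how one partition's range could be cut into consecutive sub-ranges; whether those sub-ranges may safely be read in parallel is a separate question.

```scala
// Hypothetical stand-in for an offset range of one topic partition:
// fromOffset is inclusive, untilOffset is exclusive.
final case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long) {
  def size: Long = untilOffset - fromOffset
}

object OffsetSplitter {
  /** Splits `range` into consecutive sub-ranges of at most `maxOffsetsPerTask` offsets each. */
  def split(range: OffsetRange, maxOffsetsPerTask: Long): Seq[OffsetRange] = {
    require(maxOffsetsPerTask > 0, "maxOffsetsPerTask must be positive")
    if (range.size <= 0) {
      Seq.empty
    } else {
      (range.fromOffset until range.untilOffset by maxOffsetsPerTask).map { start =>
        val end = math.min(start + maxOffsetsPerTask, range.untilOffset)
        range.copy(fromOffset = start, untilOffset = end)
      }
    }
  }
}
```

The sub-ranges are contiguous and ordered, so processing them one after another preserves the per-partition ordering.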

I can do a final pass over the code, but do we think we are getting close 
to something that we can merge and iterate on?





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66047/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66047 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66047/consoleFull)**
 for PR 15102 at commit 
[`9d95d52`](https://github.com/apache/spark/commit/9d95d52cd1cb9ef83efaaadc5d4e9a5dc3e1c843).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66047 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66047/consoleFull)**
 for PR 15102 at commit 
[`9d95d52`](https://github.com/apache/spark/commit/9d95d52cd1cb9ef83efaaadc5d4e9a5dc3e1c843).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/15102
  
FYI: #15274 adds support for parsing JSON from the key/value into a Spark 
SQL `StructType`





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66005/
Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66005 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66005/consoleFull)**
 for PR 15102 at commit 
[`59a93a5`](https://github.com/apache/spark/commit/59a93a561235d4bb0db04b3be6c0325e695df7e2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, 
Long]) extends Offset `
  * `abstract class StreamExecutionThread(name: String) extends 
UninterruptibleThread(name)`





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #66005 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66005/consoleFull)**
 for PR 15102 at commit 
[`59a93a5`](https://github.com/apache/spark/commit/59a93a561235d4bb0db04b3be6c0325e695df7e2).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15102
  
Just pushed to verify if the workaround for 
https://issues.apache.org/jira/browse/KAFKA-1894 does work on Jenkins. Not 
ready for another round of review.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  
Ok, so this kind of thing is why I was concerned about the copy, paste, 
randomly change things approach to developing this module.

> (5) Topics are deleted when a Spark job is running, which may cause 
OffsetOutOfRangeException. (I'm not sure if there are more types of exceptions, 
may need to investigate) Solution: log a warning. Note: if a Spark job fails, 
then the query will fail as well.

OffsetOutOfRangeException basically means you asked Kafka for an offset, 
and it wasn't there.  The most common reason this happens isn't because a topic 
got deleted, it's because messages expired out of retention before they got 
read.

Just logging at warning level and continuing in this situation is 
catastrophically, someone-loses-their-paying-job-not-their-spark-job, bad.

The existing Kafka DStream integrations that have been around for 7 Spark 
versions will just let that exception be thrown, resulting in errors / failed 
tasks, which make it pretty obvious that something is really wrong.

If you think that behavior is incorrect, let's figure out a unified 
behavior for how to deal with exceptional situations that break fundamental 
assumptions, and make it really obvious to users how to get the behavior 
they need across both modules.  But having the structured stream behave in 
significantly different ways seems like a recipe for trouble.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15102
  
We can add an option that lets the user fail the query instead of just 
logging a warning.
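
To make the trade-off concrete, here is a minimal sketch of such an option, assuming a hypothetical boolean flag named `failOnDataLoss` and a hypothetical helper `resolveStart`; neither name comes from this PR. When the requested offset has already aged out of retention, the helper either throws or warns and skips ahead to the earliest offset still available.

```scala
object DataLossPolicy {
  /**
   * Returns the offset to start reading from. If the requested offset is no longer
   * available, either throws or warns and skips ahead, depending on the flag.
   */
  def resolveStart(requested: Long, earliestAvailable: Long, failOnDataLoss: Boolean): Long = {
    if (requested >= earliestAvailable) {
      requested
    } else {
      val msg = s"Offsets [$requested, $earliestAvailable) are no longer available"
      if (failOnDataLoss) {
        // Surface the problem as a failure so that lost data cannot go unnoticed.
        throw new IllegalStateException(msg)
      } else {
        Console.err.println(s"WARN $msg")
        earliestAvailable
      }
    }
  }
}
```

Which value should be the default is exactly the point under discussion: failing makes data loss obvious, while warning keeps the query running.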





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-27 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15102
  
Since there is some disagreement about how to handle various failures, 
here is what I found via stress testing, for discussion:

(1) Kafka APIs fail because we cannot connect to the Kafka cluster. Solution: 
fail the query.

(2) `getOffset` fails temporarily because some of the topics are deleted at the 
same time. `consumer.position` will throw an NPE due to this race condition. 
Solution: retry. (This is why `withRetries` is added.)

```
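import scala.collection.JavaConverters._
consumer.poll(0)
val partitions = consumer.assignment()
// A topic deleted between assignment() and position() can make position(p) throw an NPE.
partitions.asScala.foreach(p => consumer.position(p))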
```

(3) In `getBatch`, some partitions are new because they are not in 
`fromOffsets`. Then we will call `fetchNewPartitionEarliestOffsets` to fetch 
these partitions. However, some of these new partitions may be deleted due to 
topic deletion, then they won't appear in `consumer.assignment()`. Solution: 
log a warning.

(4) In `getBatch`, some partitions are new because they are not in 
`fromOffsets`. Then we will call `fetchNewPartitionEarliestOffsets` to fetch 
these partitions. Similar to (2), `consumer.position` may throw an NPE due to 
this race condition. Solution: retry.

(5) Topics are deleted while a Spark job is running, which may cause 
`OffsetOutOfRangeException`. (I'm not sure if there are more types of 
exceptions; this may need investigation.) Solution: log a warning.

(6) A topic is deleted and then re-added. This may make `untilOffset` less than 
`fromOffset`. Solution: log a warning.
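
For the "retry" cases (2) and (4), a helper along the following lines would be enough. This is only a sketch: the name `withRetries` is taken from the comment above, but the signature, the `maxAttempts` parameter, and the absence of any delay between attempts are assumptions, not the PR's actual code.

```scala
import scala.util.control.NonFatal

object Retry {
  /**
   * Runs `body` up to `maxAttempts` times and returns the first successful result.
   * If every attempt fails with a non-fatal exception, the last one is rethrown.
   */
  def withRetries[T](maxAttempts: Int)(body: => T): T = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")
    var attempt = 1
    var result: Option[T] = None
    var lastError: Throwable = null
    while (result.isEmpty && attempt <= maxAttempts) {
      try {
        result = Some(body)
      } catch {
        // A transient NPE from the race described in (2) would be caught here.
        case NonFatal(e) => lastError = e
      }
      attempt += 1
    }
    result.getOrElse(throw lastError)
  }
}
```

A call such as `Retry.withRetries(3) { fetchOffsets() }`, where `fetchOffsets` is a placeholder for the real offset lookup, would tolerate two transient failures before giving up.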





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-26 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  
Ok, finished a line-by-line compare + comment.

The biggest thing I'm having trouble reconciling is the stated emphasis on 
limiting user options in order to give guarantees, yet throwing those 
guarantees away with only warn level logging (e.g. the handling of offset out 
of range exception).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65937/
Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #65937 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65937/consoleFull)**
 for PR 15102 at commit 
[`852f607`](https://github.com/apache/spark/commit/852f607a4253af67d2b425527fa0b87ad20aa953).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #65937 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65937/consoleFull)**
 for PR 15102 at commit 
[`852f607`](https://github.com/apache/spark/commit/852f607a4253af67d2b425527fa0b87ad20aa953).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65926/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #65926 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65926/consoleFull)**
 for PR 15102 at commit 
[`755ceaa`](https://github.com/apache/spark/commit/755ceaa3531a690f755982a55c71f977c3039bc0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `// ('scaled' = +Infinity). However in the case that this class 
also has`
  * `// 0 probability, the class will not be selected ('scaled' is 
NaN).`
  * `  final val thresholds: DoubleArrayParam = new DoubleArrayParam(this, 
\"thresholds\", \"Thresholds in multi-class classification to adjust the 
probability of predicting each class. Array must have length equal to the 
number of classes, with values > 0 excepting that at most one value may be 0. 
The class with largest value p/t is predicted, where p is the original 
probability of that class and t is the class's threshold\", (t: Array[Double]) 
=> t.forall(_ >= 0) && t.count(_ == 0) <= 1)`
  * `thresholds = Param(Params._dummy(), \"thresholds\", \"Thresholds 
in multi-class classification to adjust the probability of predicting each 
class. Array must have length equal to the number of classes, with values > 0, 
excepting that at most one value may be 0. The class with largest value p/t is 
predicted, where p is the original probability of that class and t is the 
class's threshold.\", typeConverter=TypeConverters.toListFloat)`
  * `case class SortOrder(child: Expression, direction: SortDirection, 
nullOrdering: NullOrdering)`





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #65926 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65926/consoleFull)**
 for PR 15102 at commit 
[`755ceaa`](https://github.com/apache/spark/commit/755ceaa3531a690f755982a55c71f977c3039bc0).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65845/
Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #65845 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65845/consoleFull)**
 for PR 15102 at commit 
[`5f33eb4`](https://github.com/apache/spark/commit/5f33eb4692e4b95812e70f43d838cb47bd57698b).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SparkContext(config: SparkConf) extends Logging `
  * `class ChiSqSelector @Since(\"2.1.0\") () extends Serializable `
  * `case class ShowColumnsCommand(tableName: TableIdentifier) extends 
RunnableCommand `
  * `abstract class CompactibleFileStreamLog[T: ClassTag](`
  * `class FileStreamSinkLog(`
  * `  case class FileEntry(path: String, timestamp: Timestamp, batchId: 
Long) extends Serializable`
  * `class FileStreamSourceLog(`
  * `trait Offset extends Serializable `





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #65845 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65845/consoleFull)**
 for PR 15102 at commit 
[`5f33eb4`](https://github.com/apache/spark/commit/5f33eb4692e4b95812e70f43d838cb47bd57698b).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-23 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  
Source.getOffset returns an sql Offset, not a kafka offset.  At this point
the current plan is to remove the ordering requirement for sql Offset,
which makes the whole discussion of comparing ordering for kafka offsets
irrelevant.  The order is the order in which the driver thread saw sql
Offsets as it called getOffset.
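
A minimal sketch of that idea, written in Python for brevity (the PR itself is Scala). `OffsetSequence` and `observe` are hypothetical names, not part of Spark; the point is only that the sequence needs equality on offsets, never a comparison between them:

```python
from typing import Hashable, List


class OffsetSequence:
    """Remembers offsets in the order a single caller reported them."""

    def __init__(self) -> None:
        self._seen: List[Hashable] = []

    def observe(self, offset: Hashable) -> int:
        """Record an offset and return its position in the sequence.

        Reporting the same offset twice in a row does not create a new entry.
        Only == is used; offsets are never compared with < or >.
        """
        if self._seen and self._seen[-1] == offset:
            return len(self._seen) - 1
        self._seen.append(offset)
        return len(self._seen) - 1
```

Under this assumption, "earlier" and "later" are defined by the index returned from `observe`, so offsets for unrelated topics can follow each other without any ordering between them.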

On Fri, Sep 23, 2016 at 1:24 PM, Jay White Bear wrote:

> Which offset does getOffset() return: one from the partition, or something
> you created? Because it looked like when you were hashing them, you were
> returning partition offsets.






[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-23 Thread jwbear
Github user jwbear commented on the issue:

https://github.com/apache/spark/pull/15102
  
Which offset does getOffset() return: one from the partition, or something 
you created? Because it looked like when you were hashing them, you were 
returning partition offsets.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-23 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  
We aren't comparing ordering of offsets across partitions, and I don't
think that was ever in consideration.  At this point the most likely
candidate for the global ordering is implicit in the single driver thread
that is asking for getOffset.

On Sep 23, 2016 12:41 AM, "Jay White Bear" wrote:

> Just curious looking at this: if you are comparing "sequential" offsets
> across partitions, a rebalance would definitely affect this and, unless
> something has changed, it is probably not a good idea to compare offsets from
> Kafka across partitions. You could simply add an id/timestamp to the
> producer and send it with the message rather than using this methodology, or,
> if you must use offsets, query the broker for the full list and compare what
> you consumed to that list (small increase in latency between consumption and
> processing). This is from the Kafka paper, which makes me question your
> scheme: "...Note that our message ids are increasing but not consecutive.
> To compute the id of the next message, we have to add the length of the
> current message to its id." This means simply comparing which offsets are
> larger will not necessarily yield you the most recent message across
> partitions and definitely won't hold in a rebalance, during which time some
> broker logs will be on hold and not consumed. In my own implementation, the
> offsets are great for message guarantees (e.g. delivery/consumption checks),
> because the broker has a full ordered list, but not for cross-partition
> ordering.






[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread jwbear
Github user jwbear commented on the issue:

https://github.com/apache/spark/pull/15102
  
Just curious looking at this: if you are comparing "sequential" offsets 
across partitions, a rebalance would definitely affect this and, unless 
something has changed, it is probably not a good idea to compare offsets from 
Kafka across partitions. You could simply add an id/timestamp to the producer 
and send it with the message rather than using this methodology, or, if you must 
use offsets, query the broker for the full list and compare what you consumed to 
that list (small increase in latency between consumption and processing).  This is 
from the Kafka paper, which makes me question your scheme: "...Note that our 
message ids are increasing but not consecutive. To compute the id of the next 
message, we have to add the length of the current message to its id." This 
means simply comparing which offsets are larger will not necessarily yield you 
the most recent message across partitions and definitely won't hold in a 
rebalance, during which time some broker logs will be on hold and not consumed.
  In my own implementation, the offsets are great for message guarantees (e.g. 
delivery/consumption checks), because the broker has a full ordered list, but 
not for cross-partition ordering. 





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  
> I agree that if/when we add the ability to add existing partitions 
midstream we'd probably need to add two offsets into the SQL offset for new 
partitions.

It's not just existing partitions.  If you have a low-value high-volume 
stream (which is the kind of situation where you'd want auto offset reset 
latest to begin with), you may not even want your first batch to have however 
many messages got in between creation and subscription rebalance.  I dunno, I 
just don't want to assume too much.

> I'd also support JSON here, but I would not mandate it (i.e. try json 
parsing and fall back to comma separation). It's not ambiguous, supports 
consistent usage, and doesn't penalize the simple use cases.

Cool, seems reasonable.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/15102
  
> "I want to be able to add a topicpartition mid stream, but I don't want 
to start it from the beginning."

I see, I was thinking only of new topics that appear that match your 
pattern.  I agree that if/when we add the ability to add existing partitions 
midstream we'd probably need to add two offsets into the SQL offset for new 
partitions.

> I think consistency in using json for any non-scalar values is worth 2 
extra characters per topic and 4 at the ends.

I'd also support JSON here, but I would not mandate it (i.e. try json 
parsing and fall back to comma separation).  It's not ambiguous, supports 
consistent usage, and doesn't penalize the simple use cases.
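
A minimal sketch of that fallback, in Python for brevity (the PR itself is Scala). `parse_topics` is a hypothetical helper, not an option parser from Spark; it assumes a value counts as JSON only when it parses to an array of strings, and anything else is treated as a comma-separated list:

```python
import json
from typing import List


def parse_topics(value: str) -> List[str]:
    """Parse a topic list given as a JSON array or a comma-separated string."""
    text = value.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    # Accept JSON only when it is an array of strings.
    if isinstance(parsed, list) and all(isinstance(t, str) for t in parsed):
        return parsed
    # Fall back to the simple form: "topic1,topic2".
    return [t.strip() for t in text.split(",") if t.strip()]
```

With this shape, `"topic1,topic2"` and `"[\"topic1\", \"topic2\"]"` yield the same list, so the simple case stays easy to type while the JSON form remains available.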





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  
@tdas I think as long as marmbrus' PR to remove comparable from the 
interface works for sane variations of subscription changes it's the best way 
to go.  I'm honestly fine with someone getting what they deserve if they delete 
and recreate a topic in the space of a single batch or while a stream is down.

@marmbrus 
> Why do you care when it acquired it? 

This isn't so much a temporal thing, as a let the consumer do its job 
thing.  This sort of configuration should ideally be handled by 
auto.offset.reset, and we shouldn't bake in too much second guessing about it.  
There are plenty of use cases for "I want to be able to add a topicpartition mid 
stream, but I don't want to start it from the beginning."

> Are you proposing users have to type

I'm saying that you guys proposed json as a workaround for the 
string->string thing.  Given that, yeah, I think consistency in using json for 
any non-scalar values is worth 2 extra characters per topic and 4 at the ends.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/15102
  
Comparable requirement removed in #15207.

> I think in the absence of prior information about the position in a 
topicpartition, you start a new batch on topic B starting from wherever the 
consumer's position was at the time it acquired the subscription, which might 
not be 0. I.e. you call position() before seekToEnd().

Why do you care when it acquired it?  If it appeared in between the 
last batch and now, don't you want to consume all of the available data from 
it?  Otherwise the answer is going to depend on the specifics of when you see 
the topic, which seems counter to the model of Structured Streaming.

> I think the main thing that would be confusing is to specify topics in 
one way (custom-delimited string) for one configuration, and in another way 
(structured json) for another configuration.

Are you proposing users have to type `"[\"topic1\", \"topic2\"]"` (or pull 
in a json library) instead of `"topic1,topic2"`?  Seems we could pretty 
seamlessly add support for JSON in the future, while still making the common 
case easy to type.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/15102
  
@koeninger
I did some independent brainstorming with @zsxwing on topic deletion, and 
yeah, I agree with you that attempting to account for deleted topics in the 
offset in the KafkaSourceOffset such that compareTo is satisfied is more 
complicated than just eliminating compareTo. That said, there are still a few 
corner cases, such as the same topic being deleted and recreated. I am not familiar 
with how often this can happen (let us know your thoughts). But the general 
idea we can implement is that we attach a unique id to the topic in the 
KafkaSourceOffset. Whenever a new topic is detected (while running or across 
query restarts), generate a unique id so that it is considered a new topic. 
Here are the options:

**Option 1: When getOffset detects new topic, if the topic existed in 
previous offset, create new (topic, unique id)**
- Pro: Simple
- Con: Cannot detect if topic gets deleted+recreated between triggers 
(possibly across query restarts)

**Option 2: Use RebalanceListener to know when topic has been deleted**
- Pro: Handles topic deletion+recreation between triggers while query is 
active
- Con: Misses deletion+recreation during query restarts
- Con: Listener called on different thread, so possible race conditions

**Option 3: Use the creation time / cZxid of topic info stored in ZK to 
disambiguate**
- Pro: Zookeeper maintains uniqueness across any component restarts
- Con: Requires depending on full Kafka + ZK
- Con: Requires knowing the exact ZK path where topics are saved, but this 
can be tested and made sure that it never fails when we upgrade Kafka

I feel that we should just keep it simple for now, and go for Option 1. 
What do you think?
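
One possible reading of Option 1, sketched in Python for brevity (the PR itself is Scala). `assign_topic_ids` and the use of random UUIDs are assumptions for illustration only, not code from the PR:

```python
import uuid
from typing import Dict, Iterable


def assign_topic_ids(previous: Dict[str, str],
                     current_topics: Iterable[str]) -> Dict[str, str]:
    """Build the topic -> unique id map for the next offset.

    Topics present in the previous offset keep their id; topics that were
    absent from the previous offset are treated as new and get a fresh id.
    """
    return {
        topic: previous[topic] if topic in previous else uuid.uuid4().hex
        for topic in current_topics
    }
```

Under this reading, a topic that disappears from one offset and shows up again in a later one gets a new id. A topic deleted and recreated between two consecutive triggers keeps its old id, which is the limitation listed as the con of Option 1.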










[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  
> For streaming you already know what the global order is, because you know 
when you asked for A and B. I agree that we should probably remove the 
comparable requirement from Offset in favor of just having equality.

Sounds good, as long as the execution portion of it isn't e.g. storing 
timestamps for A and running into issues on driver failover to a machine with 
clock drift.


> Assuming A was retrieved before B, then it seems like you emit a warning 
that data was possibly missed from A (since it was deleted before we could get 
it) and you start a new batch on topic B from offsets 0-1. Right?

I think in the absence of prior information about the position in a 
topicpartition, you start a new batch on topic B starting from wherever the 
consumer's position was at the time it acquired the subscription, which might 
not be 0.  I.e. you call position() before seekToEnd().  This might mean you 
need to record 2 kafka offsets in an SQL Offset if it's the first time you've 
seen that topicpartition.

> Are there arguments that we do support that you think are confusing?

I think the main thing that would be confusing is to specify topics in one 
way (custom-delimited string) for one configuration, and in another way 
(structured json) for another configuration.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/15102
  
For streaming you already know what the global order is, because you know 
when you asked for A and B.  I agree that we should probably remove the 
comparable requirement from `Offset` in favor of just having equality.  At the 
time it was a useful safeguard, but it's clearly causing more confusion than 
anything.

Assuming `A` was retrieved before `B`, then it seems like you emit a 
warning that data was possibly missed from A (since it was deleted before we 
could get it) and you start a new batch on topic B from offsets 0-1.  Right?

> Or do you actually think that stuff like option("assign", 
"topicA:1:1,topicA:2:2,topicB:3:3") makes it clear what the arguments are?

We don't support assign.  When we do add that support: that is not super 
easy to follow, but I don't feel strongly that it's better or worse than JSON 
if it's unambiguous.

Are there arguments that we do support that you think are confusing?





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  
@tdas moving this conversation back to the PR that's linked from the public 
jira

> yeah, i am trying to figure out all the options and write up something 
so that we are clear on the pros and cons of each approach. At a high level, 
they match the ones you suggested. Though I am trying to tease out what needs 
to be done if deletion is to be supported, and what needs to be done if 
deletion+recreation of the same topic needs to be supported.

> Also, at a high level I think that supporting deletion of topics does not 
require timestamps; it's supporting deletion+recreation of the same topic that would 
require more disambiguating information like timestamps. Here are a few 
questions,
> 
> Just to confirm, there is no unique guid kind of thing for topics in 
Kafka?
> 
> When a topic is deleted and recreated, what happens to the offsets? Does 
the recreated topic's offset start from 0? Or does it start from where the 
previous topic left off?

I don't think it's useful to focus too hard on deletion, it's a symptom not 
just a cause.  Subscription changes for other reasons would also expose the 
same issue.   I think the underlying issue is that the Offset interface is 
asking for a global monotonic order, which is hard in a distributed system.

To answer the questions, a topicpartition is a folder on disk (containing 
messages and offsets), and a node in ZK.  Both are deleted if the 
topicpartition is successfully deleted, so offsets start over.  ZK nodes have a 
cZxid, but I don't think it's a good idea to rely on Kafka having ZK internals, 
and I'm not sure what it buys you.  Say you do have zxid, and say you have two 
consumer states:

State A:  topic A, partition 0, offset 1, zxid 0x20

State B: topic B, partition 0, offset 1, zxid 0x21

At one point in time the consumer is in state A, and at a different point in 
time it is in state B.  Without more information, how can you tell if A < B or 
B < A ?
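
To make the problem concrete, here is a minimal sketch in Python (the PR itself is Scala). `compare_offsets` is a hypothetical function, not Spark's `compareTo`; it models a consumer state as a map from (topic, partition) to offset and leaves the zxid out, since the example above shows it does not help. Such states form only a partial order:

```python
from typing import Dict, Optional, Tuple

TopicPartition = Tuple[str, int]


def compare_offsets(a: Dict[TopicPartition, int],
                    b: Dict[TopicPartition, int]) -> Optional[int]:
    """Return -1, 0 or 1 when a is before, equal to or after b in every
    partition; return None when the two states cannot be ordered."""
    if a.keys() != b.keys():
        return None  # different subscriptions: nothing to compare
    # Per-partition sign of the difference, ignoring partitions that are equal.
    signs = {(a[tp] > b[tp]) - (a[tp] < b[tp]) for tp in a}
    signs.discard(0)
    if not signs:
        return 0
    if len(signs) == 1:
        return signs.pop()
    return None  # some partitions moved forward, others backward
```

Here `compare_offsets({("A", 0): 1}, {("B", 0): 1})` returns `None`, matching states A and B above: neither can be placed before the other from the offsets alone.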






[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65785/
Test FAILed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #65785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65785/consoleFull)** for PR 15102 at commit [`786af2f`](https://github.com/apache/spark/commit/786af2f415b3b199bbeddb56caa60742e8ac01bb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15102
  
**[Test build #65785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65785/consoleFull)** for PR 15102 at commit [`786af2f`](https://github.com/apache/spark/commit/786af2f415b3b199bbeddb56caa60742e8ac01bb).





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  
This is pretty much the fundamental issue.  Kafka offsets alone aren't
capable of meeting the SQL Offset interface as defined.  I think that means
the Offset interface needs to be reconsidered or eliminated.  I think that
work needs to be done now, before even more work gets put into a
fundamentally unworkable interface.  I don't see what the rush is, because
this isn't going to make it into 2.0.1 anyway.


On Thu, Sep 22, 2016 at 12:29 PM, Shixiong Zhu wrote:

> PR with failing test indicating at least one reason why it's wrong from an
> end-user perspective:
>
> @koeninger  Thanks for writing the test.
> Yes, we are aware of this issue. However, it's unlikely that we can support
> deleting topics using the current Source API. You can take a look at how
> StreamExecution checks the new data here:
> https://github.com/apache/spark/blob/976f3b1227c1a9e0b878e010531285fdba57b6a7/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L320
>
> Using the hash code to compare offsets has a potential issue: it may make
> the latest offset smaller than the old offset, in which case StreamExecution
> won't process the new data.
>
> I think one possible solution is for StreamExecution not to compare the
> offsets; instead, it just assumes getOffset will always return the latest
> offset, and it never rolls back to an old offset. This needs more discussion
> anyway. Hence I suggest we don't block this PR on this. Deleting topics
> can be supported in a later PR once we reach agreement on how to resolve
> the issue.
>






[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-22 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15102
  
> PR with failing test indicating at least one reason why it's wrong from 
> an end-user perspective:

@koeninger Thanks for writing the test. Yes, we are aware of this issue. 
However, it's unlikely that we can support deleting topics using the current 
Source API. You can take a look at how StreamExecution checks the new data 
here: 
https://github.com/apache/spark/blob/976f3b1227c1a9e0b878e010531285fdba57b6a7/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L320
 

Using the hash code to compare offsets has a potential issue: it may make 
the latest offset smaller than the old offset, in which case StreamExecution 
won't process the new data.

I think one possible solution is for StreamExecution not to compare the 
offsets; instead, it just assumes `getOffset` will always return the latest 
offset, and it never rolls back to an old offset. This needs more discussion 
anyway. Hence I suggest we don't block this PR on this. Deleting topics can be 
supported in a later PR once we reach agreement on how to resolve the issue.
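
A rough sketch of the "don't compare, just trust the latest offset" idea, for illustration only. This is not the actual StreamExecution code; `NewDataCheck`, `Position`, and `hasNewData` are hypothetical names, and the offset is modelled as a plain map from (topic, partition) to offset. The only point is that an equality check needs no ordering between offsets.

```scala
// Hypothetical sketch, not StreamExecution itself: decide whether there is
// new data by asking "did the reported offset change?" rather than
// "is the reported offset greater than the committed one?".
object NewDataCheck {
  type Position = Map[(String, Int), Long]

  // committed: the offset of the last completed batch, if any.
  // latest: what the source currently reports, if anything.
  def hasNewData(committed: Option[Position], latest: Option[Position]): Boolean =
    latest.isDefined && latest != committed
}
```

With this check, a source whose reported offset moves "backwards" (say, after a topic is recreated) is still treated as having new data, because no less-than comparison is ever made.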






[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-21 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  
> I'd want to see some test cases though that show why the current 
> implementation is wrong from an end-user perspective if it needs to block 
> merging initial kafka support.

PR with failing test indicating at least one reason why it's wrong from an 
end-user perspective:

https://github.com/zsxwing/spark/pull/4

> I do not think it is reasonable to suggest we block merging this patch 
> on an overhaul of the DataSource API configuration system.

Here's what I actually said:
'if you know your plan down the line is to use json for structured 
configuration, you should use it now, and provide more convenient ways to 
construct json later, not use "convenient" non-json hacks now.'

No hyperbole about blocking on a complete overhaul, nothing that isn't 
backwards compatible.  I'm just saying that, if the design document already 
recognizes that json is necessary to work around the string -> string 
interface... start using structured json strings now, and make it more 
convenient later.

Or do you actually think that stuff like

option("assign", "topicA:1:1,topicA:2:2,topicB:3:3")

makes it clear what the arguments are?
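
For illustration, a sketch of what a structured JSON string could look like for the same information. It assumes the colon-separated fields mean topic:partition:offset, which the option string itself does not say. `AssignJson`, `parse`, and `toJson` are hypothetical helpers, not part of the PR, and the JSON shape is only one possible layout.

```scala
// Hypothetical helpers: turn "topic:partition:offset" triples into a
// JSON string whose nesting names each field's role.
object AssignJson {
  // Assumed reading of each entry as topic:partition:offset.
  // Entries that do not have exactly three fields are dropped.
  def parse(s: String): Map[String, Map[Int, Long]] =
    s.split(",").toSeq
      .map(_.split(":"))
      .collect { case Array(topic, part, off) => (topic, part.toInt, off.toLong) }
      .groupBy(_._1)
      .map { case (topic, rows) => topic -> rows.map(r => r._2 -> r._3).toMap }

  // Minimal rendering with sorted keys; topic names are not escaped,
  // so this only works for names without quotes or backslashes.
  def toJson(m: Map[String, Map[Int, Long]]): String =
    m.toSeq.sortBy(_._1).map { case (topic, parts) =>
      val inner = parts.toSeq.sortBy(_._1)
        .map { case (p, o) => s""""$p":$o""" }
        .mkString(",")
      s""""$topic":{$inner}"""
    }.mkString("{", ",", "}")
}
```

Under that assumed reading, `AssignJson.toJson(AssignJson.parse("topicA:1:1,topicA:2:2,topicB:3:3"))` yields `{"topicA":{"1":1,"2":2},"topicB":{"3":3}}`, which is still a plain string and so still fits a string -> string option.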

> I think @koeninger made a good suggestion to block accepting certain 
> kafka configurations.

In case it wasn't clear, I was not suggesting that preventing users from 
doing things they could otherwise do with Kafka is actually a good idea.  I 
think it's a bad idea, but if you're going to run with it, you might as well be 
consistent about it.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-21 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/15102
  
I asked @koeninger to clarify the specific suggestions he is referring to 
above. Here's my response:

> [Comments here and on JIRA relating to concerns with the `Offset` 
> implementation]

I'd be happy to consider a PR (either against this branch now or against 
master if this patch is merged) to use a different offset for `KafkaSource`, or 
even change the offset API in all of structured streaming. A change that large 
however would need to be a different PR, and should not block this one unless 
there are correctness issues. I'd want to see some test cases though that show 
why the current implementation is wrong from an end-user perspective if it 
needs to block merging initial kafka support.
 
> [Comments here and on JIRA about using String -> String for configuration]

I'd also be happy to consider PRs to improve the DataFrameReader/Writer 
interface.  That would also need to be in a separate PR.  It would need to work 
well across languages and be backwards compatible (i.e. what happens when you 
are using an older data source with the new configuration system?).

That said, we already have something that we've been using since Spark 1.3, 
even if it's not perfect. It does work across languages and I don't see a 
problem with supporting comma-separated strings long term for topic lists 
(kafka even does this today in its own configuration for things like 
`bootstrap.servers`).  As a result, I do not think it is reasonable to suggest 
we block merging this patch on an overhaul of the DataSource API configuration 
system.

I think @koeninger made a good suggestion to block accepting certain kafka 
configurations.  In particular, `auto.offset.reset` does not make sense in a 
world where we are managing the offsets ourselves (kafka documentation suggests 
you set this to false if you maintain offsets externally).  Similarly, letting 
the user set the group id could cause data to get split with a different query 
and thus could affect correctness.  My only question here is whether we should 
have a whitelist or a blacklist.  Scanning through the list of possible 
configuration options, I think it could go either way.  I tend to err on the 
side of more power and go with the blacklist.
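
As a minimal sketch of the blacklist approach (hypothetical names throughout; `KafkaOptionCheck`, `blocked`, and `validate` are not from the patch), the check could reject only the keys known to conflict with source-managed offsets and pass everything else through:

```scala
// Hypothetical blacklist check. The two blocked keys are the ones named
// in the discussion; a real list would need review against Kafka's docs.
object KafkaOptionCheck {
  val blocked: Set[String] = Set("auto.offset.reset", "group.id")

  // Right(params) when nothing is blocked, Left(message) otherwise.
  def validate(kafkaParams: Map[String, String]): Either[String, Map[String, String]] = {
    val rejected = kafkaParams.keySet.intersect(blocked)
    if (rejected.isEmpty) Right(kafkaParams)
    else Left(s"Unsupported Kafka options: ${rejected.toSeq.sorted.mkString(", ")}")
  }
}
```

A whitelist would invert the test (reject anything not in an allowed set), which is safer but means every newly useful Kafka setting needs a code change.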

I disagree that prefixing configuration that should be passed to kafka with 
kafka is a hack.  We do this in other places in the DataFrameReader/Writer API 
and have not had any complaints or confusion.  I could be convinced otherwise 
if someone can point out a case where this would be confusing or ambiguous to a 
user.
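
A short sketch of how prefixed options could be separated from the source's own options, assuming the prefix is the literal string `kafka.` (for example `kafka.bootstrap.servers`). `KafkaPrefix` and `kafkaParams` are hypothetical names used only for illustration.

```scala
// Hypothetical helper: keep only the options meant for the Kafka client
// and strip the assumed "kafka." prefix before handing them over.
object KafkaPrefix {
  private val Prefix = "kafka."

  def kafkaParams(options: Map[String, String]): Map[String, String] =
    options.collect {
      case (key, value) if key.startsWith(Prefix) => key.stripPrefix(Prefix) -> value
    }
}
```

Options without the prefix (such as a topic subscription option) stay with the source and never reach the Kafka consumer configuration.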





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65694/
Test PASSed.





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15102
  
Merged build finished. Test PASSed.




