[jira] [Commented] (SPARK-17829) Stable format for offset log

2016-10-26 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610399#comment-15610399 ] Cody Koeninger commented on SPARK-17829: I'm not telling you to do it that way, just asking

[jira] [Commented] (SPARK-17829) Stable format for offset log

2016-10-26 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609653#comment-15609653 ] Cody Koeninger commented on SPARK-17829: Have you considered using a typeclass? > Stable for

Re: Zero Data Loss in Spark with Kafka

2016-10-25 Thread Cody Koeninger
t;Done reading offsets from ZooKeeper. Took " + >> (System.currentTimeMillis() - start)) >> >> Some(offsets) >> case None => >> LogHandler.log.info("No offsets found in ZooKeeper. Took " + >> (System.currentTimeMillis() - start

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-25 Thread Cody Koeninger
I think only supporting 1 version of scala at any given time is not sufficient, 2 probably is ok. I.e. don't drop 2.10 before 2.12 is out + supported On Tue, Oct 25, 2016 at 10:56 AM, Sean Owen wrote: > The general forces are that new versions of things to support emerge,

Re: Spark streaming communication with different versions of kafka

2016-10-25 Thread Cody Koeninger
Kafka consumers should be backwards compatible with kafka brokers, so at the very least you should be able to use the streaming-spark-kafka-0-10 to do what you're talking about. On Tue, Oct 25, 2016 at 4:30 AM, Prabhu GS wrote: > Hi, > > I would like to know if the same

[jira] [Created] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.1.0

2016-10-21 Thread Cody Koeninger (JIRA)
Cody Koeninger created SPARK-18057: -- Summary: Update structured streaming kafka from 10.0.1 to 10.1.0 Key: SPARK-18057 URL: https://issues.apache.org/jira/browse/SPARK-18057 Project: Spark

[jira] [Updated] (SPARK-18056) Update KafkaDStreams from 10.0.1 to 10.1.0

2016-10-21 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-18056: --- Description: There are a couple of relevant KIPs here, https://archive.apache.org/dist/kafka

[jira] [Created] (SPARK-18056) Update KafkaDStreams from 10.0.1 to 10.1.0

2016-10-21 Thread Cody Koeninger (JIRA)
Cody Koeninger created SPARK-18056: -- Summary: Update KafkaDStreams from 10.0.1 to 10.1.0 Key: SPARK-18056 URL: https://issues.apache.org/jira/browse/SPARK-18056 Project: Spark Issue Type

Re: Spark 2.0 with Kafka 0.10 exception

2016-10-21 Thread Cody Koeninger
al.ms can be set differently. > I'll leave it to you on how to add this to docs! > > > On Thu, Oct 20, 2016 at 1:41 PM, Cody Koeninger <c...@koeninger.org> wrote: >> >> Right on, I put in a PR to make a note of that in the docs. >> >> On Thu, Oct 20, 2016 at 1

Re: Kafka Direct Stream: Offset Managed Manually (Exactly Once)

2016-10-21 Thread Cody Koeninger
gain, I know what I have to do :) > > On Fri, Oct 21, 2016 at 5:05 PM, Cody Koeninger <c...@koeninger.org> wrote: >> >> 0. If your processing time is regularly greater than your batch >> interval you're going to have problems anyway. Investigate this more, >> set m

Re: Kafka Direct Stream: Offset Managed Manually (Exactly Once)

2016-10-21 Thread Cody Koeninger
0. If your processing time is regularly greater than your batch interval you're going to have problems anyway. Investigate this more, set maxRatePerPartition, something. 1. That's personally what I tend to do. 2. Why are you relying on checkpoints if you're storing offset state in the database?

[jira] [Commented] (SPARK-17829) Stable format for offset log

2016-10-20 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593000#comment-15593000 ] Cody Koeninger commented on SPARK-17829: At least with regard to kafka offsets, it might be good

[jira] [Created] (SPARK-18033) Deprecate TaskContext.partitionId

2016-10-20 Thread Cody Koeninger (JIRA)
Cody Koeninger created SPARK-18033: -- Summary: Deprecate TaskContext.partitionId Key: SPARK-18033 URL: https://issues.apache.org/jira/browse/SPARK-18033 Project: Spark Issue Type

Re: [PSA] TaskContext.partitionId != the actual logical partition index

2016-10-20 Thread Cody Koeninger
think that makes sense, I can start a ticket. On Thu, Oct 20, 2016 at 1:16 PM, Reynold Xin <r...@databricks.com> wrote: > Seems like a good new API to add? > > > On Thu, Oct 20, 2016 at 11:14 AM, Cody Koeninger <c...@koeninger.org> wrote: >> >> Access to the par

Re: [PSA] TaskContext.partitionId != the actual logical partition index

2016-10-20 Thread Cody Koeninger
Access to the partition ID is necessary for basically every single one of my jobs, and there isn't a foreachPartiionWithIndex equivalent. You can kind of work around it with empty foreach after the map, but it's really awkward to explain to people. On Thu, Oct 20, 2016 at 12:52 PM, Reynold Xin

Re: Spark 2.0 with Kafka 0.10 exception

2016-10-20 Thread Cody Koeninger
Right on, I put in a PR to make a note of that in the docs. On Thu, Oct 20, 2016 at 12:13 PM, Srikanth <srikanth...@gmail.com> wrote: > Yeah, setting those params helped. > > On Wed, Oct 19, 2016 at 1:32 PM, Cody Koeninger <c...@koeninger.org> wrote: >> >>

Re: StructuredStreaming status

2016-10-19 Thread Cody Koeninger
ncy ? >> > I think that the fact that they serve as an output trigger is a problem, > but Structured Streaming seems to resolve this now. > >> >> Thanks >> Shivaram >> >> On Wed, Oct 19, 2016 at 1:29 PM, Michael Armbrust >> <mich...@databricks

Re: StructuredStreaming status

2016-10-19 Thread Cody Koeninger
Is anyone seriously thinking about alternatives to microbatches? On Wed, Oct 19, 2016 at 2:45 PM, Michael Armbrust wrote: > Anything that is actively being designed should be in JIRA, and it seems > like you found most of it. In general, release windows can be found on

Re: Spark 2.0 with Kafka 0.10 exception

2016-10-19 Thread Cody Koeninger
Iterator.next(KafkaRDD.scala:193) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$$anon$21.next(Iterator.scala:838) > > > On Wed, Sep 7, 2016 at 3:55 PM, Cod

[jira] [Commented] (SPARK-17147) Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)

2016-10-18 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586417#comment-15586417 ] Cody Koeninger commented on SPARK-17147: If that's something you're seeing regularly, probably

[jira] [Commented] (SPARK-17147) Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)

2016-10-18 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586397#comment-15586397 ] Cody Koeninger commented on SPARK-17147: Then no, this issue is unlikely to affect you unless

[jira] [Updated] (SPARK-17147) Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)

2016-10-18 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17147: --- Summary: Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Cody Koeninger
+1 to putting docs in one clear place. On Oct 18, 2016 6:40 AM, "Sean Owen" wrote: > I'm OK with that. The upside to the wiki is that it can be edited directly > outside of a release cycle. However, in practice I find that the wiki is > rarely changed. To me it also serves

[jira] [Commented] (SPARK-17147) Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets

2016-10-17 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584172#comment-15584172 ] Cody Koeninger commented on SPARK-17147: Well, are you using compacted topics? > Spark Stream

Re: cutting 2.0.2?

2016-10-17 Thread Cody Koeninger
SPARK-17841 three line bugfix that has a week old PR SPARK-17812 being able to specify starting offsets is a must have for a Kafka mvp in my opinion, already has a PR SPARK-17813 I can put in a PR for this tonight if it'll be considered On Mon, Oct 17, 2016 at 12:28 AM, Reynold Xin

Re: Spark Improvement Proposals

2016-10-17 Thread Cody Koeninger
P. However I think that Spark should >> have real-time streaming support. Currently I see many posts/comments >> that "Spark has too big latency". Spark Streaming is doing very good >> jobs with micro-batches, however I think it is possible to add also more >

[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-15 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15578894#comment-15578894 ] Cody Koeninger commented on SPARK-17812: As you just said yourself, assign doesn't mean you

[jira] [Commented] (SPARK-17935) Add KafkaForeachWriter in external kafka-0.8.0 for structured streaming module

2016-10-15 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15578141#comment-15578141 ] Cody Koeninger commented on SPARK-17935: Why is this in kafka-0-8, when we haven't resolved

[jira] [Commented] (SPARK-17938) Backpressure rate not adjusting

2016-10-15 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15578133#comment-15578133 ] Cody Koeninger commented on SPARK-17938: There was pretty extensive discussion of this on list

[jira] [Commented] (SPARK-17813) Maximum data per trigger

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15577037#comment-15577037 ] Cody Koeninger commented on SPARK-17813: To be clear, the current direct stream (and as a result

[jira] [Updated] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17812: --- Description: Right now you can only run a Streaming Query starting from either the earliest

[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15577022#comment-15577022 ] Cody Koeninger commented on SPARK-17812: Assign is useful, otherwise you have no way of consuming

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-14 Thread Cody Koeninger
nge[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges(); > CanCommitOffsets cco = (CanCommitOffsets) directKafkaStream.dstream(); > cco.commitAsync(offsets); > > I tried setting "max.poll.records" to 1000 but this did not help. > > Any idea what could be wrong? >

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17937: --- Description: Possible events for which offsets are needed: # New partition is discovered

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17937: --- Description: Possible events for which offsets are needed: # New partition is discovered

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17937: --- Description: Possible events for which offsets are needed: # New partition is discovered

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17937: --- Description: Possible events for which offsets are needed: # New partition is discovered

Re: Kafka integration: get existing Kafka messages?

2016-10-14 Thread Cody Koeninger
utor: Exception in task 0.0 in stage 0.0 (TID 0) > java.lang.AssertionError: assertion failed: Failed to get records for > spark-executor-null mytopic2 0 2 after polling for 512 > == > > -Original Message- > From: Cody Koeninger [mailto:c...@koeninger.org] > Sent: 201

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-14 Thread Cody Koeninger
I can't be sure, no. On Fri, Oct 14, 2016 at 3:06 AM, Julian Keppel <juliankeppel1...@gmail.com> wrote: > Okay, thank you! Can you say, when this feature will be released? > > 2016-10-13 16:29 GMT+02:00 Cody Koeninger <c...@koeninger.org>: >> >> As Sean said, it'

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17937: --- Description: Possible events for which offsets are needed: # New partition is discovered

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17937: --- Description: Possible events for which offsets are needed: # New partition is discovered

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17937: --- Description: Possible events for which offsets are needed: # New partition is discovered

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17937: --- Description: Possible events for which offsets are needed: # *New partition* is discovered

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17937: --- Description: Possible events for which offsets are needed: # *New partition* is discovered

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17937: --- Description: Possible events for which offsets are needed: # New partition is discovered

[jira] [Updated] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17937: --- Description: Possible events for which offsets are needed: # New partition is discovered

[jira] [Created] (SPARK-17937) Clarify Kafka offset semantics for Structured Streaming

2016-10-14 Thread Cody Koeninger (JIRA)
Cody Koeninger created SPARK-17937: -- Summary: Clarify Kafka offset semantics for Structured Streaming Key: SPARK-17937 URL: https://issues.apache.org/jira/browse/SPARK-17937 Project: Spark

[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573843#comment-15573843 ] Cody Koeninger commented on SPARK-17812: So I think this is what we're agreed on: Mutually

[jira] [Commented] (SPARK-17813) Maximum data per trigger

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573806#comment-15573806 ] Cody Koeninger commented on SPARK-17813: So issues to be worked out here (assuming we're still

[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573766#comment-15573766 ] Cody Koeninger commented on SPARK-17812: OK, failing on start is clear (it's really annoying

[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573563#comment-15573563 ] Cody Koeninger commented on SPARK-17812: So a short term question - with your proposed interface

[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573479#comment-15573479 ] Cody Koeninger commented on SPARK-17812: While some decision is better than none, can you help me

[jira] [Comment Edited] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573432#comment-15573432 ] Cody Koeninger edited comment on SPARK-17812 at 10/13/16 10:44 PM

[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573432#comment-15573432 ] Cody Koeninger commented on SPARK-17812: If you're seriously worried that people are going to get

[jira] [Comment Edited] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573395#comment-15573395 ] Cody Koeninger edited comment on SPARK-17812 at 10/13/16 10:25 PM: --- 1

[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573395#comment-15573395 ] Cody Koeninger commented on SPARK-17812: 1. we dont have lists, we have strings. regexes

[jira] [Issue Comment Deleted] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-17812: --- Comment: was deleted (was: One other slightly ugly thing... {noformat} // starting

[jira] [Comment Edited] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573166#comment-15573166 ] Cody Koeninger edited comment on SPARK-17812 at 10/13/16 9:17 PM

[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573166#comment-15573166 ] Cody Koeninger commented on SPARK-17812: Here's my concrete suggestion: 3 mutually exclusive

[jira] [Comment Edited] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572922#comment-15572922 ] Cody Koeninger edited comment on SPARK-17812 at 10/13/16 8:33 PM: -- Sorry

[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573089#comment-15573089 ] Cody Koeninger commented on SPARK-17812: One other slightly ugly thing... {noformat} // starting

[jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572922#comment-15572922 ] Cody Koeninger commented on SPARK-17812: Sorry, I didn't see this comment until just now. X

Re: StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)

2016-10-13 Thread Cody Koeninger
I've always been confused as to why it would ever be a good idea to put any streaming query system on the critical path for synchronous < 100msec requests. It seems to make a lot more sense to have a streaming system do asynch updates of a store that has better latency and quality of service

[jira] [Commented] (FLINK-3037) Make the behavior of the Kafka consumer configurable if the offsets to restore from are not available

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572133#comment-15572133 ] Cody Koeninger commented on FLINK-3037: --- As a user, I want to be able to start low-value, high

[jira] [Commented] (SPARK-17900) Mark the following Spark SQL APIs as stable

2016-10-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572119#comment-15572119 ] Cody Koeninger commented on SPARK-17900: Thanks for doing this, should make things clearer

Re: Limit Kafka batches size with Spark Streaming

2016-10-13 Thread Cody Koeninger
fka.maxRatePerPartition", "10") > conf.set("spark.streaming.backpressure.enabled", "true") > > That's not normal, is it? Do you notice anything odd in my logs? > > Thanks a lot. > > > > On 10/12/2016 07:31 PM, Cody Koeninger w

Re: Want to test spark-sql-kafka but get unresolved dependency error

2016-10-13 Thread Cody Koeninger
As Sean said, it's unreleased. If you want to try it out, build spark http://spark.apache.org/docs/latest/building-spark.html The easiest way to include the jar is probably to use mvn install to put it in your local repository, then link it in your application's mvn or sbt build file as

Re: Kafka integration: get existing Kafka messages?

2016-10-12 Thread Cody Koeninger
response. > > > > So Kafka direct stream actually has consumer on both the driver and > executor? Can you please provide more details? Thank you very much! > > > > ________ > > From: Cody Koeninger [mailto:c...@koeninger.org] > Sent:

[jira] [Closed] (SPARK-15408) Spark streaming app crashes with NotLeaderForPartitionException

2016-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger closed SPARK-15408. -- Resolution: Cannot Reproduce > Spark streaming app crashes with NotLeaderForPartitionExcept

[jira] [Commented] (SPARK-15272) DirectKafkaInputDStream doesn't work with window operation

2016-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570221#comment-15570221 ] Cody Koeninger commented on SPARK-15272: Checking to see if the 0.10 consumer's handling

[jira] [Comment Edited] (SPARK-15272) DirectKafkaInputDStream doesn't work with window operation

2016-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570221#comment-15570221 ] Cody Koeninger edited comment on SPARK-15272 at 10/12/16 11:33 PM

[jira] [Commented] (SPARK-11698) Add option to ignore kafka messages that are out of limit rate

2016-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570208#comment-15570208 ] Cody Koeninger commented on SPARK-11698: Would a custom ConsumerStrategy for the new consumer

[jira] [Resolved] (SPARK-10320) Kafka Support new topic subscriptions without requiring restart of the streaming context

2016-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-10320. Resolution: Fixed Fix Version/s: 2.0.0 SPARK-12177 added the new consumer, which

[jira] [Closed] (SPARK-9947) Separate Metadata and State Checkpoint Data

2016-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger closed SPARK-9947. - Resolution: Won't Fix The direct DStream api already gives access to offsets, and it seems clear

[jira] [Commented] (SPARK-8337) KafkaUtils.createDirectStream for python is lacking API/feature parity with the Scala/Java version

2016-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570186#comment-15570186 ] Cody Koeninger commented on SPARK-8337: --- Can this be closed, given that the subtasks are resolved

[jira] [Closed] (SPARK-5505) ConsumerRebalanceFailedException from Kafka consumer

2016-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger closed SPARK-5505. - Resolution: Won't Fix The old kafka High Level Consumer has been abandoned at this point. SPARK

[jira] [Resolved] (SPARK-5718) Add native offset management for ReliableKafkaReceiver

2016-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-5718. --- Resolution: Fixed Fix Version/s: 2.0.0 SPARK-12177 added support for the native kafka

[jira] [Commented] (SPARK-10815) API design: data sources and sinks

2016-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570139#comment-15570139 ] Cody Koeninger commented on SPARK-10815: Another unfortunate thing about the Sink api

Re: Limit Kafka batches size with Spark Streaming

2016-10-12 Thread Cody Koeninger
rg/e730492453.png notice the cutover point On Wed, Oct 12, 2016 at 11:00 AM, Samy Dindane <s...@dindane.com> wrote: > I am 100% sure. > > println(conf.get("spark.streaming.backpressure.enabled")) prints true. > > > On 10/12/2016 05:48 PM, Cody Koeninger w

Re: Limit Kafka batches size with Spark Streaming

2016-10-12 Thread Cody Koeninger
gt; Do you have any idea about why aren't backpressure working? How to debug > this? > > > On 10/11/2016 06:08 PM, Cody Koeninger wrote: >> >> http://spark.apache.org/docs/latest/configuration.html >> >> "This rate is upper bounded by the values >&g

Re: What happens when an executor crashes?

2016-10-12 Thread Cody Koeninger
, are you talking about > repartition using partitionBy()? > > > On 10/11/2016 01:23 AM, Cody Koeninger wrote: >> >> Repartition almost always involves a shuffle. >> >> Let me see if I can explain the recovery stuff... >> >> Say you start with two kafka part

Re: Kafka integration: get existing Kafka messages?

2016-10-12 Thread Cody Koeninger
its set to none for the executors, because otherwise they wont do exactly what the driver told them to do. you should be able to set up the driver consumer to determine batches however you want, though. On Wednesday, October 12, 2016, Haopu Wang wrote: > Hi, > > > > I want

[jira] [Commented] (SPARK-17344) Kafka 0.8 support for Structured Streaming

2016-10-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567457#comment-15567457 ] Cody Koeninger commented on SPARK-17344: Given the choice between rewriting underlying kafka

[jira] [Closed] (SPARK-17837) Disaster recovery of offsets from WAL

2016-10-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger closed SPARK-17837. -- Resolution: Duplicate Duplicate of SPARK-17829 > Disaster recovery of offsets from

[jira] [Commented] (FLINK-4280) New Flink-specific option to set starting position of Kafka consumer without respecting external offsets in ZK / Broker

2016-10-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567263#comment-15567263 ] Cody Koeninger commented on FLINK-4280: --- FLINK-3123 would be pretty important to me, happy to help

[jira] [Commented] (SPARK-17344) Kafka 0.8 support for Structured Streaming

2016-10-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566244#comment-15566244 ] Cody Koeninger commented on SPARK-17344: How long would it take CDH to distribute 0.10

Re: Limit Kafka batches size with Spark Streaming

2016-10-11 Thread Cody Koeninger
http://spark.apache.org/docs/latest/configuration.html "This rate is upper bounded by the values spark.streaming.receiver.maxRate and spark.streaming.kafka.maxRatePerPartition if they are set (see below)." On Tue, Oct 11, 2016 at 10:57 AM, Samy Dindane wrote: > Hi, > > Is it

Re: Can mapWithState state func be called every batchInterval?

2016-10-11 Thread Cody Koeninger
involving mapWithState > of course :) I'm just wondering why it doesn't support this use case yet. > > On Tue, Oct 11, 2016 at 3:41 PM, Cody Koeninger <c...@koeninger.org> wrote: >> >> They're telling you not to use the old function because it's linear on the >> total

[jira] [Commented] (SPARK-17853) Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.

2016-10-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565559#comment-15565559 ] Cody Koeninger commented on SPARK-17853: Good, will keep this ticket open at least until

Re: Can mapWithState state func be called every batchInterval?

2016-10-11 Thread Cody Koeninger
They're telling you not to use the old function because it's linear on the total number of keys, not keys in the batch, so it's slow. But if that's what you really want, go ahead and do it, and see if it performs well enough. On Oct 11, 2016 6:28 AM, "DandyDev" wrote: Hi

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Cody Koeninger
it works. Thanks! Although changing this is ok for us, i am interested in the why :-) With the old connector this was not a problem nor is it afaik with the pure kafka consumer api 2016-10-11 14:30 GMT+02:00 Cody Koeninger <c...@koeninger.org>: > Just out of curiosity, have you tried

[jira] [Commented] (SPARK-17853) Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.

2016-10-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565400#comment-15565400 ] Cody Koeninger commented on SPARK-17853: Use a different group id. Let me know if that addresses

Re: Problems with new experimental Kafka Consumer for 0.10

2016-10-11 Thread Cody Koeninger
c: String): >> InputDStream[ConsumerRecord[String, Bytes]] = { >> KafkaUtils.createDirectStream[String, Bytes](ssc, >> LocationStrategies.PreferConsistent, ConsumerStrategies.Subscribe[String, >> Bytes](Set(topic), kafkaParams)) >> } >> >>

Re: Manually committing offset in Spark 2.0 with Kafka 0.10 and Java

2016-10-11 Thread Cody Koeninger
ync(offsets); > > java.lang.ClassCastException: > org.apache.spark.streaming.api.java.JavaInputDStream > cannot be cast to org.apache.spark.streaming.kafka010.CanCommitOffsets > at SparkTest.lambda$0(SparkTest.java:103) > > Best regards, > Max > > > 2016-10-10 20:18 GMT+02:00 Cody Koeninger <c...@koeninger.org>:

[jira] [Commented] (SPARK-17853) Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.

2016-10-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565278#comment-15565278 ] Cody Koeninger commented on SPARK-17853: Which version of DStream are you using, 0-10 or 0-8

Re: What happens when an executor crashes?

2016-10-10 Thread Cody Koeninger
, starting at topic-0 offsets 60, topic-1 offsets 66 Clear as mud? On Mon, Oct 10, 2016 at 5:36 PM, Samy Dindane <s...@dindane.com> wrote: > > > On 10/10/2016 8:14 PM, Cody Koeninger wrote: >> >> Glad it was helpful :) >> >> As far as executors, my e

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
ul, because we have run into some trouble in the past > with some inside the ASF but essentially outside the Spark community who > didn't like the way we were doing things. > > On Mon, Oct 10, 2016 at 3:53 PM, Cody Koeninger <c...@koeninger.org> wrote: >> >> Apache documents s

[jira] [Commented] (SPARK-17812) More granular control of starting offsets

2016-10-10 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563285#comment-15563285 ] Cody Koeninger commented on SPARK-17812: No, it's not covered by strict assign. If you don't

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
ng lots of rules >> from the beginning makes it confusing and can reduce contributions. >> Although, as engineers, we believe that anything can be solved using >> mechanical rules, in practice software development is a social process that >> ultimately requires humans to tackle

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
community effort and I wouldn't want to move forward if up to half of the > community thinks it's an untenable idea. > > rb > > On Mon, Oct 10, 2016 at 12:07 PM, Cody Koeninger <c...@koeninger.org> wrote: >> >> I think this is closer to a procedural issue than a code mod

<    1   2   3   4   5   6   7   8   9   10   >