[jira] [Commented] (SPARK-20928) SPIP: Continuous Processing Mode for Structured Streaming

2017-10-26 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221025#comment-16221025 ] Cody Koeninger commented on SPARK-20928: No, it doesn't exist yet as far as I know. Reason I ask

[jira] [Commented] (SPARK-20928) SPIP: Continuous Processing Mode for Structured Streaming

2017-10-26 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220694#comment-16220694 ] Cody Koeninger commented on SPARK-20928: Can you clarify how this impacts sinks having access

Re: Spark Kafka API tries connecting to dead node for every batch, which increases the processing time

2017-10-16 Thread Cody Koeninger
down. > > > On 16-Oct-2017 7:34 PM, "Cody Koeninger" <c...@koeninger.org> wrote: >> >> Have you tried adjusting the timeout? >> >> On Mon, Oct 16, 2017 at 8:08 AM, Suprith T Jain <t.supr...@gmail.com> >> wrote: >> >

Re: Spark Kafka API tries connecting to dead node for every batch, which increases the processing time

2017-10-16 Thread Cody Koeninger
Have you tried adjusting the timeout? On Mon, Oct 16, 2017 at 8:08 AM, Suprith T Jain wrote: > Hi guys, > > I have a 3 node cluster and i am running a spark streaming job. consider the > below example > > /*spark-submit* --master yarn-cluster --class >

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202923#comment-16202923 ] Cody Koeninger commented on SPARK-20928: If a given sink is handling a result, why does handling

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-10-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202905#comment-16202905 ] Cody Koeninger commented on SPARK-20928: I was talking about the specific case of jobs with only

Re: How to read from multiple kafka topics using structured streaming (spark 2.2.0)?

2017-09-19 Thread Cody Koeninger
You should be able to pass a comma separated string of topics to subscribe. subscribePattern isn't necessary On Tue, Sep 19, 2017 at 2:54 PM, kant kodali wrote: > got it! Sorry. > > On Tue, Sep 19, 2017 at 12:52 PM, Jacek Laskowski wrote: >> >> Hi, >> >>
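The advice above can be sketched as follows. This is an untested illustration of the Structured Streaming Kafka source with a comma-separated `subscribe` option; the topic names, server addresses, and app name are placeholders, not values from the thread.

```scala
// Sketch: subscribing to several topics by passing a comma-separated
// list to "subscribe" -- no subscribePattern needed.
import org.apache.spark.sql.SparkSession

object MultiTopicExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("multi-topic-example").getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092,host2:9092") // placeholder hosts
      .option("subscribe", "topicA,topicB,topicC")                // comma-separated topics
      .load()

    // Key and value arrive as binary; cast them for downstream use.
    val messages = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "topic")
  }
}
```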

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-18 Thread Cody Koeninger
Have you searched in jira, e.g. https://issues.apache.org/jira/browse/SPARK-19185 On Mon, Sep 18, 2017 at 1:56 AM, HARSH TAKKAR wrote: > Hi > > Changing spark version if my last resort, is there any other workaround for > this problem. > > > On Mon, Sep 18, 2017 at 11:43

Re: Easy way to get offset metatada with Spark Streaming API

2017-09-11 Thread Cody Koeninger
https://issues-test.apache.org/jira/browse/SPARK-18258 On Mon, Sep 11, 2017 at 7:15 AM, Dmitry Naumenko wrote: > Hi all, > > It started as a discussion in > https://stackoverflow.com/questions/46153105/how-to-get-kafka-offsets-with-spark-structured-streaming-api. > > So

Re: Multiple Kafka topics processing in Spark 2.2

2017-09-11 Thread Cody Koeninger
If you want an "easy" but not particularly performant way to do it, each org.apache.kafka.clients.consumer.ConsumerRecord has a topic. The topic is going to be the same for the entire partition as long as you haven't shuffled, hence the examples on how to deal with it at a partition level. On

[jira] [Commented] (SPARK-17147) Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)

2017-09-07 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157002#comment-16157002 ] Cody Koeninger commented on SPARK-17147: Patch is there, if anyone wants to test it and provide

Re: Putting Kafka 0.8 behind an (opt-in) profile

2017-09-06 Thread Cody Koeninger
rimental one. > > Is the Kafka 0.10 integration as stable as it is going to be, and worth > marking as such for 2.3.0? > > > On Tue, Sep 5, 2017 at 4:12 PM Cody Koeninger <c...@koeninger.org> wrote: >> >> +1 to going ahead and giving a deprecation warning now >

Re: Putting Kafka 0.8 behind an (opt-in) profile

2017-09-05 Thread Cody Koeninger
+1 to going ahead and giving a deprecation warning now On Tue, Sep 5, 2017 at 6:39 AM, Sean Owen wrote: > On the road to Scala 2.12, we'll need to make Kafka 0.8 support optional in > the build, because it is not available for Scala 2.12. > >

Re: Spark streaming Kafka 0.11 integration

2017-09-05 Thread Cody Koeninger
Here's the jira for upgrading to a 0.10.x point release, which is effectively the discussion of upgrading to 0.11 now https://issues.apache.org/jira/browse/SPARK-18057 On Tue, Sep 5, 2017 at 1:27 AM, matus.cimerman wrote: > Hi guys, > > is there any plans to support

Re: Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-29 Thread Cody Koeninger
ms) > ) > > val kafkaStreamRdd = kafkaStream.transform { rdd => > rdd.map(consumerRecord => (consumerRecord.key(), consumerRecord.value())) > } > > On Mon, Aug 28, 2017 at 11:56 AM, swetha kasireddy < > swethakasire...@gmail.com> wrote: > >> There is no

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

2017-08-28 Thread Cody Koeninger
Just wanted to point out that because the jira isn't labeled SPIP, it won't have shown up linked from http://spark.apache.org/improvement-proposals.html On Mon, Aug 28, 2017 at 2:20 PM, Wenchen Fan wrote: > Hi all, > > It has been almost 2 weeks since I proposed the data

Re: Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-28 Thread Cody Koeninger
xCapacity" -> Integer.valueOf(96), > "spark.streaming.kafka.consumer.cache.maxCapacity" -> Integer.valueOf(96), > > "spark.streaming.kafka.consumer.poll.ms" -> Integer.valueOf(1024), > > > http://markmail.org/message/n4cdxwurlhf44q5x > > https://issues.

Re: Kafka Consumer Pre Fetch Messages + Async commits

2017-08-28 Thread Cody Koeninger
1. No, prefetched message offsets aren't exposed. 2. No, I'm not aware of any plans for sync commit, and I'm not sure that makes sense. You have to be able to deal with repeat messages in the event of failure in any case, so the only difference sync commit would make would be (possibly) slower

Re: Slower performance while running Spark Kafka Direct Streaming with Kafka 10 cluster

2017-08-25 Thread Cody Koeninger
Why are you setting consumer.cache.enabled to false? On Fri, Aug 25, 2017 at 2:19 PM, SRK wrote: > Hi, > > What would be the appropriate settings to run Spark with Kafka 10? My job > works fine with Spark with Kafka 8 and with Kafka 8 cluster. But its very > slow with

Re: How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-22 Thread Cody Koeninger
o.offset.reset" -> "latest",. > > > Thanks! > > On Mon, Aug 21, 2017 at 9:06 AM, Cody Koeninger <c...@koeninger.org> wrote: >> >> Yes, you can start from specified offsets. See ConsumerStrategy, >> specifically Assign >> >> >> http:

Re: How to force Spark Kafka Direct to start from the latest offset when the lag is huge in kafka 10?

2017-08-21 Thread Cody Koeninger
Yes, you can start from specified offsets. See ConsumerStrategy, specifically Assign http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#your-own-data-store On Tue, Aug 15, 2017 at 1:18 PM, SRK wrote: > Hi, > > How to force Spark Kafka Direct to
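A minimal sketch of the `Assign` strategy mentioned above, for the 0-10 direct stream. The topic name, partition offsets, and group id are invented for illustration, and `streamingContext` is assumed to be an existing `StreamingContext`; consult the linked integration guide for the authoritative API.

```scala
// Sketch (untested): start each partition from an explicitly chosen offset
// via ConsumerStrategies.Assign, instead of relying on auto.offset.reset.
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "host1:9092",            // placeholder
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group"                   // placeholder
)

// Offsets you looked up yourself (e.g. the latest ones), per partition.
val fromOffsets = Map(
  new TopicPartition("events", 0) -> 42000L,      // invented offsets
  new TopicPartition("events", 1) -> 41250L
)

val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext,                               // assumed existing StreamingContext
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Assign[String, String](fromOffsets.keys.toList, kafkaParams, fromOffsets)
)
```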

Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?

2017-08-09 Thread Cody Koeninger
org.apache.spark.streaming.kafka.KafkaCluster has methods getLatestLeaderOffsets and getEarliestLeaderOffsets On Mon, Aug 7, 2017 at 11:37 PM, shyla deshpande wrote: > Thanks TD. > > On Mon, Aug 7, 2017 at 8:59 PM, Tathagata Das > wrote: >>

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. please help!

2017-08-07 Thread Cody Koeninger
saved into Cassandra. It is working as expected 99% of the time except that > when there is an exception, I did not want the offsets to be committed. > > By Filtering for unsuccessful attempts, do you mean filtering the bad > records... > > > > > > > On Mon, Aug 7, 2017

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. please help!

2017-08-07 Thread Cody Koeninger
raised, I was thinking I won't be committing the >> offsets, but the offsets are committed all the time independent of whether >> an exception was raised or not. >> >> It will be helpful if you can explain this behavior. >> >> >> On Sun, Aug 6, 2017 at 5:19

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. please help!

2017-08-06 Thread Cody Koeninger
sould be doing, then what is the right way? > > On Sun, Aug 6, 2017 at 9:21 AM, Cody Koeninger <c...@koeninger.org> wrote: >> >> If your complaint is about offsets being committed that you didn't >> expect... auto commit being false on executors shouldn't have anythin

Re: kafka setting, enable.auto.commit to false is being overridden and I lose data. please help!

2017-08-06 Thread Cody Koeninger
If your complaint is about offsets being committed that you didn't expect... auto commit being false on executors shouldn't have anything to do with that. Executors shouldn't be auto-committing, that's why it's being overridden. What you've said and the code you posted isn't really enough to

Re: Spark streaming giving me a bunch of WARNINGS, please help me understand them

2017-07-10 Thread Cody Koeninger
The warnings regarding configuration on the executor are for the executor kafka consumer, not the driver kafka consumer. In general, the executor kafka consumers should consume only exactly the offsets the driver told them to, and not be involved in committing offsets / part of the same group as

[jira] [Commented] (SPARK-19680) Offsets out of range with no configured reset policy for partitions

2017-07-06 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076646#comment-16076646 ] Cody Koeninger commented on SPARK-19680: Direct Stream can take a mapping from topicpartition

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-06-29 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068980#comment-16068980 ] Cody Koeninger commented on SPARK-18057: Kafka 0.11 is now released. Are we upgrading spark

Re: The stability of Spark Stream Kafka 010

2017-06-29 Thread Cody Koeninger
Given the emphasis on structured streaming, I don't personally expect a lot more work being put into DStreams-based projects, outside of bugfixes. Stable designation is kind of arbitrary at that point. That 010 version wasn't developed until spark 2.0 timeframe, but you can always try

[jira] [Commented] (SPARK-21233) Support pluggable offset storage

2017-06-28 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066517#comment-16066517 ] Cody Koeninger commented on SPARK-21233: You already have the choice of where you want to store

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-06-19 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054690#comment-16054690 ] Cody Koeninger commented on SPARK-20928: Cool, can you label it SPIP so it shows up linked from

Re: Spark-Kafka integration - build failing with sbt

2017-06-19 Thread Cody Koeninger
en i provide the > appropriate packages in LibraryDependencies ? > which ones would have helped compile this ? > > > > On Sat, Jun 17, 2017 at 2:53 PM, karan alang <karan.al...@gmail.com> wrote: >> >> Thanks, Cody .. yes, was able to fix that. >>

Re: Spark-Kafka integration - build failing with sbt

2017-06-17 Thread Cody Koeninger
There are different projects for different versions of kafka, spark-streaming-kafka-0-8 and spark-streaming-kafka-0-10 See http://spark.apache.org/docs/latest/streaming-kafka-integration.html On Fri, Jun 16, 2017 at 6:51 PM, karan alang wrote: > I'm trying to compile
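A build fragment matching the advice above. The version number is an example only; it should match the Spark version actually in use.

```scala
// build.sbt fragment: pick ONE artifact, matching your Kafka integration,
// rather than depending on both. "2.1.1" is an example version.
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.1"
// or, for the older 0.8 integration:
// libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.1.1"
```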

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-06-17 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052831#comment-16052831 ] Cody Koeninger commented on SPARK-20928: This needs an improvement proposal. Based

Re: SPARK-19547

2017-06-08 Thread Cody Koeninger
Can you explain in more detail what you mean by "distribute Kafka topics among different instances of same consumer group"? If you're trying to run multiple streams using the same consumer group, it's already documented that you shouldn't do that. On Thu, Jun 8, 2017 at 12:43 AM, Rastogi, Pankaj

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-06-03 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035964#comment-16035964 ] Cody Koeninger commented on SPARK-20928: For jobs that only have narrow stages, I think it should

Re: Message getting lost in Kafka + Spark Streaming

2017-05-30 Thread Cody Koeninger
First thing I noticed, you should be using a singleton kafka producer, not recreating one every partition. It's threadsafe. On Tue, May 30, 2017 at 7:59 AM, Vikash Pareek wrote: > I am facing an issue related to spark streaming with kafka, my use case is as >
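The "singleton producer" pattern recommended above looks roughly like this. `ProducerStub` is a stand-in for `org.apache.kafka.clients.producer.KafkaProducer` (which is thread-safe) so the shape of the pattern is visible without Kafka on the classpath.

```scala
// Sketch of the one-producer-per-JVM pattern. ProducerStub is a placeholder
// for a real KafkaProducer, which is expensive to create and thread-safe.
class ProducerStub {
  def send(topic: String, value: String): Unit = () // real code would produce to Kafka
}

object ProducerHolder {
  // Initialized once per JVM (i.e., once per executor), on first use.
  lazy val producer: ProducerStub = new ProducerStub
}

// Inside rdd.foreachPartition { partition => ... }, every task running on
// the same executor reuses ProducerHolder.producer instead of constructing
// a new producer per partition.
```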

Re: do we need to enable writeAheadLogs for DirectStream as well or is it only for indirect stream?

2017-05-02 Thread Cody Koeninger
You don't need write ahead logs for direct stream. On Tue, May 2, 2017 at 11:32 AM, kant kodali wrote: > Hi All, > > I need some fault tolerance for my stateful computations and I am wondering > why we need to enable writeAheadLogs for DirectStream like Kafka (for > Indirect

Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

2017-05-01 Thread Cody Koeninger
omething like. >> >> df.writeStream.format("kafka").start("topic") >> >> Seems reasonable if people don't think that is confusing. >> >> On Mon, May 1, 2017 at 8:43 AM, Cody Koeninger <c...@koeninger.org> wrote: >>> >>> I'm c

Re: Exactly-once semantics with kakfa CanCommitOffsets.commitAsync?

2017-04-28 Thread Cody Koeninger
t the semantics here - i.e., calls to commitAsync are not > actually guaranteed to succeed? If that's the case, the docs could really > be a *lot* clearer about that. > > Thanks, > > DR > > On Fri, Apr 28, 2017 at 11:34 AM, Cody Koeninger <c...@koeninger.org> wrote:

Re: Exactly-once semantics with kakfa CanCommitOffsets.commitAsync?

2017-04-28 Thread Cody Koeninger
From that doc: " However, Kafka is not transactional, so your outputs must still be idempotent. " On Fri, Apr 28, 2017 at 10:29 AM, David Rosenstrauch wrote: > I'm doing a POC to test recovery with spark streaming from Kafka. I'm using > the technique for storing the

Re: Spark Streaming 2.1 Kafka consumer - retrieving offset commits for each poll

2017-04-27 Thread Cody Koeninger
il the 10th second of > executing the last consumed offset of the same partition was 200.000 - and so > forth. This is the information I seek to get. > >> On 27 Apr 2017, at 20:11, Cody Koeninger <c...@koeninger.org> wrote: >> >> Are you asking for commits for ev

Re: Spark Streaming 2.1 Kafka consumer - retrieving offset commits for each poll

2017-04-27 Thread Cody Koeninger
sets of the intermediate batches - and hence the questions. > >> On 26 Apr 2017, at 21:42, Cody Koeninger <c...@koeninger.org> wrote: >> >> have you read >> >> http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#kafka-itself >>

Re: help/suggestions to setup spark cluster

2017-04-27 Thread Cody Koeninger
r this. Please >> provide me some links to help me get started. >> >> Thanks >> -Anna >> >> On Wed, Apr 26, 2017 at 7:46 PM, Cody Koeninger <c...@koeninger.org> >> wrote: >>> >>> The standalone cluster manager is fine for production. D

Re: help/suggestions to setup spark cluster

2017-04-26 Thread Cody Koeninger
The standalone cluster manager is fine for production. Don't use Yarn or Mesos unless you already have another need for it. On Wed, Apr 26, 2017 at 4:53 PM, anna stax wrote: > Hi Sam, > > Thank you for the reply. > > What do you mean by > I doubt people run spark in a.

Re: Spark Streaming 2.1 Kafka consumer - retrieving offset commits for each poll

2017-04-26 Thread Cody Koeninger
t; enable.auto.commit property to false. > > Any idea onto how to use the KafkaConsumer’s auto offset commits? Keep in > mind that I do not care about exactly-once, hence having messages replayed is > perfectly fine. > >> On 26 Apr 2017, at 19:26, Cody Koeninger <c...@

Re: Spark Streaming 2.1 Kafka consumer - retrieving offset commits for each poll

2017-04-26 Thread Cody Koeninger
What is it you're actually trying to accomplish? You can get topic, partition, and offset bounds from an offset range like http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#obtaining-offsets Timestamp isn't really a meaningful idea for a range of offsets. On Tue, Apr
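The offset-range pattern from the linked docs, sketched for context. `stream` is assumed to be a 0-10 direct stream; this is untested illustration, not a drop-in snippet.

```scala
// Sketch (untested): read topic/partition/offset bounds for each batch
// by casting the batch RDD to HasOffsetRanges.
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach { o =>
    println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
  }
}
```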

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-04-20 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977709#comment-15977709 ] Cody Koeninger commented on SPARK-18057: People have also been reporting that explicit dependency

[jira] [Commented] (SPARK-20036) impossible to read a whole kafka topic using kafka 0.10 and spark 2.0.0

2017-04-18 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973554#comment-15973554 ] Cody Koeninger commented on SPARK-20036: [~danielnuriyev] what actually happened when you removed

[jira] [Commented] (SPARK-20036) impossible to read a whole kafka topic using kafka 0.10 and spark 2.0.0

2017-04-18 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973507#comment-15973507 ] Cody Koeninger commented on SPARK-20036: I'll submit a PR to add a note to the docs about

[jira] [Updated] (SPARK-20287) Kafka Consumer should be able to subscribe to more than one topic partition

2017-04-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-20287: --- What you're describing is closer to the receiver-based implementation, which had a number

[jira] [Commented] (SPARK-19976) DirectStream API throws OffsetOutOfRange Exception

2017-04-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966208#comment-15966208 ] Cody Koeninger commented on SPARK-19976: What would your expected behavior be when you delete

[jira] [Commented] (SPARK-20037) impossible to set kafka offsets using kafka 0.10 and spark 2.0.0

2017-04-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966199#comment-15966199 ] Cody Koeninger commented on SPARK-20037: I'd be inclined to say this is a duplicate of the issue

[jira] [Commented] (SPARK-20036) impossible to read a whole kafka topic using kafka 0.10 and spark 2.0.0

2017-04-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966193#comment-15966193 ] Cody Koeninger commented on SPARK-20036: fixKafkaParams is related to executor consumers

[jira] [Commented] (SPARK-20287) Kafka Consumer should be able to subscribe to more than one topic partition

2017-04-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966180#comment-15966180 ] Cody Koeninger commented on SPARK-20287: The issue here is that the underlying new Kafka consumer

[jira] [Commented] (SPARK-19904) SPIP Add Spark Project Improvement Proposal doc to website

2017-03-27 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943827#comment-15943827 ] Cody Koeninger commented on SPARK-19904: It has been added to apache/spark-website git repo

Re: Spark streaming to kafka exactly once

2017-03-22 Thread Cody Koeninger
If you're talking about reading the same message multiple times in a failure situation, see https://github.com/koeninger/kafka-exactly-once If you're talking about producing the same message multiple times in a failure situation, keep an eye on

Re: [Spark Streaming+Kafka][How-to]

2017-03-22 Thread Cody Koeninger
Glad you got it worked out. That's cool as long as your use case doesn't actually require e.g. partition 0 to always be scheduled to the same executor across different batches. On Tue, Mar 21, 2017 at 7:35 PM, OUASSAIDI, Sami wrote: > So it worked quite well with a

Re: Spark Streaming from Kafka, deal with initial heavy load.

2017-03-20 Thread Cody Koeninger
You want spark.streaming.kafka.maxRatePerPartition for the direct stream. On Sat, Mar 18, 2017 at 3:37 PM, Mal Edwin wrote: > > Hi, > You can enable backpressure to handle this. > > spark.streaming.backpressure.enabled > spark.streaming.receiver.maxRate > > Thanks, >
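One way to wire that up, with illustrative values only (the rate, class name, and jar are placeholders):

```shell
# Example: cap the initial surge by limiting how many records per second
# each Kafka partition contributes to a batch, alongside backpressure.
spark-submit \
  --conf spark.streaming.kafka.maxRatePerPartition=1000 \
  --conf spark.streaming.backpressure.enabled=true \
  --class com.example.StreamingJob \
  streaming-job.jar
```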

Re: Streaming 2.1.0 - window vs. batch duration

2017-03-17 Thread Cody Koeninger
Probably easier if you show some more code, but if you just call dstream.window(Seconds(60)) you didn't specify a slide duration, so it's going to default to your batch duration of 1 second. So yeah, if you're just using e.g. foreachRDD to output every message in the window, every second it's
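The fix described above is to pass an explicit slide duration. A hedged sketch, assuming an existing `dstream` and a 1-second batch interval:

```scala
// Sketch (untested): window of 60s that also SLIDES every 60s, so output
// fires once per minute instead of once per 1s batch (the default slide).
import org.apache.spark.streaming.Seconds

val windowed = dstream.window(Seconds(60), Seconds(60)) // (window length, slide)
windowed.foreachRDD { rdd =>
  // emitted once per 60 seconds
}
```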

Re: [Spark Streaming+Kafka][How-to]

2017-03-16 Thread Cody Koeninger
Spark just really isn't a good fit for trying to pin particular computation to a particular executor, especially if you're relying on that for correctness. On Thu, Mar 16, 2017 at 7:16 AM, OUASSAIDI, Sami wrote: > > Hi all, > > So I need to specify how an executor

[jira] [Commented] (SPARK-19904) SPIP Add Spark Project Improvement Proposal doc to website

2017-03-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906593#comment-15906593 ] Cody Koeninger commented on SPARK-19904: Up now at http://spark.apache.org/improvement

[jira] [Assigned] (SPARK-19904) SPIP Add Spark Project Improvement Proposal doc to website

2017-03-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger reassigned SPARK-19904: -- Assignee: Cody Koeninger > SPIP Add Spark Project Improvement Proposal doc to webs

[jira] [Updated] (SPARK-19904) SPIP Add Spark Project Improvement Proposal doc to website

2017-03-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-19904: --- Issue Type: Improvement (was: Bug) > SPIP Add Spark Project Improvement Proposal

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-03-10 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905774#comment-15905774 ] Cody Koeninger commented on SPARK-18057: Based on previous kafka client upgrades I wouldn't

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-03-10 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905747#comment-15905747 ] Cody Koeninger commented on SPARK-18057: I think the bigger question is once there's a kafka

[jira] [Commented] (SPARK-19888) Seeing offsets not resetting even when reset policy is configured explicitly

2017-03-10 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905730#comment-15905730 ] Cody Koeninger commented on SPARK-19888: That stacktrace also shows a concurrent modification

Re: Question about upgrading Kafka client version

2017-03-10 Thread Cody Koeninger
There are existing tickets on the issues around kafka versions, e.g. https://issues.apache.org/jira/browse/SPARK-18057 that haven't gotten any committer weigh-in on direction. On Thu, Mar 9, 2017 at 12:52 PM, Oscar Batori wrote: > Guys, > > To change the subject from

[jira] [Commented] (SPARK-19863) Whether or not use CachedKafkaConsumer need to be configured, when you use DirectKafkaInputDStream to connect the kafka in a Spark Streaming application

2017-03-10 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905603#comment-15905603 ] Cody Koeninger commented on SPARK-19863: Isn't this basically a duplicate of SPARK-19185

[jira] [Updated] (SPARK-19904) SPIP Add Spark Project Improvement Proposal doc to website

2017-03-10 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-19904: --- Description: see http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Improvement

Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
pen ticket with the SPIP label show it should show up On Fri, Mar 10, 2017 at 11:19 AM, Reynold Xin <r...@databricks.com> wrote: > We can just start using spip label and link to it. > > > > On Fri, Mar 10, 2017 at 9:18 AM, Cody Koeninger <c...@koeninger.org> wrote:

[jira] [Created] (SPARK-19904) SPIP Add Spark Project Improvement Proposal doc to website

2017-03-10 Thread Cody Koeninger (JIRA)
Cody Koeninger created SPARK-19904: -- Summary: SPIP Add Spark Project Improvement Proposal doc to website Key: SPARK-19904 URL: https://issues.apache.org/jira/browse/SPARK-19904 Project: Spark

Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
the admins > can make a new issue type unfortunately. We may just have to mention a > convention involving title and label or something. > > On Fri, Mar 10, 2017 at 4:52 PM Cody Koeninger <c...@koeninger.org> wrote: >> >> I think it ought to be its own page, linked from t

Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
I think it ought to be its own page, linked from the more / community menu dropdowns. We also need the jira tag, and for the page to clearly link to filters that show proposed / completed SPIPs On Fri, Mar 10, 2017 at 3:39 AM, Sean Owen wrote: > Alrighty, if nobody is

Re: Spark Improvement Proposals

2017-03-09 Thread Cody Koeninger
code/doc > change we can just review and merge as usual. > > On Tue, Mar 7, 2017 at 3:15 PM Cody Koeninger <c...@koeninger.org> wrote: >> >> Another week, another ping. Anyone on the PMC willing to call a vote on >> this?

Re: Spark Improvement Proposals

2017-03-07 Thread Cody Koeninger
it to a vote and revisit the proposal in a few >> months. >> Joseph >> >> On Fri, Feb 24, 2017 at 5:35 AM, Cody Koeninger <c...@koeninger.org> >> wrote: >> >>> It's been a week since any further discussion. >>> >>> Do PMC members

Re: [Spark Kafka] API Doc pages for Kafka 0.10 not current

2017-02-28 Thread Cody Koeninger
The kafka-0-8 and kafka-0-10 integrations have conflicting dependencies. Last time I checked, Spark's doc publication puts everything all in one classpath, so publishing them both together won't work. I thought there was already a Jira ticket related to this, but a quick look didn't turn it up.

[jira] [Commented] (SPARK-17147) Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction)

2017-02-27 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887091#comment-15887091 ] Cody Koeninger commented on SPARK-17147: Dean if you guys have any bandwith to help test out

[jira] [Updated] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2017-02-26 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger updated SPARK-18057: --- I think you should ask Michael and / or Ryan what their plan is. > Update structu

Re: Spark Improvement Proposals

2017-02-24 Thread Cody Koeninger
> >> wrote: >>> >>> The doc looks good to me. >>> >>> Ryan, the role of the shepherd is to make sure that someone >>> knowledgeable with Spark processes is involved: this person can advise >>> on technical and procedural considerati

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-22 Thread Cody Koeninger
mja...@seecs.edu.pk> wrote: > I just noticed that Spark version that I am using (2.0.2) is built with > Scala 2.11. However I am using Kafka 0.8.2 built with Scala 2.10. Could this > be the reason why we are getting this error? > > On Mon, Feb 20, 2017 at 5:50 PM, Cody Koeninger <c

[jira] [Comment Edited] (SPARK-19680) Offsets out of range with no configured reset policy for partitions

2017-02-22 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878623#comment-15878623 ] Cody Koeninger edited comment on SPARK-19680 at 2/22/17 4:25 PM

[jira] [Commented] (SPARK-19680) Offsets out of range with no configured reset policy for partitions

2017-02-22 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878623#comment-15878623 ] Cody Koeninger commented on SPARK-19680: The issue here is likely that you have lost data

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Cody Koeninger
+ inTime + ", " + outTime) > > } > > } > > })) > > } > > } > > > On Mon, Feb 20, 2017 at 3:10 PM, Cody Koeninger <c...@koeninger.org> wrote: >> >> That's an indication that the beginning offset for a given b

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-20 Thread Cody Koeninger
That's an indication that the beginning offset for a given batch is higher than the ending offset, i.e. something is seriously wrong. Are you doing anything at all odd with topics, i.e. deleting and recreating them, using compacted topics, etc? Start off with a very basic stream over the same

Re: Spark Improvement Proposals

2017-02-16 Thread Cody Koeninger
It's just a process > document. > > Still, a fine step IMHO. > > On Thu, Feb 16, 2017 at 4:22 PM Reynold Xin <r...@databricks.com> wrote: >> >> Updated. Any feedback from other community members? >> >> >> On Wed, Feb 15, 2017 at 2:53 AM, Cody Koeninger

Re: streaming-kafka-0-8-integration (direct approach) and monitoring

2017-02-14 Thread Cody Koeninger
Not sure what to tell you at that point - maybe compare what is present in ZK to a known working group. On Tue, Feb 14, 2017 at 9:06 PM, Mohammad Kargar <mkar...@phemi.com> wrote: > Yes offset nodes are in zk and I can get the values. > > On Feb 14, 2017 6:54 PM, "

Re: streaming-kafka-0-8-integration (direct approach) and monitoring

2017-02-14 Thread Cody Koeninger
spark job. On Tue, Feb 14, 2017 at 8:40 PM, Mohammad Kargar <mkar...@phemi.com> wrote: > I'm running 0.10 version and > > ./bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --list > > lists the group. > > On Feb 14, 2017 6:34 PM, "Cody Koeninger" <

Re: Write JavaDStream to Kafka (how?)

2017-02-14 Thread Cody Koeninger
It looks like you're creating a kafka producer on the driver, and attempting to write the string representation of stringIntegerJavaPairRDD. Instead, you probably want to be calling stringIntegerJavaPairRDD.foreachPartition, so that producing to kafka is being done on the executor. Read

Re: streaming-kafka-0-8-integration (direct approach) and monitoring

2017-02-14 Thread Cody Koeninger
Can you explain what wasn't successful and/or show code? On Tue, Feb 14, 2017 at 6:03 PM, Mohammad Kargar wrote: > As explained here, direct approach of integration between spark streaming > and kafka does not update offsets in Zookeeper, hence Zookeeper-based Kafka >

Re: Spark Improvement Proposals

2017-02-14 Thread Cody Koeninger
ustomers. The to-do feature list was always above 100. Sometimes, the >> customers are feeling frustrated when we are unable to deliver them on time >> due to the resource limits and others. Even if they paid us billions, we >> still need to do it phase by phase or somet

Re: Spark Improvement Proposals

2017-02-11 Thread Cody Koeninger
ccepted or rejected, so that we do not end >> up with a distracting long tail of half-hearted proposals. >> >> These rules are meant to be flexible, but the current document should be >> clear about who is in charge of a SPIP, and the state it is currently in. >> >> We h

Re: Spark 2.0 Scala 2.11 and Kafka 0.10 Scala 2.10

2017-02-08 Thread Cody Koeninger
Pretty sure there was no 0.10.0.2 release of apache kafka. If that's a hortonworks modified version you may get better results asking in a hortonworks specific forum. Scala version of kafka shouldn't be relevant either way though. On Wed, Feb 8, 2017 at 5:30 PM, u...@moosheimer.com

Re: SSpark streaming: Could not initialize class kafka.consumer.FetchRequestAndResponseStatsRegistry$

2017-02-06 Thread Cody Koeninger
You should not need to include jars for Kafka, the spark connectors have the appropriate transitive dependency on the correct version. On Sat, Feb 4, 2017 at 3:25 PM, Marco Mistroni wrote: > Hi > not sure if this will help at all, and pls take it with a pinch of salt as > i

Re: Kafka dependencies in Eclipse project /Pls assist

2017-01-31 Thread Cody Koeninger
spark-streaming-kafka-0-10 has a transitive dependency on the kafka library, you shouldn't need to include kafka explicitly. What's your actual list of dependencies? On Tue, Jan 31, 2017 at 3:49 PM, Marco Mistroni wrote: > HI all > i am trying to run a sample spark code

Re: mapWithState question

2017-01-30 Thread Cody Koeninger
Keep an eye on https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging although it'll likely be a while On Mon, Jan 30, 2017 at 3:41 PM, Tathagata Das wrote: > If you care about the semantics of those writes to

[jira] [Resolved] (SPARK-19361) kafka.maxRatePerPartition for compacted topic cause exception

2017-01-26 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-19361. Resolution: Duplicate > kafka.maxRatePerPartition for compacted topic cause exception

[jira] [Commented] (SPARK-19361) kafka.maxRatePerPartition for compacted topic cause exception

2017-01-26 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839683#comment-15839683 ] Cody Koeninger commented on SPARK-19361: Compacted topics in general don't work with direct

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.1.0

2017-01-25 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837844#comment-15837844 ] Cody Koeninger commented on SPARK-18057: If you can get committer agreement on the outstanding
