[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204421#comment-15204421
]
Cody Koeninger commented on SPARK-12177:
I made a PR with my changes for discussion's sake
[
https://issues.apache.org/jira/browse/SPARK-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204404#comment-15204404
]
Cody Koeninger commented on SPARK-13939:
Are you actually certain that more than one of your
oo sure if this is an issue with spark engine or with the
> streaming module. Please let me know if you need more logs or you want me to
> raise a github issue/JIRA.
>
> Sorry for digressing on the original thread.
>
> On Fri, Mar 18, 2016 at 8:10 PM, Cody Koeninger <c...@koeninger.org>
Kafka doesn't have an accurate time-based index. Your options are to
maintain an index yourself, or start at a sufficiently early offset
and filter messages.
On Mon, Mar 21, 2016 at 7:28 AM, Nagu Kothapalli
wrote:
> Hi,
>
>
> I want to collect data from kafka ( json
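A minimal sketch of the "start early and filter" approach above, using the 0.8-era direct API: start each partition at a known-early offset and keep only messages whose payload timestamp is late enough. The broker address, topic, offsets, and the "ts" field are illustrative assumptions, since Kafka messages of that era carry no timestamp of their own.

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils

def streamSince(ssc: StreamingContext, cutoffMillis: Long) = {
  val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
  // known-early starting offsets for each partition of the topic
  val fromOffsets = Map(
    TopicAndPartition("events", 0) -> 0L,
    TopicAndPartition("events", 1) -> 0L)
  val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets, messageHandler)
    .filter { case (_, json) =>
      // crude timestamp extraction from the json payload; replace with a real parser
      """"ts":(\d+)""".r.findFirstMatchIn(json).exists(_.group(1).toLong >= cutoffMillis)
    }
}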
[
https://issues.apache.org/jira/browse/SPARK-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199912#comment-15199912
]
Cody Koeninger commented on SPARK-13939:
Do not use pprint(), that's just the python name
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203027#comment-15203027
]
Cody Koeninger commented on SPARK-12177:
Unless I'm misunderstanding your point, those changes
minimum, dev@ discussion (like this one) should be initiated.
>> As PMC is responsible for the project assets (including code), signoff
>> is required for it IMO.
>>
>> More experienced Apache members might be opine better in case I got it
>> wrong
running I'm seeing it retry correctly. However, I am
> having trouble getting the job started - number of retries does not seem to
> help with startup behavior.
>
> Thoughts?
>
> Regards,
>
> Bryan Jeffrey
>
> On Fri, Mar 18, 2016 at 10:44 AM, Cody Koeninger <c...@ko
[
https://issues.apache.org/jira/browse/SPARK-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199638#comment-15199638
]
Cody Koeninger commented on SPARK-13939:
When you say print to screen, are you using print
That's a networking error when the driver is attempting to contact
leaders to get the latest available offsets.
If it's a transient error, you can look at increasing the value of
spark.streaming.kafka.maxRetries, see
http://spark.apache.org/docs/latest/configuration.html
If it's not a transient
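A minimal sketch of the retry setting mentioned above; the value 5 is an arbitrary illustration (the documented default is 1).

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("my-stream")
  // retries for the driver's offset lookups against the kafka leaders
  .set("spark.streaming.kafka.maxRetries", "5")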
Is that happening only at startup, or during processing? If that's
happening during normal operation of the stream, you don't have enough
resources to process the stream in time.
There's not a clean way to deal with that situation, because it's a
violation of preconditions. If you want to
There's 1 topic per partition, so you're probably better off dealing
with topics that way rather than at the individual message level.
http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
Look at the discussion of "HasOffsetRanges"
If you
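A hedged sketch of dealing with topics per partition: a KafkaRDD's partitions line up one-to-one with its offsetRanges array, so partition i's topic is offsetRanges(i).topic. Here `stream` is assumed to be a direct stream of (key, value) pairs.

import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val tagged = rdd.mapPartitionsWithIndex { (i, iter) =>
    val topic = offsetRanges(i).topic // one topic for the whole partition
    iter.map(record => topic -> record)
  }
  tagged.count() // placeholder action
}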
[
https://issues.apache.org/jira/browse/SPARK-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200152#comment-15200152
]
Cody Koeninger commented on SPARK-13877:
Thumbs down on renaming the package name as well... from
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197536#comment-15197536
]
Cody Koeninger commented on SPARK-12177:
My fork is working at a very basic level for caching
Why would a PMC vote be necessary on every code deletion?
There was a Jira and pull request discussion about the submodules that
have been removed so far.
https://issues.apache.org/jira/browse/SPARK-13843
There's another ongoing one about Kafka specifically
arty external repositories - not owned by Apache.
>>
>> At a minimum, dev@ discussion (like this one) should be initiated.
>> As PMC is responsible for the project assets (including code), signoff
>> is required for it IMO.
>>
>> More experienced Apache members mi
[
https://issues.apache.org/jira/browse/SPARK-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195973#comment-15195973
]
Cody Koeninger commented on SPARK-13877:
[~hshreedharan] They aren't compatible from an api
[
https://issues.apache.org/jira/browse/SPARK-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195870#comment-15195870
]
Cody Koeninger commented on SPARK-13877:
I don't think it makes sense to put kafka 0.8 and kafka
like when they use the existing DStream.repartition where original
> per-topicpartition in-order processing is also not observed any more.
>
> Do you agree?
>
> On Thu, Mar 10, 2016 at 12:12 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> The central problem wit
[
https://issues.apache.org/jira/browse/SPARK-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195435#comment-15195435
]
Cody Koeninger commented on SPARK-13877:
I agree that it's a good idea to move everything that's
en the seek since position() will block to get
> the new offset.
>
> -Jason
>
> On Mon, Mar 14, 2016 at 2:37 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Sorry, by metadata I also meant the equivalent of the old
>> OffsetRequest api, which partitionsFor
nsuming
> messages. We'd probably want to understand why those are insufficient
> before considering new APIs.
>
> -Jason
>
> On Mon, Mar 14, 2016 at 12:17 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Regarding the rebalance listener, in the case of the spark
>
out some specific change on the consumer API, please
> feel free to create a new KIP with the detailed motivation and proposed
> modifications.
>
> Guozhang
>
> On Fri, Mar 11, 2016 at 12:28 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Is there a KIP or Jira relat
Sounds like the jar you built doesn't include the dependencies (in
this case, the spark-streaming-kafka subproject). When you use
spark-submit to submit a job to spark, you need to either specify all
dependencies as additional --jars arguments (which is a pain), or
build an uber-jar containing
s://github.com/guptamukul/sparktest.git
>
> ____
> From: Cody Koeninger <c...@koeninger.org>
> Sent: 11 March 2016 23:04
> To: Mukul Gupta
> Cc: user@spark.apache.org
> Subject: Re: Kafka + Spark streaming, RDD partitions not processed in
ee
>> that new partitions should be consumed automatically). I guess we can
>> continue this discussion on the spark list then :-)
>>
>> Thanks
>> Mansi.
>>
>> On Thu, Mar 10, 2016 at 7:43 AM, Cody Koeninger <c...@koeninger.org>
>> wrote:
s, String.class,
> StringDecoder.class, StringDecoder.class, kafkaParams, topicsSet);
>
> JavaDStream processed = messages.map(new Function<Tuple2<String,
> String>, String>() {
>
> @Override
> public String call(Tuple2<String, String> arg0) throws Exception {
Can you post your actual code?
On Thu, Mar 10, 2016 at 9:55 PM, Mukul Gupta wrote:
> Hi All, I was running the following test: Set up 9 VMs running spark workers
> with 1 spark executor each. 1 VM running kafka and spark master. Spark
> version is 1.6.0 Kafka version is
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190103#comment-15190103
]
Cody Koeninger commented on SPARK-12177:
There are a lot of things I'm really not happy with so
The central problem with doing anything like this is that you break
one of the basic guarantees of kafka, which is in-order processing on
a per-topicpartition basis.
As far as PRs go, because of the new consumer interface for kafka 0.9
and 0.10, there's a lot of potential change already underway.
If you do any RDD transformation, it's going to return a different RDD
than the original.
The implication for casting to HasOffsetRanges is specifically called
out in the docs at
http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
On Thu,
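A minimal sketch of that pitfall, assuming `stream` is a direct stream: grab the offset ranges from the stream's own RDD before any transformation, because the transformed RDD is no longer a KafkaRDD.

import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  // cast first: only the RDD handed to you by the direct stream is a KafkaRDD
  val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val transformed = rdd.map { case (_, v) => v.length } // now transform freely
  transformed.count()
  // transformed.asInstanceOf[HasOffsetRanges] would throw a ClassCastException
}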
.@gmail.com> wrote:
>
>> In order to do anything meaningful with the consumer itself in rebalance
>> callback (e.g. commit offset), you would need to hold on the consumer
>> reference; admittedly it sounds a bit awkward, but by design we choose to
>> not enforce it in
> callback (e.g. commit offset), you would need to hold on the consumer
> reference; admittedly it sounds a bit awkward, but by design we choose to
> not enforce it in the interface itself.
>
> Guozhang
>
> On Wed, Mar 9, 2016 at 3:39 PM, Cody Koeninger <c...@koeninger.org> w
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189325#comment-15189325
]
Cody Koeninger commented on SPARK-12177:
Clearly K and V are serializable somehow, because
> On Wed, Mar 9, 2016 at 2:11 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Filed https://issues.apache.org/jira/browse/KAFKA-3370.
> >
> > On Wed, Mar 9, 2016 at 1:11 PM, Cody Koeninger <c...@koeninger.org>
> wrote:
> >
> >> That sounds like
[
https://issues.apache.org/jira/browse/KAFKA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188181#comment-15188181
]
Cody Koeninger commented on KAFKA-3370:
So would case 1 also include addition of an entirely new
> me know if you have any other ideas.
>
> Guozhang
>
> On Wed, Mar 9, 2016 at 12:25 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Yeah, I think I understood what you were saying. What I'm saying is
>> that if there were a way to just fetch metadata without
s going to get all the partitions assigned
> to itself (i.e. you are only running a single instance).
>
> Guozhang
>
>
> On Wed, Mar 9, 2016 at 6:22 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Another unfortunate thing about ConsumerRebalanceListener is t
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187847#comment-15187847
]
Cody Koeninger commented on SPARK-12177:
Anybody want to volunteer to fight the good fight
e relevance to what
> you're talking about.
>
> Perhaps if both functions (the one with partitions arg and the one without)
> returned just ConsumerRecord, I would like that more.
>
> - Alan
>
> On Tue, Mar 8, 2016 at 6:49 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
>
> On Wed, Mar 9, 2016 at 10:46 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Spark streaming by default will not start processing a batch until the
>> current batch is finished. So if your processing time is larger than
>> your batch time, delays will buil
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187613#comment-15187613
]
Cody Koeninger commented on SPARK-12177:
If anyone wants to take a look at the stuff I'm hacking
Spark streaming by default will not start processing a batch until the
current batch is finished. So if your processing time is larger than
your batch time, delays will build up.
On Wed, Mar 9, 2016 at 11:09 AM, Sachin Aggarwal
wrote:
> Hi All,
>
> we have batchTime
to the consumer. Seems like this makes it unnecessarily
awkward to serialize or provide a 0 arg constructor for the listener.
On Wed, Mar 9, 2016 at 7:28 AM, Cody Koeninger <c...@koeninger.org> wrote:
> I thought about ConsumerRebalanceListener, but seeking to the
> beginning any time there's
@Override
> public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
> consumer.seekToBeginning(partitions.toArray(new
> TopicPartition[0]));
> }
> };
>
> consumer.subscribe(topics, listener);
>
> On
ory
>
> When I try ./bin/spark-shell --master local[2]
>
> I get: no such file or directory
> Failed to find spark assembly, you need to build Spark before running this
> program
>
>
>
> Sent from my iPhone
>
>> On 8 Mar 2016, at 21:50, Cody Koe
Using the 0.9 consumer, I would like to start consuming at the
beginning or end, without specifying auto.offset.reset.
This does not seem to be possible:
val kafkaParams = Map[String, Object](
"bootstrap.servers" -> conf.getString("kafka.brokers"),
"key.deserializer" ->
aster.sh"
>
> Thanks,
>
> Aida
> Sent from my iPhone
>
>> On 8 Mar 2016, at 19:02, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> You said you downloaded a prebuilt version.
>>
>> You shouldn't have to mess with maven or building spark at all
You said you downloaded a prebuilt version.
You shouldn't have to mess with maven or building spark at all. All
you need is a jvm, which it looks like you already have installed.
You should be able to follow the instructions at
http://spark.apache.org/docs/latest/
and
No, looks like you'd have to catch them in the serializer and have the
serializer return option or something. The new consumer builds a buffer
full of records, not one at a time.
On Mar 8, 2016 4:43 AM, "Marius Soutier" <mps@gmail.com> wrote:
>
> > On 04.03.2016, at
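A hedged sketch of "catch them in the serializer": a value Deserializer that swallows decode failures and yields None, so one corrupt record in the consumer's fetched buffer doesn't fail the whole batch. Illustrative only; the class name is made up.

import java.util
import org.apache.kafka.common.serialization.Deserializer

class TolerantStringDeserializer extends Deserializer[Option[String]] {
  override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = ()
  override def deserialize(topic: String, data: Array[Byte]): Option[String] =
    try Option(data).map(new String(_, "UTF-8")) // None for null or undecodable payloads
    catch { case _: Exception => None }
  override def close(): Unit = ()
}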
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184387#comment-15184387
]
Cody Koeninger commented on SPARK-12177:
I've been hacking on a simple lru cache for consumers
[
https://issues.apache.org/jira/browse/SPARK-13707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183129#comment-15183129
]
Cody Koeninger commented on SPARK-13707:
- To be clear, is this a problem with the UI only? I.e
Wanted to survey what people are using the direct stream
messageHandler for, besides just extracting key / value / offset.
Would your use case still work if that argument was removed, and the
stream just contained ConsumerRecord objects
>> What do you mean by consistent? Throughout the life cycle of an app, the
>> executors can come and go and as a result really has no consistency. Do you
>> just need it for a specific job?
>>
>>
>>
>> On Thu, Mar 3, 2016 at 3:08 PM, Cody Koeninger <c...@koeni
I need getPreferredLocations to choose a consistent executor for a
given partition in a stream. In order to do that, I need to know what
the current executors are.
I'm currently grabbing them from the block manager master's getPeers(),
which works, but I don't know if that's the most reasonable
Jay, thanks for the response.
Regarding the new consumer API for 0.9, I've been reading through the code
for it and thinking about how it fits in to the existing Spark integration.
So far I've seen some interesting challenges, and if you (or anyone else on
the dev list) have time to provide some
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176037#comment-15176037
]
Cody Koeninger commented on SPARK-12177:
How is it a huge hassle to keep the known working
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174507#comment-15174507
]
Cody Koeninger commented on SPARK-12177:
Thanks for the example of performance numbers
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174415#comment-15174415
]
Cody Koeninger commented on SPARK-12177:
Mansi are you talking about performance improvements
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174212#comment-15174212
]
Cody Koeninger commented on SPARK-12177:
I'm happy to help in whatever way. If people think
essing further stages and fetching
> next batch.
>
> I will start with higher number of executor cores and see how it goes.
>
> --
> Thanks
> Jatin Kumar | Rocket Scientist
> +91-7696741743 m
>
> On Tue, Mar 1, 2016 at 9:07 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
[
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174092#comment-15174092
]
Cody Koeninger commented on SPARK-12177:
My thoughts so far
Must-haves:
- The major new feature
What code is triggering the stack overflow?
On Mon, Feb 29, 2016 at 11:13 PM, Vinti Maheshwari
wrote:
> Hi All,
>
> I am getting below error in spark-streaming application, i am using kafka
> for input stream. When i was doing with socket, it was working fine. But
> when i
> "How do I keep a balance of executors which receive data from Kafka and
which process data"
I think you're misunderstanding how the direct stream works. The executor
which receives data is also the executor which processes data, there aren't
separate receivers. If it's a single stage worth of
[
https://issues.apache.org/jira/browse/SPARK-13569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173895#comment-15173895
]
Cody Koeninger commented on SPARK-13569:
This is probably reasonable, but SPARK-12177
Does this issue involve Spark at all? Otherwise you may have better luck
on a perl or kafka related list.
On Mon, Feb 29, 2016 at 3:26 PM, Vinti Maheshwari
wrote:
> Hi All,
>
> I wrote kafka producer using kafka perl api, But i am getting error when i
> am passing
You're getting confused about what code is running on the driver vs what
code is running on the executor. Read
http://spark.apache.org/docs/latest/programming-guide.html#understanding-closures-a-nameclosureslinka
On Mon, Feb 29, 2016 at 8:00 AM, franco barrientos <
Spark in general isn't a good fit if you're trying to make sure that
certain tasks only run on certain executors.
You can look at overriding getPreferredLocations and increasing the value
of spark.locality.wait, but even then, what do you do when an executor
fails?
On Fri, Feb 26, 2016 at 8:08
in Kafka?
>
> Sent from WPS mail client
> On Feb 25, 2016, 11:58 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
> The per partition offsets are part of the rdd as defined on the driver.
> Have you read
>
> https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
>
>
The per partition offsets are part of the rdd as defined on the driver.
Have you read
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
and/or watched
https://www.youtube.com/watch?v=fXnNEq1v3VA
On Wed, Feb 24, 2016 at 9:05 PM, Yuhang Chen wrote:
he/javax.activation/activation/jars/activation-1.1.jar:javax/activation/ActivationDataFlavor.class*
>
> Here is complete error log:
> https://gist.github.com/Vibhuti/07c24d2893fa6e520d4c
>
>
> Regards,
> ~Vinti
>
> On Wed, Feb 24, 2016 at 12:16 PM, Cody Koeninger <c.
y. I need to check how to use sbt
> assembly.
>
> Regards,
> ~Vinti
>
> On Wed, Feb 24, 2016 at 11:10 AM, Cody Koeninger <c...@koeninger.org>
> wrote:
>
>> Are you using sbt assembly? That's what will include all of the
>> non-provided dependencies in a s
encies ++= Seq(
> "org.apache.spark" % "spark-streaming_2.10" % "1.5.2",
> "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.5.2"
> )
>
>
>
> Regards,
> ~Vinti
>
> On Wed, Feb 24, 2016 at 9:33 AM, Cody K
spark streaming is provided, kafka is not.
This build file
https://github.com/koeninger/kafka-exactly-once/blob/master/build.sbt
includes some hacks for ivy issues that may no longer be strictly
necessary, but try that build and see if it works for you.
On Wed, Feb 24, 2016 at 11:14 AM, Vinti
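A minimal build.sbt sketch matching that advice, with versions taken from the quoted thread: spark-streaming is marked provided (the cluster supplies it), the kafka subproject is not, so sbt assembly packs it and its transitive kafka dependencies into the uber-jar.

name := "my-streaming-job"
scalaVersion := "2.10.6"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "1.5.2" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.5.2"
)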
That's correct, when you create a direct stream, you specify the
topicpartitions you want to be a part of the stream (the other method for
creating a direct stream is just a convenience wrapper).
On Wed, Feb 24, 2016 at 2:15 AM, 陈宇航 wrote:
> Here I use the
The direct stream will let you do both of those things. Is there a reason
you want to use receivers?
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
http://spark.apache.org/docs/latest/configuration.html#spark-streaming
look for maxRatePerPartition
On Mon, Feb 22, 2016
I saw this slide:
http://image.slidesharecdn.com/east2016v2matei-160217154412/95/2016-spark-summit-east-keynote-matei-zaharia-5-638.jpg?cb=1455724433
Didn't see the talk - was this just referring to the existing work on the
spark-streaming-kafka subproject, or is someone actually working on
g :
> Can this issue be resolved by having a smaller block interval?
>
> Regards,
> Praveen
> On 18 Feb 2016 21:30, "praveen S" <mylogi...@gmail.com> wrote:
>
>> Can having a smaller block interval only resolve this?
>>
>> Regards,
>> Praveen
>
Backpressure won't help you with the first batch, you'd need
spark.streaming.kafka.maxRatePerPartition
for that
On Thu, Feb 18, 2016 at 9:40 AM, praveen S wrote:
> Have a look at
>
> spark.streaming.backpressure.enabled
> Property
>
> Regards,
> Praveen
> On 18 Feb 2016
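A minimal sketch combining the two settings discussed above: backpressure adapts the rate once the job has feedback to work from, while maxRatePerPartition also caps the very first batch. The figure of 1000 messages per second per partition is an arbitrary example.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")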
You can print whatever you want wherever you want, it's just a question of
whether it's going to show up in the driver's log or the various executors' logs
On Wed, Feb 17, 2016 at 5:50 AM, Cyril Scetbon
wrote:
> I don't think we can print an integer value in a spark streaming
Just use a kafka rdd in a batch job or two, then start your streaming job.
On Wed, Feb 17, 2016 at 12:57 AM, Abhishek Anand
wrote:
> I have a spark streaming application running in production. I am trying to
> find a solution for a particular use case when my
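A hedged sketch of "use a kafka rdd in a batch job": read a fixed offset range as a normal RDD before the streaming job starts. Topic name and offsets are placeholders; in practice you would use offsets you had previously saved.

import kafka.serializer.StringDecoder
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

def batchRead(sc: SparkContext): Unit = {
  val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
  val ranges = Array(OffsetRange("mytopic", 0, 0L, 1000L)) // topic, partition, from, until
  val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
    sc, kafkaParams, ranges)
  rdd.take(10).foreach(println)
}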
You could use sc.parallelize... but the offsets are already available at
the driver, and they're a (hopefully) small enough amount of data that
it's probably more straightforward to just use the normal cassandra client
to save them from the driver.
On Tue, Feb 16, 2016 at 1:15 AM, Abhishek
"500");
> props.put("zookeeper.sync.time.ms", "250");
> props.put("auto.commit.interval.ms", "1000");
>
>
> How can I do the same for the receiver inside spark-streaming for Spark V1.3.1
>
>
> Thanks
>
> Nipun
>
>
>
> On Wed, F
Please don't change the behavior of DirectKafkaInputDStream.
Returning an empty rdd is (imho) the semantically correct thing to do, and
some existing jobs depend on that behavior.
If it's really an issue for you, you can either override
directkafkainputdstream, or just check isEmpty as the first
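A minimal sketch of that isEmpty check, leaving DirectKafkaInputDStream's empty-RDD semantics alone; `stream` and the output path are assumptions.

stream.foreachRDD { rdd =>
  if (!rdd.isEmpty()) { // skip the per-batch side effects when nothing arrived
    rdd.saveAsTextFile(s"/tmp/out-${System.currentTimeMillis}")
  }
}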
It's a pair because there's a key and value for each message.
If you just want a single topic, put a single topic in the map of topic ->
number of partitions.
See
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
On
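A hedged sketch of the receiver-based call being described: the map keys are topic names, the values are consumer-thread counts, and each message arrives as a (key, value) pair. Zookeeper address, group id, and topic are placeholders.

import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils

def lines(ssc: StreamingContext) =
  KafkaUtils.createStream(ssc, "localhost:2181", "example-group", Map("mytopic" -> 1))
    .map { case (_, value) => value } // keep just the message body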
[
https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139250#comment-15139250
]
Cody Koeninger commented on SPARK-3146:
I think this can be safely closed, given the messageHandler
[
https://issues.apache.org/jira/browse/SPARK-13106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139242#comment-15139242
]
Cody Koeninger commented on SPARK-13106:
https://issues.apache.org/jira/browse/SPARK-10963
[
https://issues.apache.org/jira/browse/SPARK-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139252#comment-15139252
]
Cody Koeninger commented on SPARK-7827:
If there is no more information available on this, can we
sages processed per event in
>> sparkstreaming web UI . Also I am counting the messages inside
>> foreachRDD .
>> Removed the settings for backpressure but still the same .
>>
>>
>>
>>
>>
>> Sent from Samsung Mobile.
>>
If you're using the direct stream, you have 0 receivers. Do you mean you
have 1 executor?
Can you post the relevant call to createDirectStream from your code, as
well as any relevant spark configuration?
On Thu, Feb 4, 2016 at 8:13 PM, Diwakar Dhanuskodi <
diwakar.dhanusk...@gmail.com> wrote:
2 things:
- you're only attempting to read from a single TopicAndPartition. Since
your topic has multiple partitions, this probably isn't what you want
- you're getting an offset out of range exception because the offset you're
asking for doesn't exist in kafka.
Use the other
Partition=100" --driver-memory 2g
> --executor-memory 1g --class com.tcs.dime.spark.SparkReceiver --files
> /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml,/etc/hadoop/conf/mapred-site.xml,/etc/hadoop/conf/yarn-site.xml,/etc/hive/conf/hive-site.xml
> --jars
am counting the messages inside
> foreachRDD .
> Removed the settings for backpressure but still the same .
>
>
>
>
>
> Sent from Samsung Mobile.
>
>
> ---- Original message
> From: Cody Koeninger <c...@koeninger.org>
> Date:06/02/2016 00
own issue ?
>
> On Wed, Jan 27, 2016 at 11:52 PM, Cody Koeninger <c...@koeninger.org>
> wrote:
>
>> Have you tried spark 1.5?
>>
>> On Wed, Jan 27, 2016 at 11:14 AM, vimal dinakaran <vimal3...@gmail.com>
>> wrote:
>>
>>> Hi ,
>>>
KafkaRDD will use the standard kafka configuration parameter
refresh.leader.backoff.ms if it is set in the kafkaParams map passed to
createDirectStream.
On Tue, Feb 2, 2016 at 9:10 PM, Chen Song wrote:
> For Kafka direct stream, is there a way to set the time between
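A minimal sketch of passing that standard kafka setting through the params map given to createDirectStream; the broker address and the 1000 ms value are illustrative.

val kafkaParams = Map(
  "metadata.broker.list" -> "broker1:9092",
  "refresh.leader.backoff.ms" -> "1000") // wait between leader refetch attempts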
It's possible you could (ab)use updateStateByKey or mapWithState for this.
But honestly it's probably a lot more straightforward to just choose a
reasonable batch size that gets you a reasonable file size for most of your
keys, then use filecrush or something similar to deal with the hdfs small