the value of codifying that in our process? I
>> think restricting who can submit proposals would only undermine them by
>> pushing contributors out. Maybe I'm missing something here?
>>
>> rb
>>
>>
>>
>> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c.
>>
>> Yes, users suggesting SIPs is a good thing and is expl
This should give you hints on the necessary cast:
http://spark.apache.org/docs/latest/streaming-kafka-0-8-integration.html#tab_java_2
The main ugly thing there is that the java rdd is wrapping the scala
rdd, so you need to unwrap one layer via rdd.rdd()
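A minimal sketch of that unwrap-and-cast, assuming the 0.8 direct stream API (`stream` here is a hypothetical DStream from KafkaUtils.createDirectStream):

```scala
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

// In Scala the cast is direct; from the Java API you first unwrap the
// JavaRDD to the underlying Scala RDD via rdd.rdd() before casting.
stream.foreachRDD { rdd =>
  val offsetRanges: Array[OffsetRange] =
    rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach { o =>
    println(s"${o.topic} ${o.partition} offsets ${o.fromOffset} -> ${o.untilOffset}")
  }
}
```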
If anyone wants to work on a PR to update
-exactly-once/blob/master/src/main/scala/example/TransactionalPerPartition.scala
>> I am trying to understand how this would work if an executor crashes, so I
>> tried making one crash manually, but I noticed it kills the whole job
>> instead of creating another executor to resume the task.
rategies have a large effect on the goal, we should
> have it discussed when discussing the goals. In addition, while it is often
> easy to throw out completely infeasible goals, it is often much harder to
> figure out that the goals are infeasible without fine tuning.
>
>
>
>
>
What is it you're actually trying to accomplish?
On Mon, Oct 10, 2016 at 5:26 AM, Samy Dindane wrote:
> I managed to make a specific executor crash by using
> TaskContext.get.partitionId and throwing an exception for a specific
> executor.
>
> The issue I have now is that the
[
https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560925#comment-15560925
]
Cody Koeninger commented on SPARK-17812:
I want to start a pattern subscription at known good
[
https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560839#comment-15560839
]
Cody Koeninger commented on SPARK-17812:
That totally kills the usability of SubscribePattern
[
https://issues.apache.org/jira/browse/SPARK-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560837#comment-15560837
]
Cody Koeninger commented on SPARK-17815:
My personal concerns about complexity are because I'm
o I'd really like a culture of having those early.
> People don't argue about prettiness when they discuss APIs, they argue about
> the core concepts to expose in order to meet various goals, and then they're
> stuck maintaining those for a long time.
>
> Matei
>
> On Oct 9,
proposal is technically feasible
>> right now? If it's infeasible, that will be discovered later during design
>> and implementation. Same thing with rejected strategies -- listing some of
>> those is definitely useful sometimes, but if you make this a *required*
>> se
[
https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560719#comment-15560719
]
Cody Koeninger commented on SPARK-17812:
Generally agree with the direction of what you're saying
iscovered later
> during design and implementation. Same thing with rejected strategies --
> listing some of those is definitely useful sometimes, but if you make this
> a *required* section, people are just going to fill it in with bogus stuff
> (I've seen this happen before).
>
>
[
https://issues.apache.org/jira/browse/SPARK-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560684#comment-15560684
]
Cody Koeninger commented on SPARK-17815:
Regarding kafka consumer behavior, I'm not saying it's
Regarding name, if the SIP overlap is a concern, we can pick a different name.
My tongue in cheek suggestion would be
Spark Lightweight Improvement process (SPARKLI)
On Sun, Oct 9, 2016 at 4:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
> So to focus the discussion on the specific
ures more visible (and their approval more formal)?
>
> BTW note that in either case, I'd like to have a template for design docs
> too, which should also include goals. I think that would've avoided some of
> the issues you brought up.
>
> Matei
>
> On Oct 9, 2016, at 10
At a super high level, it depends on whether you want the SIPs to be
>>> PRDs for getting some quick feedback on the goals of a feature before it is
>>> designed, or something more like full-fledged design docs (just a more
>>> visible design doc for bigger changes). I loo
Cody Koeninger created SPARK-17841:
--
Summary: Kafka 0.10 commitQueue needs to be drained
Key: SPARK-17841
URL: https://issues.apache.org/jira/browse/SPARK-17841
Project: Spark
Issue Type
n what this
> entails, and then we can discuss the specific proposal as well.
>
>
> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>> user-facing or cross-cutting changes,
[
https://issues.apache.org/jira/browse/SPARK-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560241#comment-15560241
]
Cody Koeninger commented on SPARK-17147:
[~graphex] My WIP is at https://github.com/koeninger
That's awesome Sean, very clear.
One minor thing: noncommitters can't change the assigned field, as far as I know.
On Oct 9, 2016 3:40 AM, "Sean Owen" wrote:
I added a variant on this text to https://cwiki.apache.org/
[
https://issues.apache.org/jira/browse/SPARK-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15559911#comment-15559911
]
Cody Koeninger commented on SPARK-17815:
The WAL cannot be the only source of truth, because
It's not about technical design disagreement as to matters of taste,
it's about familiarity with the domain. To make an analogy, it's as
if a committer in MLlib was firmly intent on, I dunno, treating a
collection of categorical variables as if it were an ordered range of
continuous variables.
[
https://issues.apache.org/jira/browse/SPARK-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558572#comment-15558572
]
Cody Koeninger commented on SPARK-17147:
I talked with Sean in person about this, and think
, Reynold Xin <r...@databricks.com> wrote:
> I think so (at least I think it is socially acceptable). Of course, use good
> judgement here :)
>
>
>
> On Sat, Oct 8, 2016 at 12:06 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> So to be clear, can I go c
[
https://issues.apache.org/jira/browse/SPARK-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558557#comment-15558557
]
Cody Koeninger commented on SPARK-4960:
---
Is this idea pretty much dead at this point? It seems like
[
https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger resolved SPARK-3146.
---
Resolution: Fixed
Fix Version/s: 1.3.0
> Improve the flexibility of Spark Stream
[
https://issues.apache.org/jira/browse/SPARK-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558550#comment-15558550
]
Cody Koeninger commented on SPARK-3146:
---
SPARK-4964 / the direct stream added a messageHandler
So to be clear, can I go clean up the Kafka cruft?
On Sat, Oct 8, 2016 at 1:41 PM, Reynold Xin wrote:
>
> On Sat, Oct 8, 2016 at 2:09 AM, Sean Owen wrote:
>>
>>
>> - Resolve as Fixed if there's a change you can point to that resolved the
>> issue
>> - If
[
https://issues.apache.org/jira/browse/SPARK-17837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cody Koeninger updated SPARK-17837:
---
Summary: Disaster recovery of offsets from WAL (was: Disaster recover of offsets from WAL)
Cody Koeninger created SPARK-17837:
--
Summary: Disaster recover of offsets from WAL
Key: SPARK-17837
URL: https://issues.apache.org/jira/browse/SPARK-17837
Project: Spark
Issue Type: Sub
[
https://issues.apache.org/jira/browse/SPARK-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558528#comment-15558528
]
Cody Koeninger commented on SPARK-17815:
So if you start committing offsets to kafka
[
https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558506#comment-15558506
]
Cody Koeninger commented on SPARK-17812:
So I'm willing to do this work, mostly because I've
[
https://issues.apache.org/jira/browse/SPARK-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15558474#comment-15558474
]
Cody Koeninger commented on SPARK-17344:
I think this is premature until you have a fully
D to 5 days and use
> this as a regular way to bring contributions to the attention of committers.
>
> I dunno if people think this is perhaps too complex, but at our scale I
> feel we need some kind of loose but automated system for funneling
> contributions through some kind of lifecyc
That makes sense, thanks.
One thing I've never been clear on is who should be allowed to resolve
Jiras. Can I go clean up the backlog of Kafka Jiras that weren't created
by me?
If there's an informal policy here, can we update the wiki to reflect it?
Maybe it's there already, but I didn't see
of loose but automated system for funneling contributions
> through some kind of lifecycle. The status quo is just not that good (e.g.
> 474 open PRs against Spark as of this moment).
>
> Nick
>
>
> On Fri, Oct 7, 2016 at 4:48 PM Cody Koeninger <c...@koeninger.org> wrote:
>
ssing features, slow reviews
> which is understandable to some extent... it is not only about Spark but
> things can be improved for sure for this project in particular as already
> stated.
>
> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <c...@koeninger.org>
> wrote:
>
>
Matei asked:
> I agree about empowering people interested here to contribute, but I'm
> wondering, do you think there are technical things that people don't want to
> work on, or is it a matter of what there's been time to do?
It's a matter of mismanagement and miscommunication.
The
Without a hell of a lot more work, Assign would be the only strategy usable.
On Fri, Oct 7, 2016 at 3:25 PM, Michael Armbrust wrote:
>> The implementation is totally and completely different however, in ways
>> that leak to the end user.
>
>
> Can you elaborate?
0.10 consumers won't work on an earlier broker.
Earlier consumers will (should?) work on a 0.10 broker.
The main things earlier consumers lack from a user perspective are
support for SSL and pre-fetching messages. The implementation is
totally and completely different, however, in ways that leak
+1 to adding an SIP label and linking it from the website. I think it needs
- a template that focuses it towards soliciting user goals / non-goals
- clear resolution as to which strategy was chosen to pursue. I'd
recommend a vote.
Matei asked me to clarify what I meant by changing interfaces, I
Sean, that was very eloquently put, and I 100% agree. If I ever meet
you in person, I'll buy you multiple rounds of beverages of your
choice ;)
This is probably reiterating some of what you said in a less clear
manner, but I'll throw more of my 2 cents in.
- Design.
Yes, design by committee
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554160#comment-15554160
]
Cody Koeninger commented on SPARK-15406:
I think if you're already gravitating towards json
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554139#comment-15554139
]
Cody Koeninger commented on SPARK-15406:
When something has gone wrong, as an end user, how do I
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553955#comment-15553955
]
Cody Koeninger edited comment on SPARK-15406 at 10/7/16 3:20 AM:
-
As soon
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553955#comment-15553955
]
Cody Koeninger commented on SPARK-15406:
As soon as you say "checkpoint", a whole class
I love Spark. 3 or 4 years ago it was the first distributed computing
environment that felt usable, and the community was welcoming.
But I just got back from the Reactive Summit, and this is what I observed:
- Industry leaders on stage making fun of Spark's streaming model
- Open source project
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553476#comment-15553476
]
Cody Koeninger commented on SPARK-15406:
You cannot have reliable delivery semantics
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553176#comment-15553176
]
Cody Koeninger commented on SPARK-15406:
I'm talking about people using Spark, not committers
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553099#comment-15553099
]
Cody Koeninger commented on SPARK-15406:
To be real blunt, my level of interest is pretty low
[
https://issues.apache.org/jira/browse/KAFKA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553085#comment-15553085
]
Cody Koeninger commented on KAFKA-3370:
---
Throwing an exception should be the default IMHO. Silent
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550262#comment-15550262
]
Cody Koeninger commented on SPARK-15406:
See the PR in the linked subtask
https
Totally agree that specifying the schema manually should be the
baseline. LGTM, thanks for working on it. Seems like it looks good
to others too judging by the comment on the PR that it's getting
merged to master :)
On Thu, Sep 29, 2016 at 2:13 PM, Michael Armbrust
> Spark standalone is not Yarn… or secure for that matter… ;-)
>
>> On Sep 29, 2016, at 11:18 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Spark streaming helps with aggregation because
>>
>> A. raw kafka consumers have no built in framework
t, query the
>> respective table in Cassandra / Postgres. (select .. from data where user =
>> ? and date between and and some_field = ?)
>>
>> How will Spark Streaming help w/ aggregation? Couldn't the data be queried
>> from Cassandra / Postgres via the Kafka cons
Spark streaming helps with aggregation because
A. raw kafka consumers have no built in framework for shuffling
amongst nodes, short of writing into an intermediate topic (I'm not
touching Kafka Streams here, I don't have experience), and
B. it deals with batches, so you can transactionally
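Point B is what makes transactional output feasible: each micro-batch is a discrete RDD whose offset ranges are known up front, so results and offsets can be committed together. A rough sketch, assuming the 0.8 direct stream and a hypothetical `saveResultsAndOffsets` helper that writes both in one database transaction:

```scala
import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  // Offsets for this batch are known before any processing happens.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val counts = rdd.map(msg => (msg._1, 1L)).reduceByKey(_ + _)
  // Hypothetical helper: stores the aggregates and the offset ranges in
  // a single transaction, so a failed batch can be replayed safely.
  saveResultsAndOffsets(counts.collect(), offsetRanges)
}
```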
a should not be lost. The system should be as fault tolerant as
>> possible.
>>
>> What's the advantage of using Spark for reading Kafka instead of direct
>> Kafka consumers?
>>
>> On Thu, Sep 29, 2016 at 8:28 PM, Cody Koeninger <c...@koeninger.org>
>> wrote
otherwise updates will be
> idempotent but not inserts.
>
> Data should not be lost. The system should be as fault tolerant as possible.
>
> What's the advantage of using Spark for reading Kafka instead of direct
> Kafka consumers?
>
> On Thu, Sep 29, 2016 at 8:28 PM, Cody Koeninger &l
Spark direct stream is just fine for this use case.
> But why postgres and not cassandra?
> Is there anything specific here that i may not be aware?
>
> Thanks
> Deepak
>
> On Thu, Sep 29, 2016 at 8:41 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> How are y
How are you going to handle ETL failures? Do you care about lost /
duplicated data? Are your writes idempotent?
Absent any other information about the problem, I'd stay away from
cassandra/flume/hdfs/hbase/whatever, and use a spark direct stream
feeding postgres.
On Thu, Sep 29, 2016 at 10:04
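One way to make the postgres writes idempotent, so that replaying a batch after an ETL failure is harmless, is an upsert keyed on a natural id. A sketch under assumptions (the `jdbcUrl` value and `events` table are hypothetical; ON CONFLICT requires Postgres 9.5+):

```scala
rdd.foreachPartition { records =>
  // One connection per partition, not per record.
  val conn = java.sql.DriverManager.getConnection(jdbcUrl)
  try {
    val stmt = conn.prepareStatement(
      "insert into events (id, payload) values (?, ?) " +
      "on conflict (id) do update set payload = excluded.payload")
    records.foreach { case (id: String, payload: String) =>
      stmt.setString(1, id)
      stmt.setString(2, payload)
      stmt.executeUpdate() // replaying the same record is a no-op update
    }
  } finally {
    conn.close()
  }
}
```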
Will this be able to handle projection pushdown if a given job doesn't
utilize all the columns in the schema? Or should people have a
per-job schema?
On Wed, Sep 28, 2016 at 2:17 PM, Michael Armbrust
wrote:
> Burak, you can configure what happens with corrupt records for
Regarding documentation debt, is there a reason not to deploy
documentation updates more frequently than releases? I recall this
used to be the case.
On Wed, Sep 28, 2016 at 3:35 PM, Joseph Bradley wrote:
> +1 for 4 months. With QA taking about a month, that's very
uest-7,
> sapxm.adserving.log.ad_request-9] for group spark_aggregation_job-kafka010
> 16/09/28 08:16:48 INFO ConsumerCoordinator: Revoking previously assigned
> partitions [sapxm.adserving.log.view-3, sapxm.adserving.log.view-4,
> sapxm.adserving.log.view-1, sapxm.adserving.log.view-2,
>
What's the actual stacktrace / exception you're getting related to
commit failure?
On Tue, Sep 27, 2016 at 9:37 AM, Matthias Niehoff
wrote:
> Hi everybody,
>
> i am using the new Kafka Receiver for Spark Streaming for my Job. When
> running with old consumer it
Do you have a minimal example of how to reproduce the problem, that
doesn't depend on Cassandra?
On Mon, Sep 26, 2016 at 4:10 PM, Erwan ALLAIN wrote:
> Hi
>
> I'm working with
> - Kafka 0.8.2
> - Spark Streaming (2.0) direct input stream.
> - cassandra 3.0
>
> My batch
Either artifact should work with 0.10 brokers. The 0.10 integration has
more features but is still marked experimental.
On Sep 26, 2016 3:41 AM, "Haopu Wang" wrote:
> Hi, in the official integration guide, it says "Spark Streaming 2.0.0 is
> compatible with Kafka 0.8.2.1."
Spark alone isn't going to solve this problem, because you have no reliable
way of making sure a given worker has a consistent shard of the messages
seen so far, especially if there's an arbitrary amount of delay between
duplicate messages. You need a DHT or something equivalent.
On Sep 24, 2016
ct Approach (No
>> receivers) in Spark streaming.
>>
>> I am not sure if I have that leverage to upgrade at this point, but do you
>> know if Spark 1.6.1 to Spark 2.0 jump is smooth usually or does it involve
>> lot of hiccups.
>> Also is there a migratio
Do you have the ability to try using Spark 2.0 with the
streaming-kafka-0-10 connector?
I'd expect the 1.6.1 version to be compatible with kafka 0.10, but it
would be good to rule that out.
On Thu, Sep 22, 2016 at 1:37 PM, sagarcasual . wrote:
> Hello,
>
> I am trying to
-dev +user
That warning pretty much means what it says - the consumer tried to
get records for the given partition / offset, and couldn't do so after
polling the kafka broker for X amount of time.
If that only happens when you put additional load on Kafka via
producing, the first thing I'd do is
[
https://issues.apache.org/jira/browse/SPARK-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15499334#comment-15499334
]
Cody Koeninger commented on SPARK-17510:
[~jnadler] See the pull request here https://github.com
reduceByKey. Does this work on _all_ elements
> in the stream, as they're coming in, or is this a transformation per
> RDD/micro batch? I assume the latter, otherwise it would be more akin to
> updateStateByKey, right?
>
> On Tue, Sep 13, 2016 at 4:42 PM, Cody Koeninger
[
https://issues.apache.org/jira/browse/SPARK-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491455#comment-15491455
]
Cody Koeninger commented on SPARK-17510:
Ok, next time I get some free hacking time I can make
[
https://issues.apache.org/jira/browse/SPARK-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491389#comment-15491389
]
Cody Koeninger commented on SPARK-17510:
Just for clarity's sake, compute time is far higher
[
https://issues.apache.org/jira/browse/SPARK-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491087#comment-15491087
]
Cody Koeninger edited comment on SPARK-17510 at 9/14/16 6:12 PM:
-
I use
[
https://issues.apache.org/jira/browse/SPARK-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491087#comment-15491087
]
Cody Koeninger commented on SPARK-17510:
I use direct stream for multiple topic jobs where
for the details,
> much appreciated.
>
> http://blog.cloudera.com/blog/2015/03/exactly-once-spark-streaming-from-apache-kafka/
>
> On Tue, Sep 13, 2016 at 8:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> 1. see
>> http://spark.apache.org/docs/lat
That version of createDirectStream doesn't handle partition changes.
You can work around it by starting the job again.
The spark 2.0 consumer for kafka 0.10 should handle partition changes
via SubscribePattern.
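For reference, a sketch of what that looks like with the 0.10 integration (the topic regex and kafkaParams values are placeholders; `ssc` is an existing StreamingContext):

```scala
import java.{util => ju}
import java.util.regex.Pattern
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._

val kafkaParams: ju.Map[String, Object] = new ju.HashMap()
kafkaParams.put("bootstrap.servers", "localhost:9092") // placeholder
kafkaParams.put("key.deserializer", classOf[StringDeserializer])
kafkaParams.put("value.deserializer", classOf[StringDeserializer])
kafkaParams.put("group.id", "example-group")           // placeholder

// Topics matching the pattern are picked up as partitions change.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.SubscribePattern[String, String](
    Pattern.compile("events-.*"), kafkaParams))
```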
On Tue, Sep 13, 2016 at 7:13 PM, vinay gupta
wrote:
>
1. see
http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
look for HasOffsetRanges. If you really want the info per-message
rather than per-partition, createRDD has an overload that takes a
messageHandler from MessageAndMetadata to
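As a sketch of the per-message route, the 0.8 `createRDD` overload with a messageHandler looks roughly like this (`kafkaParams`, `offsetRanges`, and `leaders` are assumed to already exist):

```scala
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Yields (topic, partition, offset, value) per message
// instead of just (key, value).
val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder,
                               (String, Int, Long, String)](
  sc, kafkaParams, offsetRanges, leaders,
  (mmd: MessageAndMetadata[String, String]) =>
    (mmd.topic, mmd.partition, mmd.offset, mmd.message()))
```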
[
https://issues.apache.org/jira/browse/SPARK-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489102#comment-15489102
]
Cody Koeninger commented on SPARK-17510:
This would require a constructor change and another
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488642#comment-15488642
]
Cody Koeninger commented on SPARK-15406:
1. How can we avoid duplicate work like
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488558#comment-15488558
]
Cody Koeninger commented on SPARK-15406:
So you're saying the type of K and V will always
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488452#comment-15488452
]
Cody Koeninger edited comment on SPARK-15406 at 9/13/16 9:18 PM:
-
So I
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488452#comment-15488452
]
Cody Koeninger commented on SPARK-15406:
So I asked this twice with no answer, so I'll ask
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488312#comment-15488312
]
Cody Koeninger edited comment on SPARK-15406 at 9/13/16 8:26 PM:
-
Unless
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488312#comment-15488312
]
Cody Koeninger commented on SPARK-15406:
Unless I'm misunderstanding, you answered regarding
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487930#comment-15487930
]
Cody Koeninger commented on SPARK-15406:
Specific examples:
Kafka has a type for a key
han 1 partition based on size? Or is it something that the "source"
> (SocketStream, KafkaStream etc.) decides?
>
> On Tue, Sep 13, 2016 at 4:26 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> A micro batch is an RDD.
>>
>> An RDD has partitions, s
A micro batch is an RDD.
An RDD has partitions, so different executors can work on different
partitions concurrently.
Don't think of that as multiple micro-batches within a time slot.
It's one RDD within a time slot, with multiple partitions.
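A small illustration of that one-RDD-per-interval model (the per-partition body below is a stand-in for real processing):

```scala
stream.foreachRDD { rdd =>
  // One RDD per batch interval; its partitions are processed
  // concurrently across executors.
  println(s"this batch is one RDD with ${rdd.partitions.length} partitions")
  rdd.foreachPartition { iter =>
    // runs once per partition, potentially on different executors
    iter.foreach(record => ())
  }
}
```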
On Tue, Sep 13, 2016 at 9:01 AM, Daan Debie
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15487310#comment-15487310
]
Cody Koeninger commented on SPARK-15406:
So we can do the easiest thing possible for 2.0.1, which
[
https://issues.apache.org/jira/browse/SPARK-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486131#comment-15486131
]
Cody Koeninger commented on SPARK-15406:
I've got a minimal working Source and SourceProvider
Are you using the receiver based stream?
On Sep 10, 2016 15:45, "Eric Ho" wrote:
> I notice that some Spark programs would contact something like 'zoo1:2181'
> when trying to suck data out of Kafka.
>
> Does the kafka data actually get transported over this port?
>
>