On Thu, Oct 31, 2019 at 4:30 PM Sean Owen wrote:
>
> . But it'd be cooler to call these major
> releases!
Maybe this is just semantics, but my point is that the Scala project
already calls the 2.12 to 2.13 transition a major release,
e.g. from https://www.scala-lang.org/download/
"Note that different *major*
On Wed, Oct 30, 2019 at 5:57 PM Sean Owen wrote:
> Or, frankly, maybe Scala should reconsider the mutual incompatibility
> between minor releases. These are basically major releases, and
> indeed, it causes exactly this kind of headache.
>
Not saying binary incompatibility is fun, but 2.12 to
To be more explicit, the easiest thing to do in the short term is use
your own instance of KafkaConsumer to get the offsets for the
timestamps you're interested in, using offsetsForTimes, and use those
for the start / end offsets. See
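A minimal sketch of that approach, using a standalone KafkaConsumer (the helper name and config values here are illustrative, not Spark APIs):

```scala
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import scala.collection.JavaConverters._

// Hypothetical helper: look up, for each partition of a topic, the earliest
// offset whose timestamp is >= the given timestamp, via offsetsForTimes.
def offsetsAtTime(
    bootstrapServers: String,
    topic: String,
    timestampMs: Long): Map[TopicPartition, Long] = {
  val props = new Properties()
  props.put("bootstrap.servers", bootstrapServers)
  props.put("key.deserializer",
    "org.apache.kafka.common.serialization.ByteArrayDeserializer")
  props.put("value.deserializer",
    "org.apache.kafka.common.serialization.ByteArrayDeserializer")
  val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
  try {
    val partitions = consumer.partitionsFor(topic).asScala
      .map(pi => new TopicPartition(pi.topic, pi.partition))
    val query = partitions
      .map(tp => tp -> java.lang.Long.valueOf(timestampMs)).toMap.asJava
    // offsetsForTimes returns null for partitions with no matching offset
    consumer.offsetsForTimes(query).asScala.collect {
      case (tp, oat) if oat != null => tp -> oat.offset
    }.toMap
  } finally {
    consumer.close()
  }
}
```

The resulting offsets can then be passed as the start / end offsets of the Spark query.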
I feel like I've already said my piece on
https://github.com/apache/spark/pull/22138 let me know if you have
more questions.
As for SS in general, I don't have a production SS deployment, so I'm
less comfortable with reviewing large changes to it. But if no other
committers are working on it...
On Thu, Nov 22, 2018 at 7:32 PM Matei Zaharia wrote:
>
> Can we start by just recommending to contributors that they do this manually?
> Then if it seems to work fine, we can try to automate it.
>
> > On Nov 22, 2018, at 4:40 PM, Cody Koeninger wrote:
> >
> > I believe scalafmt
On Thu, Nov 22, 2018 at 9:11 AM Cody Koeninger wrote:
>>
>> Plugin invocation is ./build/mvn mvn-scalafmt_2.12:format
>>
>> It takes about 5 seconds, and errors out on the first different file
>> that doesn't match formatting.
>>
>> I made a shell
Worth a shot. What's the invocation that Shane
> could add (after this change goes in)
> On Wed, Nov 21, 2018 at 3:27 PM Cody Koeninger wrote:
> >
> > There's a mvn plugin (sbt as well, but it requires sbt 1.0+) so it
> > should be runnable from the PR builder
> >
> >
strokes but not in the details.
> Is this something that can be just run in the PR builder? if the rules
> are simple and not too hard to maintain, seems like a win.
> On Wed, Nov 21, 2018 at 2:26 PM Cody Koeninger wrote:
> >
> > Definitely not suggesting a mass reformat, just on a per
>
> Is there a way to just check style on PR changes? that's fine.
> On Wed, Nov 21, 2018 at 11:40 AM Cody Koeninger wrote:
> >
> > Is there any appetite for revisiting automating formatting?
> >
> > I know over the years various people have expressed opposition to
Is there any appetite for revisiting automating formatting?
I know over the years various people have expressed opposition to it
as unnecessary churn in diffs, but having every new contributor
greeted with "nit: 4 space indentation for argument lists" isn't very
welcoming.
Anastasios it looks like you already identified the two lines that
need to change, the string interpolation that depends on
UUID.randomUUID and metadataPath.hashCode.
I'd factor that out into a function that returns the group id. That
function would also need to take the "parameters" variable
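As a rough illustration of that refactor (the names here are hypothetical, not the actual Spark internals), the group id generation might become:

```scala
import java.util.UUID

// Illustrative sketch: build the consumer group id from user-supplied
// parameters when present, falling back to the generated unique id.
def consumerGroupId(
    parameters: Map[String, String],
    metadataPath: String): String =
  parameters.getOrElse(
    "kafka.group.id",
    s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}")
```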
Am I the only one for whom the livestream link didn't work last time?
Would like to be able to at least watch the discussion this time
around.
On Tue, Nov 13, 2018 at 6:01 PM Ryan Blue wrote:
>
> Hi everyone,
> I just wanted to send out a reminder that there’s a DSv2 sync tomorrow at
> 17:00
That sounds reasonable to me
On Fri, Nov 9, 2018 at 2:26 AM Anastasios Zouzias wrote:
>
> Hi all,
>
> I run in the following situation with Spark Structure Streaming (SS) using
> Kafka.
>
> In a project that I work on, there is already a secured Kafka setup where ops
> can issue an SSL
Just got a question about this on the user list as well.
Worth removing that link to pwendell's directory from the docs?
On Sun, Jan 21, 2018 at 12:13 PM, Jacek Laskowski wrote:
> Hi,
>
> http://spark.apache.org/developer-tools.html#nightly-builds reads:
>
>> Spark nightly packages are
+1 to Sean's comment
On Fri, Aug 31, 2018 at 2:48 PM, Reynold Xin wrote:
> Yup all good points. One way I've done it in the past is to have an appendix
> section for design sketch, as an expansion to the question "- What is new in
> your approach and why do you think it will be successful?"
>
>
Short answer is it isn't necessary.
Long answer is that you aren't just changing from 0.8 to 0.10, you're
changing from the receiver-based implementation to the direct stream.
Read these:
https://github.com/koeninger/kafka-exactly-once
According to
http://spark.apache.org/improvement-proposals.html
the shepherd should be a PMC member, not necessarily the person who
proposed the SPIP
On Tue, Jul 17, 2018 at 9:13 AM, Wenchen Fan wrote:
> I don't know an official answer, but conventionally people who propose the
> SPIP would
Sounds good, I'd like to add SPARK-24067 today assuming there's no objections
On Thu, May 10, 2018 at 1:22 PM, Henry Robinson wrote:
> +1, I'd like to get a release out with SPARK-23852 fixed. The Parquet
> community are about to release 1.8.3 - the voting period closes
https://issues.apache.org/jira/browse/SPARK-24067
is asking to backport a change to the 2.3 branch.
My questions
- In general are there any concerns about what qualifies for backporting?
This adds a configuration variable but shouldn't change default behavior.
- Is a separate jira + pr
Congrats!
On Mon, Apr 2, 2018 at 12:28 AM, Wenchen Fan wrote:
> Hi all,
>
> The Spark PMC recently added Zhenhua Wang as a committer on the project.
> Zhenhua is the major contributor of the CBO project, and has been
> contributing across several areas of Spark for a while,
contributions to Spark 2.3 and other past work:
>
> - Anirudh Ramanathan (contributor to Kubernetes support)
> - Bryan Cutler (contributor to PySpark and Arrow support)
> - Cody Koeninger (contributor to streaming and Kafka support)
> - Erik Erlandson (contributor to Kubernetes support)
> - M
Was there any answer to my question around the effect of changes to
the sink api regarding access to underlying offsets?
On Wed, Nov 1, 2017 at 11:32 AM, Reynold Xin wrote:
> Most of those should be answered by the attached design sketch in the JIRA
> ticket.
>
> On Wed, Nov
down.
>
>
> On 16-Oct-2017 7:34 PM, "Cody Koeninger" <c...@koeninger.org> wrote:
>>
>> Have you tried adjusting the timeout?
>>
>> On Mon, Oct 16, 2017 at 8:08 AM, Suprith T Jain <t.supr...@gmail.com>
>> wrote:
>> >
Have you tried adjusting the timeout?
On Mon, Oct 16, 2017 at 8:08 AM, Suprith T Jain wrote:
> Hi guys,
>
> I have a 3 node cluster and i am running a spark streaming job. consider the
> below example
>
> /*spark-submit* --master yarn-cluster --class
>
https://issues-test.apache.org/jira/browse/SPARK-18258
On Mon, Sep 11, 2017 at 7:15 AM, Dmitry Naumenko wrote:
> Hi all,
>
> It started as a discussion in
> https://stackoverflow.com/questions/46153105/how-to-get-kafka-offsets-with-spark-structured-streaming-api.
>
> So
experimental one.
>
> Is the Kafka 0.10 integration as stable as it is going to be, and worth
> marking as such for 2.3.0?
>
>
> On Tue, Sep 5, 2017 at 4:12 PM Cody Koeninger <c...@koeninger.org> wrote:
>>
>> +1 to going ahead and giving a deprecation warning now
>
+1 to going ahead and giving a deprecation warning now
On Tue, Sep 5, 2017 at 6:39 AM, Sean Owen wrote:
> On the road to Scala 2.12, we'll need to make Kafka 0.8 support optional in
> the build, because it is not available for Scala 2.12.
>
>
Here's the jira for upgrading to a 0.10.x point release, which is
effectively the discussion of upgrading to 0.11 now
https://issues.apache.org/jira/browse/SPARK-18057
On Tue, Sep 5, 2017 at 1:27 AM, matus.cimerman wrote:
> Hi guys,
>
> is there any plans to support
Just wanted to point out that because the jira isn't labeled SPIP, it
won't have shown up linked from
http://spark.apache.org/improvement-proposals.html
On Mon, Aug 28, 2017 at 2:20 PM, Wenchen Fan wrote:
> Hi all,
>
> It has been almost 2 weeks since I proposed the data
Can you explain in more detail what you mean by "distribute Kafka
topics among different instances of same consumer group"?
If you're trying to run multiple streams using the same consumer
group, it's already documented that you shouldn't do that.
On Thu, Jun 8, 2017 at 12:43 AM, Rastogi, Pankaj
Something like:
>>
>> df.writeStream.format("kafka").start("topic")
>>
>> Seems reasonable if people don't think that is confusing.
>>
>> On Mon, May 1, 2017 at 8:43 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>>
>>> I'm c
There are existing tickets on the issues around kafka versions, e.g.
https://issues.apache.org/jira/browse/SPARK-18057 that haven't gotten
any committer weigh-in on direction.
On Thu, Mar 9, 2017 at 12:52 PM, Oscar Batori wrote:
> Guys,
>
> To change the subject from
open ticket with the SPIP label, so it should show up
On Fri, Mar 10, 2017 at 11:19 AM, Reynold Xin <r...@databricks.com> wrote:
> We can just start using spip label and link to it.
>
>
>
> On Fri, Mar 10, 2017 at 9:18 AM, Cody Koeninger <c...@koeninger.org> wrote:
the admins
> can make a new issue type unfortunately. We may just have to mention a
> convention involving title and label or something.
>
> On Fri, Mar 10, 2017 at 4:52 PM Cody Koeninger <c...@koeninger.org> wrote:
>>
>> I think it ought to be its own page, linked from t
I think it ought to be its own page, linked from the more / community
menu dropdowns.
We also need the jira tag, and for the page to clearly link to filters
that show proposed / completed SPIPs
On Fri, Mar 10, 2017 at 3:39 AM, Sean Owen wrote:
> Alrighty, if nobody is
code/doc
> change we can just review and merge as usual.
>
> On Tue, Mar 7, 2017 at 3:15 PM Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Another week, another ping. Anyone on the PMC willing to call a vote on
>> this?
it to a vote and revisit the proposal in a few
>> months.
>> Joseph
>>
>> On Fri, Feb 24, 2017 at 5:35 AM, Cody Koeninger <c...@koeninger.org>
>> wrote:
>>
>>> It's been a week since any further discussion.
>>>
>>> Do PMC members
>
>> wrote:
>>>
>>> The doc looks good to me.
>>>
>>> Ryan, the role of the shepherd is to make sure that someone
>>> knowledgeable with Spark processes is involved: this person can advise
>>> on technical and procedural considerations.
It's just a process
> document.
>
> Still, a fine step IMHO.
>
> On Thu, Feb 16, 2017 at 4:22 PM Reynold Xin <r...@databricks.com> wrote:
>>
>> Updated. Any feedback from other community members?
>>
>>
>> On Wed, Feb 15, 2017 at 2:53 AM, Cody Koeninger
customers. The to-do feature list was always above 100. Sometimes, the
>> customers are feeling frustrated when we are unable to deliver them on time
>> due to the resource limits and others. Even if they paid us billions, we
>> still need to do it phase by phase or something.
accepted or rejected, so that we do not end
>> up with a distracting long tail of half-hearted proposals.
>>
>> These rules are meant to be flexible, but the current document should be
>> clear about who is in charge of a SPIP, and the state it is currently in.
>>
>> We h
Congrats, glad to hear it
On Jan 24, 2017 12:47 PM, "Shixiong(Ryan) Zhu"
wrote:
> Congrats Burak & Holden!
>
> On Tue, Jan 24, 2017 at 10:39 AM, Joseph Bradley
> wrote:
>
>> Congratulations Burak & Holden!
>>
>> On Tue, Jan 24, 2017 at 10:33 AM,
Totally agree with most of what Sean said, just wanted to give an
alternate take on the "maintainers" thing
On Tue, Jan 24, 2017 at 10:23 AM, Sean Owen wrote:
> There is no such list because there's no formal notion of ownership or
> access to subsets of the project. Tracking
g some items mentioned above + a new one
>> w.r.t. Reynold's draft
>> <https://docs.google.com/document/d/1-Zdi_W-wtuxS9hTK0P9qb2x-nRanvXmnZ7SUi4qMljg/edit#>
>> :
>> * Reinstate the "Where" section with links to current and past SIPs
>> * Add field for stating
Agree that frequent topic deletion is not a very Kafka-esque thing to do
On Fri, Dec 9, 2016 at 12:09 PM, Shixiong(Ryan) Zhu
wrote:
> Sean, "stress test for failOnDataLoss=false" is because Kafka consumer may
> be thrown NPE when a topic is deleted. I added some logic to
If you want finer-grained max rate setting, SPARK-17510 got merged a
while ago. There's also SPARK-18580 which might help address the
issue of starting backpressure rate for the first batch.
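For reference, the relevant knobs look roughly like this (a sketch of the standard DStream settings; verify the exact keys against the docs for your Spark version):

```scala
import org.apache.spark.SparkConf

// Rate limiting / backpressure settings for the direct Kafka DStream.
val conf = new SparkConf()
  // hard upper bound on records per partition, per second
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
  // let the PID-based backpressure adjust the rate after the first batch
  .set("spark.streaming.backpressure.enabled", "true")
  // SPARK-18580: initial rate used before backpressure has any feedback
  .set("spark.streaming.backpressure.initialRate", "500")
```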
On Mon, Dec 5, 2016 at 4:18 PM, Liren Ding wrote:
> Hey all,
>
> Does
documentation says: Input streams that can generate
> RDDs from new data by running a service/thread only on the driver node (that
> is, without running a receiver on worker nodes)
>
> Thanks and regards,
> Aakash Pradeep
>
>
> On Tue, Nov 15, 2016 at 2:55 PM, Cody Koeninger <
It'd probably be worth no longer marking the 0.8 interface as
experimental. I don't think it's likely to be subject to active
development at this point.
You can use the 0.8 artifact to consume from a 0.9 broker
Where are you reading documentation indicating that the direct stream
only runs on
<m...@apache.org> wrote:
> I think they are open to others helping, in fact, more than one person has
> worked on the JIRA so far. And, it's been crawling really slowly and that's
> preventing adoption of Spark's new connector in secure Kafka environments.
>
> On Tue, Nov 8, 2016 at 7:59
Have you asked the assignee on the Kafka jira whether they'd be
willing to accept help on it?
On Tue, Nov 8, 2016 at 5:26 PM, Mark Grover wrote:
> Hi all,
> We currently have a new direct stream connector, thanks to work by Cody and
> others on SPARK-12177.
>
> However, that
> Oops. Let me try to figure that out.
>>
>>
>> On Monday, November 7, 2016, Cody Koeninger <c...@koeninger.org> wrote:
>>>
>>> Thanks for picking up on this.
>>>
>>> Maybe I fail at google docs, but I can't see any edits on the document
nice) would also be nice,
>>> but that can be done at any time.
>>>
>>> BTW, shameless plug: I filed SPARK-18085 which I consider a candidate
>>> for a SIP, given the scope of the work. The document attached even
>>> somewhat matches the proposed format. So
SPARK-17510
https://github.com/apache/spark/pull/15132
It's for allowing tweaking of rate limiting on a per-partition basis
I answered the duplicate post on the user mailing list, I'd say keep
the discussion there.
On Fri, Nov 4, 2016 at 12:14 PM, vonnagy wrote:
> Nitin,
>
> I am getting the similar issues using Spark 2.0.1 and Kafka 0.10. I have to
> jobs, one that uses a Kafka stream and one that
So concrete things people could do
- users could tag subject lines appropriately to the component they're
asking about
- contributors could monitor user@ for tags relating to components
they've worked on.
I'd be surprised if my miss rate for any mailing list questions
well-labeled as Kafka was
Makes sense to me.
I do wonder if e.g.
[SPARK-12345][STRUCTUREDSTREAMING][KAFKA]
is going to leave any room in the Github PR form for actual title content?
On Mon, Oct 31, 2016 at 1:37 PM, Michael Armbrust
wrote:
> I'm planning to do a little maintenance on JIRA to
h them, my mail was just to show some
> aspects from my side, so from theside of developer and person who is trying
> to help others with Spark (via StackOverflow or other ways)
>
>
> Pozdrawiam / Best regards,
>
> Tomasz
>
>
>
> From: Cody Koeninger
I think supporting only one version of Scala at any given time is not
sufficient; two probably is ok.
I.e. don't drop 2.10 before 2.12 is out and supported.
On Tue, Oct 25, 2016 at 10:56 AM, Sean Owen wrote:
> The general forces are that new versions of things to support emerge,
think that makes sense, I can start a
ticket.
On Thu, Oct 20, 2016 at 1:16 PM, Reynold Xin <r...@databricks.com> wrote:
> Seems like a good new API to add?
>
>
> On Thu, Oct 20, 2016 at 11:14 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Access to the par
Access to the partition ID is necessary for basically every single one
of my jobs, and there isn't a foreachPartitionWithIndex equivalent.
You can kind of work around it with empty foreach after the map, but
it's really awkward to explain to people.
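The workaround described above can be sketched as a small helper (just the mapPartitionsWithIndex trick, not an official API):

```scala
import org.apache.spark.rdd.RDD

// Emulate a foreachPartitionWithIndex: run the side effect inside
// mapPartitionsWithIndex, then force evaluation with an empty foreach.
def foreachPartitionWithIndex[T](rdd: RDD[T])(
    f: (Int, Iterator[T]) => Unit): Unit =
  rdd.mapPartitionsWithIndex { (partitionId, iter) =>
    f(partitionId, iter)
    Iterator.empty // nothing to return; we only wanted the side effect
  }.foreach(_ => ()) // empty action to trigger evaluation
```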
On Thu, Oct 20, 2016 at 12:52 PM, Reynold Xin
ncy ?
>>
> I think that the fact that they serve as an output trigger is a problem,
> but Structured Streaming seems to resolve this now.
>
>>
>> Thanks
>> Shivaram
>>
>> On Wed, Oct 19, 2016 at 1:29 PM, Michael Armbrust
>> <mich...@databricks
Is anyone seriously thinking about alternatives to microbatches?
On Wed, Oct 19, 2016 at 2:45 PM, Michael Armbrust
wrote:
> Anything that is actively being designed should be in JIRA, and it seems
> like you found most of it. In general, release windows can be found on
+1 to putting docs in one clear place.
On Oct 18, 2016 6:40 AM, "Sean Owen" wrote:
> I'm OK with that. The upside to the wiki is that it can be edited directly
> outside of a release cycle. However, in practice I find that the wiki is
> rarely changed. To me it also serves
SPARK-17841 three line bugfix that has a week old PR
SPARK-17812 being able to specify starting offsets is a must have for
a Kafka mvp in my opinion, already has a PR
SPARK-17813 I can put in a PR for this tonight if it'll be considered
On Mon, Oct 17, 2016 at 12:28 AM, Reynold Xin
P. However I think that Spark should
>> have real-time streaming support. Currently I see many posts/comments
>> that "Spark has too big latency". Spark Streaming is doing very good
>> jobs with micro-batches, however I think it is possible to add also more
>
I've always been confused as to why it would ever be a good idea to
put any streaming query system on the critical path for synchronous <
100msec requests. It seems to make a lot more sense to have a
streaming system do asynch updates of a store that has better latency
and quality of service
ul, because we have run into some trouble in the past
> with some inside the ASF but essentially outside the Spark community who
> didn't like the way we were doing things.
>
> On Mon, Oct 10, 2016 at 3:53 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Apache documents s
ng lots of rules
>> from the beginning makes it confusing and can reduce contributions.
>> Although, as engineers, we believe that anything can be solved using
>> mechanical rules, in practice software development is a social process that
>> ultimately requires humans to tackle
community effort and I wouldn't want to move forward if up to half of the
> community thinks it's an untenable idea.
>
> rb
>
> On Mon, Oct 10, 2016 at 12:07 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> I think this is closer to a procedural issue than a code modification.
the value of codifying that in our process? I
>> think restricting who can submit proposals would only undermine them by
>> pushing contributors out. Maybe I'm missing something here?
>>
>> rb
>>
>>
>>
>> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c.
o can submit proposals would only undermine them by
> pushing contributors out. Maybe I'm missing something here?
>
> rb
>
>
>
> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> Yes, users suggesting SIPs is a good thing and is expl
strategies have a large effect on the goal, we should
> have it discussed when discussing the goals. In addition, while it is often
> easy to throw out completely infeasible goals, it is often much harder to
> figure out that the goals are unfeasible without fine tuning.
>
>
>
>
>
So I'd really like a culture of having those early.
> People don't argue about prettiness when they discuss APIs, they argue about
> the core concepts to expose in order to meet various goals, and then they're
> stuck maintaining those for a long time.
>
> Matei
>
> On Oct 9,
proposal is technically feasible
>> right now? If it's infeasible, that will be discovered later during design
>> and implementation. Same thing with rejected strategies -- listing some of
>> those is definitely useful sometimes, but if you make this a *required*
>> se
discovered later
> during design and implementation. Same thing with rejected strategies --
> listing some of those is definitely useful sometimes, but if you make this
> a *required* section, people are just going to fill it in with bogus stuff
> (I've seen this happen before).
>
>
Regarding name, if the SIP overlap is a concern, we can pick a different name.
My tongue in cheek suggestion would be
Spark Lightweight Improvement process (SPARKLI)
On Sun, Oct 9, 2016 at 4:14 PM, Cody Koeninger <c...@koeninger.org> wrote:
> So to focus the discussion on the specific
ures more visible (and their approval more formal)?
>
> BTW note that in either case, I'd like to have a template for design docs
> too, which should also include goals. I think that would've avoided some of
> the issues you brought up.
>
> Matei
>
> On Oct 9, 2016, at 10
At a super high level, it depends on whether you want the SIPs to be
>>> PRDs for getting some quick feedback on the goals of a feature before it is
>>> designed, or something more like full-fledged design docs (just a more
>>> visible design doc for bigger changes). I loo
n what this
> entails, and then we can discuss this the specific proposal as well.
>
>
> On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Yeah, in case it wasn't clear, I was talking about SIPs for major
>> user-facing or cross-cutting changes,
That's awesome Sean, very clear.
One minor thing: non-committers can't change the assigned field, as far as I know.
On Oct 9, 2016 3:40 AM, "Sean Owen" wrote:
I added a variant on this text to https://cwiki.apache.org/
It's not about technical design disagreement as to matters of taste,
it's about familiarity with the domain. To make an analogy, it's as
if a committer in MLlib was firmly intent on, I dunno, treating a
collection of categorical variables as if it were an ordered range of
continuous variables.
, Reynold Xin <r...@databricks.com> wrote:
> I think so (at least I think it is socially acceptable). Of course, use good
> judgement here :)
>
>
>
> On Sat, Oct 8, 2016 at 12:06 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>> So to be clear, can I go c
So to be clear, can I go clean up the Kafka cruft?
On Sat, Oct 8, 2016 at 1:41 PM, Reynold Xin wrote:
>
> On Sat, Oct 8, 2016 at 2:09 AM, Sean Owen wrote:
>>
>>
>> - Resolve as Fixed if there's a change you can point to that resolved the
>> issue
>> - If
D to 5 days and use
> this as a regular way to bring contributions to the attention of committers.
>
> I dunno if people think this is perhaps too complex, but at our scale I
> feel we need some kind of loose but automated system for funneling
> contributions through some kind of lifecycle.
That makes sense, thanks.
One thing I've never been clear on is who should be allowed to resolve
Jiras. Can I go clean up the backlog of Kafka Jiras that weren't created
by me?
If there's an informal policy here, can we update the wiki to reflect it?
Maybe it's there already, but I didn't see it.
of loose but automated system for funneling contributions
> through some kind of lifecycle. The status quo is just not that good (e.g.
> 474 open PRs against Spark as of this moment).
>
> Nick
>
>
> On Fri, Oct 7, 2016 at 4:48 PM Cody Koeninger <c...@koeninger.org> wrote:
>
missing features, slow reviews
> which is understandable to some extent... it is not only about Spark but
> things can be improved for sure for this project in particular as already
> stated.
>
> On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <c...@koeninger.org>
> wrote:
>
>
Matei asked:
> I agree about empowering people interested here to contribute, but I'm
> wondering, do you think there are technical things that people don't want to
> work on, or is it a matter of what there's been time to do?
It's a matter of mismanagement and miscommunication.
The
Without a hell of a lot more work, Assign would be the only strategy usable.
On Fri, Oct 7, 2016 at 3:25 PM, Michael Armbrust wrote:
>> The implementation is totally and completely different however, in ways
>> that leak to the end user.
>
>
> Can you elaborate?
0.10 consumers won't work on an earlier broker.
Earlier consumers will (should?) work on a 0.10 broker.
The main things earlier consumers lack from a user perspective is
support for SSL, and pre-fetching messages. The implementation is
totally and completely different however, in ways that leak to the end user.
+1 to adding an SIP label and linking it from the website. I think it needs
- template that focuses it towards soliciting user goals / non goals
- clear resolution as to which strategy was chosen to pursue. I'd
recommend a vote.
Matei asked me to clarify what I meant by changing interfaces, I
Sean, that was very eloquently put, and I 100% agree. If I ever meet
you in person, I'll buy you multiple rounds of beverages of your
choice ;)
This is probably reiterating some of what you said in a less clear
manner, but I'll throw more of my 2 cents in.
- Design.
Yes, design by committee
I love Spark. 3 or 4 years ago it was the first distributed computing
environment that felt usable, and the community was welcoming.
But I just got back from the Reactive Summit, and this is what I observed:
- Industry leaders on stage making fun of Spark's streaming model
- Open source project
Totally agree that specifying the schema manually should be the
baseline. LGTM, thanks for working on it. Seems like it looks good
to others too judging by the comment on the PR that it's getting
merged to master :)
On Thu, Sep 29, 2016 at 2:13 PM, Michael Armbrust
Will this be able to handle projection pushdown if a given job doesn't
utilize all the columns in the schema? Or should people have a
per-job schema?
On Wed, Sep 28, 2016 at 2:17 PM, Michael Armbrust
wrote:
> Burak, you can configure what happens with corrupt records for
Regarding documentation debt, is there a reason not to deploy
documentation updates more frequently than releases? I recall this
used to be the case.
On Wed, Sep 28, 2016 at 3:35 PM, Joseph Bradley wrote:
> +1 for 4 months. With QA taking about a month, that's very
To be clear, "safe" has very little to do with this.
It's pretty clear that there's very little risk of the spark module
for kinesis being considered a derivative work, much less all of
spark.
The use limitation in 3.3 that caused the amazon license to be put on
the apache X list also doesn't
I don't see a reason to remove the non-assembly artifact, why would
you? You're not distributing copies of Amazon licensed code, and the
Amazon license goes out of its way not to over-reach regarding
derivative works.
This seems pretty clearly to fall in the spirit of
The Kafka commit api isn't transactional, you aren't going to get
exactly once behavior out of it even if you were committing offsets on
a per-partition basis. This doesn't really have anything to do with
Spark; the old code you posted was already inherently broken.
Make your outputs idempotent
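For example, one way to make a downstream write idempotent is to key it by (topic, partition, offset), so a replayed batch overwrites instead of duplicating. A sketch assuming a Postgres-style upsert and an illustrative `events` table (the schema and column names are made up for this example):

```scala
import java.sql.DriverManager

// Idempotent write: a primary key on (kafka_topic, kafka_partition,
// kafka_offset) turns a replayed record into an overwrite, not a duplicate.
def upsert(
    jdbcUrl: String,
    topic: String,
    partition: Int,
    offset: Long,
    payload: String): Unit = {
  val conn = DriverManager.getConnection(jdbcUrl)
  try {
    val stmt = conn.prepareStatement(
      """insert into events (kafka_topic, kafka_partition, kafka_offset, payload)
        |values (?, ?, ?, ?)
        |on conflict (kafka_topic, kafka_partition, kafka_offset)
        |do update set payload = excluded.payload""".stripMargin)
    stmt.setString(1, topic)
    stmt.setInt(2, partition)
    stmt.setLong(3, offset)
    stmt.setString(4, payload)
    stmt.executeUpdate()
  } finally {
    conn.close()
  }
}
```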
http://blog.originate.com/blog/2014/02/27/types-inside-types-in-scala/
On Wed, Aug 31, 2016 at 2:19 AM, Sean Owen wrote:
> Weird, I recompiled Spark with a similar change to Model and it seemed
> to work but maybe I missed a step in there.
>
> On Wed, Aug 31, 2016 at 6:33 AM,