Re: Packages to release in 3.0.0-preview

2019-10-31 Thread Cody Koeninger
On Thu, Oct 31, 2019 at 4:30 PM Sean Owen wrote: > > . But it'd be cooler to call these major > releases! Maybe this is just semantics, but my point is the Scala project already does call 2.12 to 2.13 a major release e.g. from https://www.scala-lang.org/download/ "Note that different *major*

Re: Packages to release in 3.0.0-preview

2019-10-31 Thread Cody Koeninger
On Wed, Oct 30, 2019 at 5:57 PM Sean Owen wrote: > Or, frankly, maybe Scala should reconsider the mutual incompatibility > between minor releases. These are basically major releases, and > indeed, it causes exactly this kind of headache. > Not saying binary incompatibility is fun, but 2.12 to

Re: Structured streaming from Kafka by timestamp

2019-02-05 Thread Cody Koeninger
To be more explicit, the easiest thing to do in the short term is use your own instance of KafkaConsumer to get the offsets for the timestamps you're interested in, using offsetsForTimes, and use those for the start / end offsets. See
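This is a minimal sketch of that approach that uses no Kafka client. A partition is modeled as an in-memory list of (offset, timestamp) pairs. `offset_for_time` assumes timestamps never decrease as offsets grow, which Kafka does not guarantee for producer-set CreateTime timestamps. It finds the earliest offset at or after a timestamp, which is the documented behavior of `offsetsForTimes`, and builds the per-partition JSON that the Structured Streaming Kafka source accepts for `startingOffsets` / `endingOffsets`. In that JSON, -1 means latest. Mapping a partition with no match to -1 is a choice made for this sketch, not something from the thread. The helper names are hypothetical.

```python
import bisect
import json
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for one Kafka partition: (offset, timestamp_ms) pairs,
# sorted by offset, ASSUMING timestamps are non-decreasing along the log.
PartitionLog = List[Tuple[int, int]]

def offset_for_time(log: PartitionLog, ts_ms: int) -> Optional[int]:
    """Earliest offset whose timestamp is >= ts_ms, or None if there is none."""
    timestamps = [t for _, t in log]
    i = bisect.bisect_left(timestamps, ts_ms)  # first index with timestamp >= ts_ms
    return log[i][0] if i < len(log) else None

def offsets_json(topic: str, logs: Dict[int, PartitionLog], ts_ms: int) -> str:
    """Per-partition offsets JSON such as {"topic": {"0": 23, "1": -1}}.

    Partitions with no message at or after ts_ms map to -1 ("latest")."""
    per_partition = {}
    for partition, log in sorted(logs.items()):
        off = offset_for_time(log, ts_ms)
        per_partition[str(partition)] = -1 if off is None else off
    return json.dumps({topic: per_partition})
```

For example, if partition 0 holds timestamps 1000, 2000 and 3000 and partition 1 holds only 500, `offsets_json("events", logs, 1500)` gives `{"events": {"0": 1, "1": -1}}`.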

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Cody Koeninger
I feel like I've already said my piece on https://github.com/apache/spark/pull/22138 let me know if you have more questions. As for SS in general, I don't have a production SS deployment, so I'm less comfortable with reviewing large changes to it. But if no other committers are working on it...

Re: Automated formatting

2018-11-26 Thread Cody Koeninger
22, 2018 at 7:32 PM Matei Zaharia wrote: > > Can we start by just recommending to contributors that they do this manually? > Then if it seems to work fine, we can try to automate it. > > > On Nov 22, 2018, at 4:40 PM, Cody Koeninger wrote: > > > > I believe scalaf

Re: Automated formatting

2018-11-22 Thread Cody Koeninger
hu, Nov 22, 2018 at 9:11 AM Cody Koeninger wrote: >> >> Plugin invocation is ./build/mvn mvn-scalafmt_2.12:format >> >> It takes about 5 seconds, and errors out on the first different file >> that doesn't match formatting. >> >> I made a shell

Re: Automated formatting

2018-11-22 Thread Cody Koeninger
worth a shot. What's the invocation that Shane > could add (after this change goes in) > On Wed, Nov 21, 2018 at 3:27 PM Cody Koeninger wrote: > > > > There's a mvn plugin (sbt as well, but it requires sbt 1.0+) so it > > should be runnable from the PR builder > > > >

Re: Automated formatting

2018-11-21 Thread Cody Koeninger
trokes but not in the details. > Is this something that can be just run in the PR builder? if the rules > are simple and not too hard to maintain, seems like a win. > On Wed, Nov 21, 2018 at 2:26 PM Cody Koeninger wrote: > > > > Definitely not suggesting a mass reformat, just on a per

Re: Automated formatting

2018-11-21 Thread Cody Koeninger
> > Is there a way to just check style on PR changes? that's fine. > On Wed, Nov 21, 2018 at 11:40 AM Cody Koeninger wrote: > > > > Is there any appetite for revisiting automating formatting? > > > > I know over the years various people have expressed opposition to
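A hypothetical first step toward checking style only on a PR's changes is to narrow the file list to the Scala sources that the PR touches. The sketch assumes its input is the text printed by `git diff --name-only` against the target branch. The thread does not cover how that list would then be passed to the scalafmt plugin, so that step is left open here.

```python
from typing import List

def changed_scala_files(diff_name_only: str) -> List[str]:
    """Keep only the .scala paths from `git diff --name-only` output.

    Deleted files also appear in that output unless the diff excludes them,
    and those cannot be formatted."""
    return [
        line.strip()
        for line in diff_name_only.splitlines()
        if line.strip().endswith(".scala")
    ]
```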

Automated formatting

2018-11-21 Thread Cody Koeninger
Is there any appetite for revisiting automating formatting? I know over the years various people have expressed opposition to it as unnecessary churn in diffs, but having every new contributor greeted with "nit: 4 space indentation for argument lists" isn't very welcoming.

Re: [Structured Streaming] Kafka group.id is fixed

2018-11-19 Thread Cody Koeninger
Anastasios it looks like you already identified the two lines that need to change, the string interpolation that depends on UUID.randomUUID and metadataPath.hashCode. I'd factor that out into a function that returns the group id. That function would also need to take the "parameters" variable
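Here is a sketch of that refactoring. It is written in Python rather than the Scala of the Kafka source, and only the idea comes from the message: a single function that takes the parameters and the metadata path and returns the group id. The option key `kafka.group.id` and the `spark-kafka-source-` prefix are assumptions of this sketch. `java_string_hash` reproduces Java's `String.hashCode` for characters in the Basic Multilingual Plane, so the suffix is the value that `metadataPath.hashCode` would produce.

```python
import uuid
from typing import Mapping

# Assumed option key for a user-supplied group id (hypothetical in this sketch).
GROUP_ID_KEY = "kafka.group.id"

def java_string_hash(s: str) -> int:
    """Java's String.hashCode for BMP-only strings, as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h

def group_id(parameters: Mapping[str, str], metadata_path: str) -> str:
    """Return the consumer group id for one query."""
    fixed = parameters.get(GROUP_ID_KEY)
    if fixed:
        # A fixed id, e.g. one ops has already authorized on a secured cluster.
        return fixed
    # Default: unique per query, in the spirit of the UUID + metadataPath.hashCode string.
    return f"spark-kafka-source-{uuid.uuid4()}-{java_string_hash(metadata_path)}"
```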

Re: DataSourceV2 sync tomorrow

2018-11-13 Thread Cody Koeninger
Am I the only one for whom the livestream link didn't work last time? Would like to be able to at least watch the discussion this time around. On Tue, Nov 13, 2018 at 6:01 PM Ryan Blue wrote: > > Hi everyone, > I just wanted to send out a reminder that there’s a DSv2 sync tomorrow at > 17:00

Re: [Structured Streaming] Kafka group.id is fixed

2018-11-09 Thread Cody Koeninger
That sounds reasonable to me On Fri, Nov 9, 2018 at 2:26 AM Anastasios Zouzias wrote: > > Hi all, > > I run in the following situation with Spark Structure Streaming (SS) using > Kafka. > > In a project that I work on, there is already a secured Kafka setup where ops > can issue an SSL

Re: Nightly Builds in the docs (in spark-nightly/spark-master-bin/latest? Can't seem to find it)

2018-08-31 Thread Cody Koeninger
Just got a question about this on the user list as well. Worth removing that link to pwendell's directory from the docs? On Sun, Jan 21, 2018 at 12:13 PM, Jacek Laskowski wrote: > Hi, > > http://spark.apache.org/developer-tools.html#nightly-builds reads: > >> Spark nightly packages are

Re: [discuss] replacing SPIP template with Heilmeier's Catechism?

2018-08-31 Thread Cody Koeninger
+1 to Sean's comment On Fri, Aug 31, 2018 at 2:48 PM, Reynold Xin wrote: > Yup all good points. One way I've done it in the past is to have an appendix > section for design sketch, as an expansion to the question "- What is new in > your approach and why do you think it will be successful?" > >

Re: Migrating from kafka08 client to kafka010

2018-08-02 Thread Cody Koeninger
Short answer is it isn't necessary. Long answer is that you aren't just changing from 08 to 10, you're changing from the receiver based implementation to the direct stream. Read these: https://github.com/koeninger/kafka-exactly-once

Re: [DISCUSS] SPIP: Standardize SQL logical plans

2018-07-17 Thread Cody Koeninger
According to http://spark.apache.org/improvement-proposals.html the shepherd should be a PMC member, not necessarily the person who proposed the SPIP On Tue, Jul 17, 2018 at 9:13 AM, Wenchen Fan wrote: > I don't know an official answer, but conventionally people who propose the > SPIP would

Re: Time for 2.3.1?

2018-05-11 Thread Cody Koeninger
Sounds good. I'd like to add SPARK-24067 today, assuming there are no objections. On Thu, May 10, 2018 at 1:22 PM, Henry Robinson wrote: > +1, I'd like to get a release out with SPARK-23852 fixed. The Parquet > community are about to release 1.8.3 - the voting period closes

Process for backports?

2018-04-24 Thread Cody Koeninger
https://issues.apache.org/jira/browse/SPARK-24067 is asking to backport a change to the 2.3 branch. My questions - In general are there any concerns about what qualifies for backporting? This adds a configuration variable but shouldn't change default behavior. - Is a separate jira + pr

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Cody Koeninger
Congrats! On Mon, Apr 2, 2018 at 12:28 AM, Wenchen Fan wrote: > Hi all, > > The Spark PMC recently added Zhenhua Wang as a committer on the project. > Zhenhua is the major contributor of the CBO project, and has been > contributing across several areas of Spark for a while,

Re: Welcoming some new committers

2018-03-02 Thread Cody Koeninger
tions to Spark 2.3 and other past work: > > - Anirudh Ramanathan (contributor to Kubernetes support) > - Bryan Cutler (contributor to PySpark and Arrow support) > - Cody Koeninger (contributor to streaming and Kafka support) > - Erik Erlandson (contributor to Kubernetes support) > - M

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-01 Thread Cody Koeninger
Was there any answer to my question around the effect of changes to the sink api regarding access to underlying offsets? On Wed, Nov 1, 2017 at 11:32 AM, Reynold Xin wrote: > Most of those should be answered by the attached design sketch in the JIRA > ticket. > > On Wed, Nov

Re: Spark Kafka API tries connecting to dead node for every batch, which increases the processing time

2017-10-16 Thread Cody Koeninger
down. > > > On 16-Oct-2017 7:34 PM, "Cody Koeninger" <c...@koeninger.org> wrote: >> >> Have you tried adjusting the timeout? >> >> On Mon, Oct 16, 2017 at 8:08 AM, Suprith T Jain <t.supr...@gmail.com> >> wrote: >> >

Re: Spark Kafka API tries connecting to dead node for every batch, which increases the processing time

2017-10-16 Thread Cody Koeninger
Have you tried adjusting the timeout? On Mon, Oct 16, 2017 at 8:08 AM, Suprith T Jain wrote: > Hi guys, > > I have a 3 node cluster and i am running a spark streaming job. consider the > below example > > /*spark-submit* --master yarn-cluster --class >

Re: Easy way to get offset metadata with Spark Streaming API

2017-09-11 Thread Cody Koeninger
https://issues-test.apache.org/jira/browse/SPARK-18258 On Mon, Sep 11, 2017 at 7:15 AM, Dmitry Naumenko wrote: > Hi all, > > It started as a discussion in > https://stackoverflow.com/questions/46153105/how-to-get-kafka-offsets-with-spark-structured-streaming-api. > > So

Re: Putting Kafka 0.8 behind an (opt-in) profile

2017-09-06 Thread Cody Koeninger
rimental one. > > Is the Kafka 0.10 integration as stable as it is going to be, and worth > marking as such for 2.3.0? > > > On Tue, Sep 5, 2017 at 4:12 PM Cody Koeninger <c...@koeninger.org> wrote: >> >> +1 to going ahead and giving a deprecation warning now >

Re: Putting Kafka 0.8 behind an (opt-in) profile

2017-09-05 Thread Cody Koeninger
+1 to going ahead and giving a deprecation warning now On Tue, Sep 5, 2017 at 6:39 AM, Sean Owen wrote: > On the road to Scala 2.12, we'll need to make Kafka 0.8 support optional in > the build, because it is not available for Scala 2.12. > >

Re: Spark streaming Kafka 0.11 integration

2017-09-05 Thread Cody Koeninger
Here's the jira for upgrading to a 0.10.x point release, which is effectively the discussion of upgrading to 0.11 now https://issues.apache.org/jira/browse/SPARK-18057 On Tue, Sep 5, 2017 at 1:27 AM, matus.cimerman wrote: > Hi guys, > > is there any plans to support

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

2017-08-28 Thread Cody Koeninger
Just wanted to point out that because the jira isn't labeled SPIP, it won't have shown up linked from http://spark.apache.org/improvement-proposals.html On Mon, Aug 28, 2017 at 2:20 PM, Wenchen Fan wrote: > Hi all, > > It has been almost 2 weeks since I proposed the data

Re: SPARK-19547

2017-06-08 Thread Cody Koeninger
Can you explain in more detail what you mean by "distribute Kafka topics among different instances of same consumer group"? If you're trying to run multiple streams using the same consumer group, it's already documented that you shouldn't do that. On Thu, Jun 8, 2017 at 12:43 AM, Rastogi, Pankaj

Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

2017-05-01 Thread Cody Koeninger
omething like. >> >> df.writeStream.format("kafka").start("topic") >> >> Seems reasonable if people don't think that is confusing. >> >> On Mon, May 1, 2017 at 8:43 AM, Cody Koeninger <c...@koeninger.org> wrote: >>> >>> I'm c
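The precedence under discussion fits in a small resolver. In the Kafka sink today, a `topic` option overrides any per-row `topic` column. Using the path passed to `start(...)` as the lowest-priority fallback is what this thread proposes, not existing behavior. The function name is hypothetical.

```python
from typing import Optional

def resolve_topic(option: Optional[str], column: Optional[str], path: Optional[str]) -> str:
    """Topic for one row: the topic option, then the row's topic column,
    then (as proposed, hypothetically) the path given to start()."""
    for candidate in (option, column, path):
        if candidate:
            return candidate
    raise ValueError("no topic: set the topic option, a topic column, or a path")
```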

Re: Question about upgrading Kafka client version

2017-03-10 Thread Cody Koeninger
There are existing tickets on the issues around kafka versions, e.g. https://issues.apache.org/jira/browse/SPARK-18057 that haven't gotten any committer weigh-in on direction. On Thu, Mar 9, 2017 at 12:52 PM, Oscar Batori wrote: > Guys, > > To change the subject from

Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
pen ticket with the SPIP label, so it should show up. On Fri, Mar 10, 2017 at 11:19 AM, Reynold Xin <r...@databricks.com> wrote: > We can just start using spip label and link to it. > > > > On Fri, Mar 10, 2017 at 9:18 AM, Cody Koeninger <c...@koeninger.org> wrote:

Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
the admins > can make a new issue type unfortunately. We may just have to mention a > convention involving title and label or something. > > On Fri, Mar 10, 2017 at 4:52 PM Cody Koeninger <c...@koeninger.org> wrote: >> >> I think it ought to be its own page, linked from t

Re: Spark Improvement Proposals

2017-03-10 Thread Cody Koeninger
I think it ought to be its own page, linked from the more / community menu dropdowns. We also need the jira tag, and for the page to clearly link to filters that show proposed / completed SPIPs On Fri, Mar 10, 2017 at 3:39 AM, Sean Owen wrote: > Alrighty, if nobody is

Re: Spark Improvement Proposals

2017-03-09 Thread Cody Koeninger
code/doc > change we can just review and merge as usual. > > On Tue, Mar 7, 2017 at 3:15 PM Cody Koeninger <c...@koeninger.org> wrote: >> >> Another week, another ping. Anyone on the PMC willing to call a vote on >> this?

Re: Spark Improvement Proposals

2017-03-07 Thread Cody Koeninger
it to a vote and revisit the proposal in a few >> months. >> Joseph >> >> On Fri, Feb 24, 2017 at 5:35 AM, Cody Koeninger <c...@koeninger.org> >> wrote: >> >>> It's been a week since any further discussion. >>> >>> Do PMC members

Re: Spark Improvement Proposals

2017-02-24 Thread Cody Koeninger
> >> wrote: >>> >>> The doc looks good to me. >>> >>> Ryan, the role of the shepherd is to make sure that someone >>> knowledgeable with Spark processes is involved: this person can advise >>> on technical and procedural considerati

Re: Spark Improvement Proposals

2017-02-16 Thread Cody Koeninger
It's just a process > document. > > Still, a fine step IMHO. > > On Thu, Feb 16, 2017 at 4:22 PM Reynold Xin <r...@databricks.com> wrote: >> >> Updated. Any feedback from other community members? >> >> >> On Wed, Feb 15, 2017 at 2:53 AM, Cody Koenin

Re: Spark Improvement Proposals

2017-02-14 Thread Cody Koeninger
ustomers. The to-do feature list was always above 100. Sometimes, the >> customers are feeling frustrated when we are unable to deliver them on time >> due to the resource limits and others. Even if they paid us billions, we >> still need to do it phase by phase or somet

Re: Spark Improvement Proposals

2017-02-11 Thread Cody Koeninger
ccepted or rejected, so that we do not end >> up with a distracting long tail of half-hearted proposals. >> >> These rules are meant to be flexible, but the current document should be >> clear about who is in charge of a SPIP, and the state it is currently in. >> >> We h

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Cody Koeninger
Congrats, glad to hear it On Jan 24, 2017 12:47 PM, "Shixiong(Ryan) Zhu" wrote: > Congrats Burak & Holden! > > On Tue, Jan 24, 2017 at 10:39 AM, Joseph Bradley > wrote: > >> Congratulations Burak & Holden! >> >> On Tue, Jan 24, 2017 at 10:33 AM,

Re: Feedback on MLlib roadmap process proposal

2017-01-24 Thread Cody Koeninger
Totally agree with most of what Sean said, just wanted to give an alternate take on the "maintainers" thing On Tue, Jan 24, 2017 at 10:23 AM, Sean Owen wrote: > There is no such list because there's no formal notion of ownership or > access to subsets of the project. Tracking

Re: Spark Improvement Proposals

2017-01-03 Thread Cody Koeninger
g some items mentioned above + a new one >> w.r.t. Reynold's draft >> <https://docs.google.com/document/d/1-Zdi_W-wtuxS9hTK0P9qb2x-nRanvXmnZ7SUi4qMljg/edit#> >> : >> * Reinstate the "Where" section with links to current and past SIPs >> * Add field for statin

Re: [VOTE] Apache Spark 2.1.0 (RC2)

2016-12-09 Thread Cody Koeninger
Agree that frequent topic deletion is not a very Kafka-esque thing to do On Fri, Dec 9, 2016 at 12:09 PM, Shixiong(Ryan) Zhu wrote: > Sean, "stress test for failOnDataLoss=false" is because Kafka consumer may > be thrown NPE when a topic is deleted. I added some logic to

Re: Back-pressure to Spark Kafka Streaming?

2016-12-05 Thread Cody Koeninger
If you want finer-grained max rate setting, SPARK-17510 got merged a while ago. There's also SPARK-18580 which might help address the issue of starting backpressure rate for the first batch. On Mon, Dec 5, 2016 at 4:18 PM, Liren Ding wrote: > Hey all, > > Does
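With a per-partition maximum rate, each partition's cap for one batch is roughly its rate times the batch duration, limited by the data actually waiting. Below is a simplified sketch of that arithmetic. It assumes a rate of 0 or less means "no limit" and ignores the backpressure estimator entirely, so it is not the DStream implementation. The partition keys and the `max_rate` callback are hypothetical stand-ins for a per-partition config.

```python
from typing import Callable, Dict

def max_messages_per_partition(
    lag: Dict[str, int],               # messages currently available, per partition
    max_rate: Callable[[str], float],  # hypothetical per-partition messages/second
    batch_seconds: float,
) -> Dict[str, int]:
    """Per-partition message cap for one batch (simplified, no backpressure)."""
    limits = {}
    for tp, available in lag.items():
        rate = max_rate(tp)
        if rate <= 0:
            limits[tp] = available  # treat a non-positive rate as "no limit"
        else:
            # rate * duration, but at least 1 so a tiny rate cannot stall the partition
            limits[tp] = min(available, max(1, int(rate * batch_seconds)))
    return limits
```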

Re: using Spark Streaming with Kafka 0.9/0.10

2016-11-15 Thread Cody Koeninger
tion says : Input streams that can generate > RDDs from new data by running a service/thread only on the driver node (that > is, without running a receiver on worker nodes) > > Thanks and regards, > Aakash Pradeep > > > On Tue, Nov 15, 2016 at 2:55 PM, Cody Koeninger <

Re: using Spark Streaming with Kafka 0.9/0.10

2016-11-15 Thread Cody Koeninger
It'd probably be worth no longer marking the 0.8 interface as experimental. I don't think it's likely to be subject to active development at this point. You can use the 0.8 artifact to consume from a 0.9 broker. Where are you reading documentation indicating that the direct stream only runs on

Re: Connectors using new Kafka consumer API

2016-11-09 Thread Cody Koeninger
t;m...@apache.org> wrote: > I think they are open to others helping, in fact, more than one person has > worked on the JIRA so far. And, it's been crawling really slowly and that's > preventing adoption of Spark's new connector in secure Kafka environments. > > On Tue, Nov 8, 2016 at 7:59

Re: Connectors using new Kafka consumer API

2016-11-08 Thread Cody Koeninger
Have you asked the assignee on the Kafka jira whether they'd be willing to accept help on it? On Tue, Nov 8, 2016 at 5:26 PM, Mark Grover wrote: > Hi all, > We currently have a new direct stream connector, thanks to work by Cody and > others on SPARK-12177. > > However, that

Re: Spark Improvement Proposals

2016-11-08 Thread Cody Koeninger
> Oops. Let me try to figure that out. >> >> >> On Monday, November 7, 2016, Cody Koeninger <c...@koeninger.org> wrote: >>> >>> Thanks for picking up on this. >>> >>> Maybe I fail at google docs, but I can't see any edits on the documen

Re: Odp.: Spark Improvement Proposals

2016-11-07 Thread Cody Koeninger
nice) would also be nice, >>> but that can be done at any time. >>> >>> BTW, shameless plug: I filed SPARK-18085 which I consider a candidate >>> for a SIP, given the scope of the work. The document attached even >>> somewhat matches the proposed format. So

Anyone want to weigh in on a Kafka DStreams api change?

2016-11-04 Thread Cody Koeninger
SPARK-17510 https://github.com/apache/spark/pull/15132 It's for allowing tweaking of rate limiting on a per-partition basis.

Re: Continuous warning while consuming using new kafka-spark010 API

2016-11-04 Thread Cody Koeninger
I answered the duplicate post on the user mailing list; I'd say keep the discussion there. On Fri, Nov 4, 2016 at 12:14 PM, vonnagy wrote: > Nitin, > > I am getting similar issues using Spark 2.0.1 and Kafka 0.10. I have two > jobs, one that uses a Kafka stream and one that

Re: Handling questions in the mailing lists

2016-11-02 Thread Cody Koeninger
So concrete things people could do - users could tag subject lines appropriately to the component they're asking about - contributors could monitor user@ for tags relating to components they've worked on. I'd be surprised if my miss rate for any mailing list questions well-labeled as Kafka was

Re: JIRA Components for Streaming

2016-10-31 Thread Cody Koeninger
Makes sense to me. I do wonder if e.g. [SPARK-12345][STRUCTUREDSTREAMING][KAFKA] is going to leave any room in the Github PR form for actual title content? On Mon, Oct 31, 2016 at 1:37 PM, Michael Armbrust wrote: > I'm planning to do a little maintenance on JIRA to

Re: Odp.: Spark Improvement Proposals

2016-10-31 Thread Cody Koeninger
h them, my mail was just to show some > aspects from my side, so from theside of developer and person who is trying > to help others with Spark (via StackOverflow or other ways) > > > Pozdrawiam / Best regards, > > Tomasz > > > > Od: Cody K

Re: Straw poll: dropping support for things like Scala 2.10

2016-10-25 Thread Cody Koeninger
I think only supporting 1 version of Scala at any given time is not sufficient; 2 is probably OK, i.e. don't drop 2.10 before 2.12 is out and supported. On Tue, Oct 25, 2016 at 10:56 AM, Sean Owen wrote: > The general forces are that new versions of things to support emerge,

Re: [PSA] TaskContext.partitionId != the actual logical partition index

2016-10-20 Thread Cody Koeninger
think that makes sense, I can start a ticket. On Thu, Oct 20, 2016 at 1:16 PM, Reynold Xin <r...@databricks.com> wrote: > Seems like a good new API to add? > > > On Thu, Oct 20, 2016 at 11:14 AM, Cody Koeninger <c...@koeninger.org> wrote: >> >> Access to the par

Re: [PSA] TaskContext.partitionId != the actual logical partition index

2016-10-20 Thread Cody Koeninger
Access to the partition ID is necessary for basically every single one of my jobs, and there isn't a foreachPartitionWithIndex equivalent. You can kind of work around it with an empty foreach after the map, but it's really awkward to explain to people. On Thu, Oct 20, 2016 at 12:52 PM, Reynold Xin
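Roughly, the workaround runs the side effect inside `mapPartitionsWithIndex`, returns an empty iterator, and forces evaluation with a no-op `foreach`. In PySpark that would look like `rdd.mapPartitionsWithIndex(as_index_side_effect(f)).foreach(lambda _: None)`. So that the sketch runs without a cluster, `run_locally` stands in for Spark. `as_index_side_effect` and `run_locally` are hypothetical helper names.

```python
from typing import Callable, Iterator, List

def as_index_side_effect(
    f: Callable[[int, Iterator], None]
) -> Callable[[int, Iterator], Iterator]:
    """Adapt f(index, rows) to the mapPartitionsWithIndex shape: run f, emit nothing."""
    def run(index: int, rows: Iterator) -> Iterator:
        f(index, rows)
        return iter(())
    return run

def run_locally(partitions: List[list], fn: Callable[[int, Iterator], Iterator]) -> None:
    """Local stand-in for rdd.mapPartitionsWithIndex(fn).foreach(lambda _: None)."""
    for index, part in enumerate(partitions):
        # Calling fn runs the side effect; its empty result is drained like a no-op foreach.
        for _ in fn(index, iter(part)):
            pass
```

For example, collecting `(index, sum(rows))` from partitions `[[1, 2], [3]]` yields `[(0, 3), (1, 3)]`. The index here is the logical partition index that `mapPartitionsWithIndex` supplies.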

Re: StructuredStreaming status

2016-10-19 Thread Cody Koeninger
ncy ? >> > I think that the fact that they serve as an output trigger is a problem, > but Structured Streaming seems to resolve this now. > >> >> Thanks >> Shivaram >> >> On Wed, Oct 19, 2016 at 1:29 PM, Michael Armbrust >> <mich...@databricks

Re: StructuredStreaming status

2016-10-19 Thread Cody Koeninger
Is anyone seriously thinking about alternatives to microbatches? On Wed, Oct 19, 2016 at 2:45 PM, Michael Armbrust wrote: > Anything that is actively being designed should be in JIRA, and it seems > like you found most of it. In general, release windows can be found on

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-18 Thread Cody Koeninger
+1 to putting docs in one clear place. On Oct 18, 2016 6:40 AM, "Sean Owen" wrote: > I'm OK with that. The upside to the wiki is that it can be edited directly > outside of a release cycle. However, in practice I find that the wiki is > rarely changed. To me it also serves

Re: cutting 2.0.2?

2016-10-17 Thread Cody Koeninger
SPARK-17841: three-line bugfix that has a week-old PR. SPARK-17812: being able to specify starting offsets is a must-have for a Kafka MVP in my opinion; already has a PR. SPARK-17813: I can put in a PR for this tonight if it'll be considered. On Mon, Oct 17, 2016 at 12:28 AM, Reynold Xin

Re: Spark Improvement Proposals

2016-10-17 Thread Cody Koeninger
P. However I think that Spark should >> have real-time streaming support. Currently I see many posts/comments >> that "Spark has too big latency". Spark Streaming is doing very good >> jobs with micro-batches, however I think it is possible to add also more >

Re: StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)

2016-10-13 Thread Cody Koeninger
I've always been confused as to why it would ever be a good idea to put any streaming query system on the critical path for synchronous < 100msec requests. It seems to make a lot more sense to have a streaming system do asynchronous updates of a store that has better latency and quality of service

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
ul, because we have run into some trouble in the past > with some inside the ASF but essentially outside the Spark community who > didn't like the way we were doing things. > > On Mon, Oct 10, 2016 at 3:53 PM, Cody Koeninger <c...@koeninger.org> wrote: >> >> Apache documents s

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
ng lots of rules >> from the beginning makes it confusing and can reduce contributions. >> Although, as engineers, we believe that anything can be solved using >> mechanical rules, in practice software development is a social process that >> ultimately requires humans to tackle

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
community effort and I wouldn't want to move forward if up to half of the > community thinks it's an untenable idea. > > rb > > On Mon, Oct 10, 2016 at 12:07 PM, Cody Koeninger <c...@koeninger.org> wrote: >> >> I think this is closer to a procedural issue than a code mod

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
the value of codifying that in our process? I >> think restricting who can submit proposals would only undermine them by >> pushing contributors out. Maybe I'm missing something here? >> >> rb >> >> >> >> On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c.

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
o can submit proposals would only undermine them by > pushing contributors out. Maybe I'm missing something here? > > rb > > > > On Mon, Oct 10, 2016 at 7:41 AM, Cody Koeninger <c...@koeninger.org> wrote: >> >> Yes, users suggesting SIPs is a good thing and is expl

Re: Spark Improvement Proposals

2016-10-10 Thread Cody Koeninger
rategies have a large effect on the goal, we should > have it discussed when discussing the goals. In addition, while it is often > easy to throw out completely infeasible goals, it is often much harder to > figure out that the goals are unfeasible without fine tuning. > > > > >

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
o I'd really like a culture of having those early. > People don't argue about prettiness when they discuss APIs, they argue about > the core concepts to expose in order to meet various goals, and then they're > stuck maintaining those for a long time. > > Matei > > On Oct 9,

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
proposal is technically feasible >> right now? If it's infeasible, that will be discovered later during design >> and implementation. Same thing with rejected strategies -- listing some of >> those is definitely useful sometimes, but if you make this a *required* >> se

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
iscovered later > during design and implementation. Same thing with rejected strategies -- > listing some of those is definitely useful sometimes, but if you make this > a *required* section, people are just going to fill it in with bogus stuff > (I've seen this happen before). >

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
Regarding name, if the SIP overlap is a concern, we can pick a different name. My tongue in cheek suggestion would be Spark Lightweight Improvement process (SPARKLI) On Sun, Oct 9, 2016 at 4:14 PM, Cody Koeninger <c...@koeninger.org> wrote: > So to focus the discussion on the specific

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
ures more visible (and their approval more formal)? > > BTW note that in either case, I'd like to have a template for design docs > too, which should also include goals. I think that would've avoided some of > the issues you brought up. > > Matei > > On Oct 9, 2016, at 10

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
At a super high level, it depends on whether you want the SIPs to be >>> PRDs for getting some quick feedback on the goals of a feature before it is >>> designed, or something more like full-fledged design docs (just a more >>> visible design doc for bigger changes). I loo

Re: Spark Improvement Proposals

2016-10-09 Thread Cody Koeninger
n what this > entails, and then we can discuss this the specific proposal as well. > > > On Fri, Oct 7, 2016 at 2:29 PM, Cody Koeninger <c...@koeninger.org> wrote: > >> Yeah, in case it wasn't clear, I was talking about SIPs for major >> user-facing or cross-cutting changes,

Re: PSA: JIRA resolutions and meanings

2016-10-09 Thread Cody Koeninger
That's awesome, Sean, very clear. One minor thing: non-committers can't change the Assignee field, as far as I know. On Oct 9, 2016 3:40 AM, "Sean Owen" wrote: I added a variant on this text to https://cwiki.apache.org/

Re: Improving governance / committers (split from Spark Improvement Proposals thread)

2016-10-08 Thread Cody Koeninger
It's not about technical design disagreement as to matters of taste, it's about familiarity with the domain. To make an analogy, it's as if a committer in MLlib was firmly intent on, I dunno, treating a collection of categorical variables as if it were an ordered range of continuous variables.

Re: PSA: JIRA resolutions and meanings

2016-10-08 Thread Cody Koeninger
, Reynold Xin <r...@databricks.com> wrote: > I think so (at least I think it is socially acceptable). Of course, use good > judgement here :) > > > > On Sat, Oct 8, 2016 at 12:06 PM, Cody Koeninger <c...@koeninger.org> wrote: >> >> So to be clear, can I go c

Re: PSA: JIRA resolutions and meanings

2016-10-08 Thread Cody Koeninger
So to be clear, can I go clean up the Kafka cruft? On Sat, Oct 8, 2016 at 1:41 PM, Reynold Xin wrote: > > On Sat, Oct 8, 2016 at 2:09 AM, Sean Owen wrote: >> >> >> - Resolve as Fixed if there's a change you can point to that resolved the >> issue >> - If

Re: Improving volunteer management / JIRAs (split from Spark Improvement Proposals thread)

2016-10-08 Thread Cody Koeninger
D to 5 days and use > this as a regular way to bring contributions to the attention of committers. > > I dunno if people think this is perhaps too complex, but at our scale I > feel we need some kind of loose but automated system for funneling > contributions through some kind of lifecyc

Re: PSA: JIRA resolutions and meanings

2016-10-08 Thread Cody Koeninger
That makes sense, thanks. One thing I've never been clear on is who should be allowed to resolve Jiras. Can I go clean up the backlog of Kafka Jiras that weren't created by me? If there's an informal policy here, can we update the wiki to reflect it? Maybe it's there already, but I didn't see

Re: Improving volunteer management / JIRAs (split from Spark Improvement Proposals thread)

2016-10-07 Thread Cody Koeninger
of loose but automated system for funneling contributions > through some kind of lifecycle. The status quo is just not that good (e.g. > 474 open PRs against Spark as of this moment). > > Nick > > > On Fri, Oct 7, 2016 at 4:48 PM Cody Koeninger <c...@koeninger.org> wrote:

Re: Spark Improvement Proposals

2016-10-07 Thread Cody Koeninger
ssing features, slow reviews > which is understandable to some extent... it is not only about Spark but > things can be improved for sure for this project in particular as already > stated. > > On Fri, Oct 7, 2016 at 11:14 PM, Cody Koeninger <c...@koeninger.org> > wrote: >

Improving volunteer management / JIRAs (split from Spark Improvement Proposals thread)

2016-10-07 Thread Cody Koeninger
Matei asked: > I agree about empowering people interested here to contribute, but I'm > wondering, do you think there are technical things that people don't want to > work on, or is it a matter of what there's been time to do? It's a matter of mismanagement and miscommunication. The

Re: Kafka 0.8, 0.9 in Structured Streaming

2016-10-07 Thread Cody Koeninger
Without a hell of a lot more work, Assign would be the only strategy usable. On Fri, Oct 7, 2016 at 3:25 PM, Michael Armbrust wrote: >> The implementation is totally and completely different however, in ways >> that leak to the end user. > > > Can you elaborate?

Re: Kafka 0.8, 0.9 in Structured Streaming

2016-10-07 Thread Cody Koeninger
0.10 consumers won't work on an earlier broker. Earlier consumers will (should?) work on a 0.10 broker. The main things earlier consumers lack from a user perspective are support for SSL and pre-fetching of messages. The implementation is totally and completely different, however, in ways that leak
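The compatibility rule stated here comes down to this: a consumer works against a broker at its own version or newer. The sketch encodes only that rule for the 0.8 / 0.9 / 0.10 line under discussion. It keeps the message's own "(should?)" caveat, meaning older consumers on a 0.10 broker were expected to work but not verified. Versions are compared as (major, minor) tuples because comparing the strings "0.10" and "0.8" lexically would get the order wrong. This is an illustration, not a compatibility matrix.

```python
from typing import Tuple

Version = Tuple[int, int]  # (major, minor), e.g. (0, 10) for Kafka 0.10

def consumer_can_talk_to(consumer: Version, broker: Version) -> bool:
    """Rule from the thread: a newer consumer needs a broker at least as new;
    older consumers (should) work against newer brokers."""
    return broker >= consumer  # tuple comparison orders (0, 8) < (0, 9) < (0, 10)
```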

Re: Spark Improvement Proposals

2016-10-07 Thread Cody Koeninger
+1 to adding an SIP label and linking it from the website. I think it needs - template that focuses it towards soliciting user goals / non goals - clear resolution as to which strategy was chosen to pursue. I'd recommend a vote. Matei asked me to clarify what I meant by changing interfaces, I

Re: Spark Improvement Proposals

2016-10-07 Thread Cody Koeninger
Sean, that was very eloquently put, and I 100% agree. If I ever meet you in person, I'll buy you multiple rounds of beverages of your choice ;) This is probably reiterating some of what you said in a less clear manner, but I'll throw more of my 2 cents in. - Design. Yes, design by committee

Spark Improvement Proposals

2016-10-06 Thread Cody Koeninger
I love Spark. 3 or 4 years ago it was the first distributed computing environment that felt usable, and the community was welcoming. But I just got back from the Reactive Summit, and this is what I observed: - Industry leaders on stage making fun of Spark's streaming model - Open source project

Re: Spark SQL JSON Column Support

2016-09-29 Thread Cody Koeninger
Totally agree that specifying the schema manually should be the baseline. LGTM, thanks for working on it. Seems like it looks good to others too judging by the comment on the PR that it's getting merged to master :) On Thu, Sep 29, 2016 at 2:13 PM, Michael Armbrust

Re: Spark SQL JSON Column Support

2016-09-29 Thread Cody Koeninger
Will this be able to handle projection pushdown if a given job doesn't utilize all the columns in the schema? Or should people have a per-job schema? On Wed, Sep 28, 2016 at 2:17 PM, Michael Armbrust wrote: > Burak, you can configure what happens with corrupt records for

Re: [discuss] Spark 2.x release cadence

2016-09-29 Thread Cody Koeninger
Regarding documentation debt, is there a reason not to deploy documentation updates more frequently than releases? I recall this used to be the case. On Wed, Sep 28, 2016 at 3:35 PM, Joseph Bradley wrote: > +1 for 4 months. With QA taking about a month, that's very

Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Cody Koeninger
To be clear, "safe" has very little to do with this. It's pretty clear that there's very little risk of the spark module for kinesis being considered a derivative work, much less all of spark. The use limitation in 3.3 that caused the amazon license to be put on the apache X list also doesn't

Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Cody Koeninger
I don't see a reason to remove the non-assembly artifact, why would you? You're not distributing copies of Amazon licensed code, and the Amazon license goes out of its way not to over-reach regarding derivative works. This seems pretty clearly to fall in the spirit of

Re: Committing Kafka offsets when using DirectKafkaInputDStream

2016-09-03 Thread Cody Koeninger
The Kafka commit api isn't transactional, you aren't going to get exactly once behavior out of it even if you were committing offsets on a per-partition basis. This doesn't really have anything to do with Spark; the old code you posted was already inherently broken. Make your outputs idempotent
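Here is a minimal sketch of "make your outputs idempotent". Each write is keyed by something that stays the same across retries, in this case the Kafka (topic, partition, offset) of the source record. Replaying a batch after a failure then overwrites existing entries instead of duplicating them. The in-memory dict stands in for any store that supports upserts; it is an assumption of the sketch, not a recommendation of a particular store.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int, int]          # (topic, partition, offset)
Record = Tuple[str, int, int, str]  # (topic, partition, offset, value)

def write_batch(store: Dict[Key, str], batch: Iterable[Record]) -> None:
    """Upsert each record under its source coordinates, so replays are harmless."""
    for topic, partition, offset, value in batch:
        store[(topic, partition, offset)] = value
```

Writing the same two-record batch twice leaves exactly two entries in `store`.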

Re: Model abstract class in spark ml

2016-08-31 Thread Cody Koeninger
http://blog.originate.com/blog/2014/02/27/types-inside-types-in-scala/ On Wed, Aug 31, 2016 at 2:19 AM, Sean Owen wrote: > Weird, I recompiled Spark with a similar change to Model and it seemed > to work but maybe I missed a step in there. > > On Wed, Aug 31, 2016 at 6:33 AM,
