Re: Structured Streaming with Kafka sources/sinks

Reynold Xin Tue, 30 Aug 2016 09:10:29 -0700

In this case simply not much progress has been made, because people might
be busy with other stuff.


Ofir, it looks like you have spent a non-trivial amount of time thinking about
this topic and have even designed something to work -- can you chime in on
the JIRA ticket with your thoughts and your prototype? That would be
tremendously useful to the project.



On Tue, Aug 30, 2016 at 11:44 PM, Nicholas Chammas <
[email protected]> wrote:

> > I personally find it disappointing that a big chunk of Spark's design
> and development is happening behind closed curtains.
>
> I'm not too familiar with Streaming, but I see design docs and proposals
> for ML and SQL published here and on JIRA all the time, and they are
> discussed extensively.
>
> For example, here are some ML JIRAs with extensive design discussions:
> SPARK-6725 <https://issues.apache.org/jira/browse/SPARK-6725>, SPARK-13944
> <https://issues.apache.org/jira/browse/SPARK-13944>, SPARK-16365
> <https://issues.apache.org/jira/browse/SPARK-16365>
>
> Nick
>
> On Tue, Aug 30, 2016 at 11:10 AM Cody Koeninger <[email protected]>
> wrote:
>
>> Not that I wouldn't rather have more open communication around this
>> issue...but what are people actually expecting to get out of
>> structured streaming with regard to Kafka?
>>
>> There aren't any realistic pushdown-type optimizations available, and
>> from what I could tell the last time I looked at structured streaming,
>> resolving the event time vs processing time issue was still a ways
>> off.
>>
>> On Tue, Aug 30, 2016 at 1:56 AM, Ofir Manor <[email protected]>
>> wrote:
>> > I personally find it disappointing that a big chunk of Spark's design and
>> > development is happening behind closed curtains. It makes it harder than
>> > necessary for me to work with Spark. We had to improvise in recent weeks
>> > a temporary solution for reading from Kafka (from Structured Streaming) to
>> > unblock our development, and I feel that if the design and development of
>> > that feature was done in the open, it would have saved us a lot of hassle
>> > (and would reduce the refactoring of our code base).
>> >
>> > It's hard not to compare it to other Apache projects - for example, I believe
>> > most of the Apache Kafka full-time contributors work at a single company,
>> > but they manage as a community to have a very transparent design and
>> > development process, which seems to work great.
>> >
>> > Ofir Manor
>> >
>> > Co-Founder & CTO | Equalum
>> >
>> > Mobile: +972-54-7801286 | Email: [email protected]
>> >
>> >
>> > On Mon, Aug 29, 2016 at 10:39 PM, Fred Reiss <[email protected]>
>> wrote:
>> >>
>> >> I think that the community really needs some feedback on the progress of
>> >> this very important task. Many existing Spark Streaming applications can't
>> >> be ported to Structured Streaming without Kafka support.
>> >>
>> >> Is there a design document somewhere? Or can someone from the Databricks
>> >> team break down the existing monolithic JIRA issue into smaller steps that
>> >> reflect the current development plan?
>> >>
>> >> Fred
>> >>
>> >>
>> >> On Sat, Aug 27, 2016 at 2:32 PM, Koert Kuipers <[email protected]>
>> wrote:
>> >>>
>> >>> that's great
>> >>>
>> >>> is this effort happening anywhere that is publicly visible? github?
>> >>>
>> >>> On Tue, Aug 16, 2016 at 2:04 AM, Reynold Xin <[email protected]>
>> wrote:
>> >>>>
>> >>>> We (the team at Databricks) are working on one currently.
>> >>>>
>> >>>>
>> >>>> On Mon, Aug 15, 2016 at 7:26 PM, Cody Koeninger <[email protected]>
>> >>>> wrote:
>> >>>>>
>> >>>>> https://issues.apache.org/jira/browse/SPARK-15406
>> >>>>>
>> >>>>> I'm not working on it (yet?), never got an answer to the question of
>> >>>>> who was planning to work on it.
>> >>>>>
>> >>>>> On Mon, Aug 15, 2016 at 9:12 PM, Guo, Chenzhao <
>> [email protected]>
>> >>>>> wrote:
>> >>>>> > Hi all,
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > I’m trying to write Structured Streaming test code and will deal with
>> >>>>> > a Kafka source. Currently Spark 2.0 doesn’t support Kafka sources/sinks.
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > I found some Databricks slides saying that Kafka sources/sinks will be
>> >>>>> > implemented in Spark 2.0, so is there anybody working on this? And when
>> >>>>> > will it be released?
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > Thanks,
>> >>>>> >
>> >>>>> > Chenzhao Guo
>> >>>>>
>> >>>>> ---------------------------------------------------------------------
>> >>>>> To unsubscribe e-mail: [email protected]
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
