Hi, thanks for the feedback; I'll try to explain better what I meant. First we had RDDs, then we had DataFrames, so could the next step be something like stored procedures over DataFrames? I define the whole calculation flow, even if it includes "actions" in between, and the whole thing is planned and executed in a super optimized way once I tell it "go!"
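To make it concrete, here's a toy sketch in plain Python of the kind of deferred "workplan" I have in mind. The names (Plan, step, go) are invented for illustration and have nothing to do with Spark's actual API:

```python
# Toy illustration only: "Plan", "step", and "go" are invented names,
# not Spark APIs. The idea: record every step (including what would be
# "actions" today) into one workplan, and execute it all on go().

class Plan:
    def __init__(self):
        self._steps = []  # the recorded workplan; nothing runs yet

    def step(self, fn):
        """Record a step lazily, like a transformation; return self to chain."""
        self._steps.append(fn)
        return self

    def go(self, data):
        """Run the entire recorded workplan in one shot.

        A real engine could look at all the steps here and optimize
        across today's action boundaries before executing anything.
        """
        for fn in self._steps:
            data = fn(data)
        return data

# Several levels of "aggregation" chained, sent to the engine once:
result = (Plan()
          .step(lambda xs: [x * 2 for x in xs])       # transform
          .step(lambda xs: [x for x in xs if x > 2])  # filter
          .step(sum)                                  # final "action"
          .go([1, 2, 3]))
# result == 10  (doubled to [2, 4, 6], filtered to [4, 6], summed)
```

The point is that go() sees all the steps at once, so the engine could plan across what are separate action boundaries today.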
What I mean by "feels like scripted" is that actions come back to the driver, like they would if you were in front of a command prompt. But often the flow contains many steps with actions in between: multiple levels of aggregation, iterative machine learning algorithms, etc. Sending the whole "workplan" to the Spark framework would be, as I see it, the next step of its evolution, just as stored procedures send logic composed of many SQL queries to the database. Is it clearer this time? :)

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Sun, Nov 8, 2015 at 5:59 PM, Koert Kuipers <ko...@tresata.com> wrote:

> romi,
> unless i am misunderstanding your suggestion, you might be interested in
> projects like the new mahout, where they try to abstract out the engine with
> bindings so that they can support multiple engines within a single
> platform. I guess cascading is heading in a similar direction (although no
> spark or flink yet there, just mr1 and tez).
>
> On Sun, Nov 8, 2015 at 6:33 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> Major releases can change APIs, yes. Although Flink is pretty similar
>> in broad design and goals, the APIs are quite different in
>> particulars. Speaking for myself, I can't imagine merging them, as it
>> would either mean significantly changing Spark APIs, or making Flink
>> use Spark APIs. It would mean effectively removing one project, which
>> seems infeasible.
>>
>> I am not sure what you're saying the difference is, but I would not
>> describe Spark as primarily for interactive use.
>>
>> Philosophically, I don't think One Big System to Rule Them All is a
>> good goal. One project will never get it all right, even within one
>> niche. It's actually valuable to have many takes on important
>> problems. Hence any problem worth solving gets solved 10 times. Just
>> look at all those SQL engines and logging frameworks...
>>
>> On Sun, Nov 8, 2015 at 10:53 AM, Romi Kuntsman <r...@totango.com> wrote:
>> > A major release usually means giving up on some API backward
>> > compatibility?
>> > Can this be used as a chance to merge efforts with Apache Flink
>> > (https://flink.apache.org/) and create the one ultimate open source
>> > big data processing system?
>> > Spark currently feels like it was made for interactive use (like Python
>> > and R), and when used for other purposes (batch/streaming), it feels
>> > like scripted interactive use instead of a really standalone complete
>> > app. Maybe some base concepts may be adapted?
>> >
>> > (I'm not currently a committer, but as a heavy Spark user I'd love to
>> > participate in the discussion of what can/should be in Spark 2.0)
>> >
>> > Romi Kuntsman, Big Data Engineer
>> > http://www.totango.com
>> >
>> > On Fri, Nov 6, 2015 at 2:53 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> > wrote:
>> >>
>> >> Hi Sean,
>> >>
>> >> Happy to see this discussion.
>> >>
>> >> I'm working on a PoC to run Camel on Spark Streaming. The purpose is
>> >> to have an ingestion and integration platform directly running on
>> >> Spark Streaming.
>> >>
>> >> Basically, we would be able to use a Camel Spark DSL like:
>> >>
>> >> from("jms:queue:foo").choice().when(predicate).to("job:bar").when(predicate).to("hdfs:path").otherwise("file:path")....
>> >>
>> >> Before a formal proposal (I have to do more work there), I'm just
>> >> wondering if such a framework could be a new Spark module (Spark
>> >> Integration, for instance, like Spark ML, Spark Streaming, etc.).
>> >>
>> >> Maybe it could be a good candidate for addition in a "major" release
>> >> like Spark 2.0.
>> >>
>> >> Just my $0.01 ;)
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On 11/06/2015 01:44 PM, Sean Owen wrote:
>> >>>
>> >>> Since branch-1.6 is cut, I was going to make version 1.7.0 in JIRA.
>> >>> However, I've had a few side conversations recently about Spark 2.0,
>> >>> and I know I and others have a number of ideas about it already.
>> >>>
>> >>> I'll go ahead and make 1.7.0, but thought I'd ask: how much other
>> >>> interest is there in starting to plan Spark 2.0? Is that even on the
>> >>> table as the next release after 1.6?
>> >>>
>> >>> Sean
>> >>
>> >> --
>> >> Jean-Baptiste Onofré
>> >> jbono...@apache.org
>> >> http://blog.nanthrax.net
>> >> Talend - http://www.talend.com
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> >> For additional commands, e-mail: dev-h...@spark.apache.org