Hi, thanks for the feedback; I'll try to explain better what I meant. First we had RDDs, then we had DataFrames, so could the next step be something like stored procedures over DataFrames? I define the whole calculation flow, even if it includes "actions" in between, and the whole thing is planned and executed in a super optimized way once I tell it "go!"
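To make it concrete, here's a toy sketch in plain Python of the kind of deferred "workplan" I have in mind. The names (Plan, step, go) are invented for illustration and have nothing to do with Spark's actual API:

```python
# Toy illustration only: "Plan", "step", and "go" are invented names,
# not Spark APIs. The idea: record every step (including what would be
# "actions" today) into one workplan, and execute it all on go().

class Plan:
    def __init__(self):
        self._steps = []  # the recorded workplan; nothing runs yet

    def step(self, fn):
        """Record a step lazily, like a transformation; return self to chain."""
        self._steps.append(fn)
        return self

    def go(self, data):
        """Run the entire recorded workplan in one shot.

        A real engine could look at all the steps here and optimize
        across today's action boundaries before executing anything.
        """
        for fn in self._steps:
            data = fn(data)
        return data

# Several levels of "aggregation" chained, sent to the engine once:
result = (Plan()
          .step(lambda xs: [x * 2 for x in xs])       # transform
          .step(lambda xs: [x for x in xs if x > 2])  # filter
          .step(sum)                                  # final "action"
          .go([1, 2, 3]))
# result == 10  (doubled to [2, 4, 6], filtered to [4, 6], summed)
```

The point is that go() sees all the steps at once, so the engine could plan across what are separate action boundaries today.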
What I mean by "feels like scripted" is that actions come back to the driver, like they would if you were in front of a command prompt. But often the flow contains many steps with actions in between: multiple levels of aggregation, iterative machine learning algorithms, etc. Sending the whole "workplan" to the Spark framework would be, as I see it, the next step of its evolution, just as stored procedures send logic composed of many SQL queries to the database. Is it clearer this time? :)

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Sun, Nov 8, 2015 at 5:59 PM, Koert Kuipers <ko...@tresata.com> wrote:

> romi,
> unless i am misunderstanding your suggestion, you might be interested in
> projects like the new mahout, where they try to abstract out the engine with
> bindings so that they can support multiple engines within a single
> platform. I guess cascading is heading in a similar direction (although no
> spark or flink yet there, just mr1 and tez).
>
> On Sun, Nov 8, 2015 at 6:33 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> Major releases can change APIs, yes. Although Flink is pretty similar
>> in broad design and goals, the APIs are quite different in
>> particulars. Speaking for myself, I can't imagine merging them, as it
>> would either mean significantly changing Spark APIs, or making Flink
>> use Spark APIs. It would mean effectively removing one project, which
>> seems infeasible.
>>
>> I am not sure what you're saying the difference is, but I would not
>> describe Spark as primarily for interactive use.
>>
>> Philosophically, I don't think One Big System to Rule Them All is a
>> good goal. One project will never get it all right, even within one
>> niche. It's actually valuable to have many takes on important
>> problems. Hence any problem worth solving gets solved 10 times. Just
>> look at all those SQL engines and logging frameworks...
>>
>> On Sun, Nov 8, 2015 at 10:53 AM, Romi Kuntsman <r...@totango.com> wrote:
>> > A major release usually means giving up on some API backward
>> > compatibility?
>> > Can this be used as a chance to merge efforts with Apache Flink
>> > (https://flink.apache.org/) and create the one ultimate open source
>> > big data processing system?
>> > Spark currently feels like it was made for interactive use (like Python
>> > and R), and when used for other purposes (batch/streaming), it feels
>> > like scripted interactive use instead of a really standalone complete
>> > app. Maybe some base concepts may be adapted?
>> >
>> > (I'm not currently a committer, but as a heavy Spark user I'd love to
>> > participate in the discussion of what can/should be in Spark 2.0)
>> >
>> > Romi Kuntsman, Big Data Engineer
>> > http://www.totango.com
>> >
>> > On Fri, Nov 6, 2015 at 2:53 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> > wrote:
>> >>
>> >> Hi Sean,
>> >>
>> >> Happy to see this discussion.
>> >>
>> >> I'm working on a PoC to run Camel on Spark Streaming. The purpose is
>> >> to have an ingestion and integration platform directly running on
>> >> Spark Streaming.
>> >>
>> >> Basically, we would be able to use a Camel Spark DSL like:
>> >>
>> >> from("jms:queue:foo").choice().when(predicate).to("job:bar").when(predicate).to("hdfs:path").otherwise("file:path")....
>> >>
>> >> Before a formal proposal (I have to do more work there), I'm just
>> >> wondering if such a framework could be a new Spark module (Spark
>> >> Integration, for instance, like Spark ML, Spark Streaming, etc.).
>> >>
>> >> Maybe it could be a good candidate for addition in a "major" release
>> >> like Spark 2.0.
>> >>
>> >> Just my $0.01 ;)
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On 11/06/2015 01:44 PM, Sean Owen wrote:
>> >>>
>> >>> Since branch-1.6 is cut, I was going to make version 1.7.0 in JIRA.
>> >>> However, I've had a few side conversations recently about Spark 2.0,
>> >>> and I know I and others have a number of ideas about it already.
>> >>>
>> >>> I'll go ahead and make 1.7.0, but thought I'd ask: how much other
>> >>> interest is there in starting to plan Spark 2.0? Is that even on the
>> >>> table as the next release after 1.6?
>> >>>
>> >>> Sean
>> >>
>> >> --
>> >> Jean-Baptiste Onofré
>> >> jbono...@apache.org
>> >> http://blog.nanthrax.net
>> >> Talend - http://www.talend.com
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> >> For additional commands, e-mail: dev-h...@spark.apache.org