Re: Clojure interop with Spark

2020-07-09 Thread Dominic Parry
Another option is Apache Beam. We use it quite extensively. There are a few 
options for Clojure wrappers (we use datasplash), and Beam has libraries for a 
number of popular languages.


Kind Regards,
Dom Parry
On 10 Jul 2020, 08:22 +0200, Alex Ott wrote:
> From the Spark perspective, I would really advise using the DataFrame API as 
> much as possible, including Spark Structured Streaming instead of Spark 
> Streaming. The main reason is more optimized execution of the code, thanks 
> to all the optimizations that Catalyst is able to make. But I really don't 
> see any libraries that wrap the DataFrame API.
>
> > On Thu, Jul 9, 2020 at 11:36 PM Tim Clemons wrote:
> > > I'm putting together a big data system centered around using Spark 
> > > Streaming for data ingest and Spark SQL for querying the stored data.  
> > > I've been investigating what options there are for implementing Spark 
> > > applications using Clojure.  It's been close to a decade since sparkling 
> > > or flambo have received any updates and it doesn't look like either will 
> > > accommodate recent distributions of Spark.  I've found powderkeg an 
> > > interesting option, and I like how it supports remote REPLs and the use 
> > > of transducers rather than wrapped Scala fns.  However, it looks like it's 
> > > also seen a few years without commits and I've heard loose talk that the 
> > > developers have moved on to other pursuits.
> > >
> > > Part of the problem seems to be Spark.  The project seems unapologetic 
> > > about breaking interfaces and seems willing to sacrifice third-party code 
> > > that tries to track Spark's development.
> > >
> > > So my options seem to be the following:
> > >
> > > 1. Deploy an older version of Spark that's compatible with one of the 
> > > above-mentioned libraries.  While we don't need to be bleeding edge, 
> > > deploying a three-year-old version just to accommodate my preferred 
> > > language is hard to justify.
> > >
> > > 2. Create a merge to update one of those libraries to more recent 
> > > versions of Spark and be prepared to maintain it internally for the 
> > > lifespan of this project.  This may be vastly overestimating my personal 
> > > heroics.
> > >
> > > 3. Code my own solution from scratch using Java/Scala interop, sketching 
> > > out just enough of a Clojure wrapper to suit my ends.
> > >
> > > 4. Learn Scala.
> > >
> > > I realize that Spark isn't the only game in town (Onyx, for example).  
> > > However, I'm working with a team of developers who are not familiar with 
> > > Clojure (though I'm working to be an advocate). I chose Spark as an 
> > > established solution that supports multiple languages and handles both 
> > > streaming and batch processing.
> > >
> > > Any insights?  Any solutions I'm overlooking?
> > >
> > >
> > >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Clojure" group.
> > > To post to this group, send email to clojure@googlegroups.com
> > > Note that posts from new members are moderated - please be patient with 
> > > your first post.
> > > To unsubscribe from this group, send email to
> > > clojure+unsubscr...@googlegroups.com
> > > For more options, visit this group at
> > > http://groups.google.com/group/clojure?hl=en
> > > ---
> > > You received this message because you are subscribed to the Google Groups 
> > > "Clojure" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an 
> > > email to clojure+unsubscr...@googlegroups.com.
> > > To view this discussion on the web visit 
> > > https://groups.google.com/d/msgid/clojure/259f5ff6-dd66-4688-aa80-439fed88ab39o%40googlegroups.com.
>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)


Re: [ANN] thurber: Clojure on Apache Beam (distributed batch/streaming)

2020-01-22 Thread Dominic Parry
Hi Aaron,

I’m just a minor contributor to datasplash at present, because we’re using Beam 
as the centre of our data streaming / processing functions. I like the idea 
you’re presenting, but haven’t really stretched our solutions to the point 
where AOT was a problem for us.

I look forward to further developments on Thurber though!


On 22 Jan 2020, 18:22 +0200, Aaron D. wrote:
>
> Hi Dominic, thank you!
>
> Are you a maintainer of or contributor to datasplash? I would be happy to 
> swap notes and synthesize ideas.
>
> My org looked at datasplash. The biggest dealbreaker for us was datasplash's 
> AOT orientation: its AOT packaging meant we couldn't float its dependencies, 
> and we didn't like the requirement to AOT-compile our own code.
>
> With thurber I started with the goal of avoiding AOT and being highly dynamic 
> in the REPL, but was also able to focus on certain performance areas from the 
> bottom up. thurber also eschews a sugared, DSL-ish API in favor of more 
> direct, explicit interop with the Beam SDK, leaving that concern to layers 
> above (though I may implement one if there is interest). It's just a 
> different 'opinionated' take.
>
> My org had been using Onyx for streaming use cases but its original 
> developers have moved on and we were concerned with its long-term viability. 
> Many of the ideals of thurber are consistent with Onyx's and reaching 
> previous Onyx users like ourselves was another line of sight for thurber.
>
> We'd also looked at clj-headlights - this was the other clojure Beam lib we'd 
> surveyed in this space.
>
> On Wednesday, January 22, 2020 at 7:30:17 AM UTC-6, Dominic Parry wrote:
> > Hi!
> >
> > Congratulations on the library! It makes me super happy when people build 
> > clojure libraries for the Google cloud ecosystem. I wanted to draw your 
> > attention to datasplash (https://github.com/ngrunwald/datasplash) which has 
> > made a start on this. I thought perhaps you could leverage some of it.
> >
> > Hope you have a great day!
> > On 21 Jan 2020, 23:10 +0200, atdixon wrote:
> > > Here is thurber (https://github.com/atdixon/thurber) (at early alpha 
> > > release) that enables Clojure on Apache Beam platforms like Google 
> > > Dataflow.
> > >
> > > thurber's goals include:
> > >
> > > - Full support for Beam capabilities
> > > - AOT-less (AOT not required; full dynamic support for serializing 
> > > functions, including inlined functions, and proxies)
> > > - Macro-less (very few, always optional, macros)
> > > - Performance focus (core optimized for large volume data streaming)
> > > - Idiomatic Clojure focus (Clojure functions are automatically 
> > > distributable functional transforms, lazy sequences over iterative 
> > > output, ..)
> > >
> > > When coming to Apache Beam and wanting to use Clojure there are a few 
> > > hurdles to overcome, some discussed here in the past.  Clojure's Java 
> > > interop commonly falls short in the domain of distributed big data Java 
> > > platforms (proxies and functions not serializable, no support for 
> > > generation of generic type signatures, minimal/insufficient support for 
> > > method annotations, suboptimal dynamic binding performance, etc.).
> > >
> > > thurber bridges these issues internally, giving a full dynamic/Clojure 
> > > experience on top of Apache Beam.
> > >
> > > (For Onyx users, thurber + Beam meet the same ideals as Onyx on a 
> > > well-backed platform.)
> > >
> > > This is an early alpha release, and feedback on the API and facilities 
> > > is welcome.
> > >
> > > For the curious, the walkthrough covers most of thurber's capabilities: 
> > > https://github.com/atdixon/thurber/blob/master/demo/walkthrough.clj

Re: [ANN] thurber: Clojure on Apache Beam (distributed batch/streaming)

2020-01-22 Thread Dominic Parry
Hi!

Congratulations on the library! It makes me super happy when people build 
clojure libraries for the Google cloud ecosystem. I wanted to draw your 
attention to datasplash (https://github.com/ngrunwald/datasplash) which has 
made a start on this. I thought perhaps you could leverage some of it.

Hope you have a great day!
On 21 Jan 2020, 23:10 +0200, atdixon wrote:
> Here is thurber (https://github.com/atdixon/thurber) (at early alpha release) 
> that enables Clojure on Apache Beam platforms like Google Dataflow.
>
> thurber's goals include:
>
> - Full support for Beam capabilities
> - AOT-less (AOT not required; full dynamic support for serializing functions, 
> including inlined functions, and proxies)
> - Macro-less (very few, always optional, macros)
> - Performance focus (core optimized for large volume data streaming)
> - Idiomatic Clojure focus (Clojure functions are automatically distributable 
> functional transforms, lazy sequences over iterative output, ..)
>
> When coming to Apache Beam and wanting to use Clojure there are a few hurdles 
> to overcome, some discussed here in the past.  Clojure's Java interop 
> commonly falls short in the domain of distributed big data Java platforms 
> (proxies and functions not serializable, no support for generation of generic 
> type signatures, minimal/insufficient support for method annotations, 
> suboptimal dynamic binding performance, etc.).
>
> thurber bridges these issues internally, giving a full dynamic/Clojure 
> experience on top of Apache Beam.
>
> (For Onyx users, thurber + Beam meet the same ideals as Onyx on a well-backed 
> platform.)
>
> This is an early alpha release, and feedback on the API and facilities is 
> welcome.
>
> For the curious, the walkthrough covers most of thurber's capabilities: 
> https://github.com/atdixon/thurber/blob/master/demo/walkthrough.clj
