Re: Clojure interop with Spark
Another option is Apache Beam. We use it quite extensively. There are a few options for Clojure wrappers (we use datasplash), and Beam has libraries for a number of popular languages.

Kind Regards,
Dom Parry

On 10 Jul 2020, 08:22 +0200, Alex Ott , wrote:
> From the Spark perspective, I would really advise using the DataFrame API as much as possible, including Spark Structured Streaming instead of Spark Streaming. The main reason is more optimized execution of the code, because of all the optimizations that Catalyst is able to make. But I really don't see any libraries that wrap the DataFrame API.
>
>
> On Thu, Jul 9, 2020 at 11:36 PM Tim Clemons wrote:
> > > I'm putting together a big data system centered around using Spark Streaming for data ingest and Spark SQL for querying the stored data. I've been investigating what options there are for implementing Spark applications using Clojure. It's been close to a decade since sparkling or flambo have received any updates, and it doesn't look like either will accommodate recent distributions of Spark. I've found powderkeg an interesting option, and I like how it supports remote REPLs and the use of transducers rather than wrapped Scala fns. However, it looks like it has also gone a few years without commits, and I've heard loose talk that the developers have moved on to other pursuits.
> > >
> > > Part of the problem seems to be Spark. The project seems unapologetic about breaking interfaces and seems willing to sacrifice third-party code that tries to track Spark's development.
> > >
> > > So my options seem to be the following:
> > >
> > > 1. Deploy an older version of Spark that's compatible with one of the above-mentioned libraries. While we don't need to be bleeding edge, deploying a three-year-old version just to accommodate my preferred language is hard to justify.
> > >
> > > 2. Create a merge to update one of those libraries to more recent versions of Spark and be prepared to maintain it internally for the lifespan of this project. This may be vastly overestimating my personal heroics.
> > >
> > > 3. Code my own solution from scratch using Java/Scala interop, sketching out just enough of a Clojure wrapper to suit my ends.
> > >
> > > 4. Learn Scala.
> > >
> > > I realize that Spark isn't the only game in town (Onyx, for example). However, I'm working with a team of developers who are not familiar with Clojure (though I'm working to be an advocate). I chose Spark as an established solution that supports multiple languages and handles both streaming and batch processing.
> > >
> > > Any insights? Any solutions I'm overlooking?
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups "Clojure" group.
> > > To post to this group, send email to clojure@googlegroups.com
> > > Note that posts from new members are moderated - please be patient with your first post.
> > > To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
> > > For more options, visit this group at http://groups.google.com/group/clojure?hl=en
> > > ---
> > > You received this message because you are subscribed to the Google Groups "Clojure" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
> > > To view this discussion on the web visit https://groups.google.com/d/msgid/clojure/259f5ff6-dd66-4688-aa80-439fed88ab39o%40googlegroups.com.
>
> --
> With best wishes, Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
Re: [ANN] thurber: Clojure on Apache Beam (distributed batch/streaming)
Hi Aaron,

I'm just a minor contributor to datasplash at present, because we're using Beam as the centre of our data streaming / processing functions. I like the idea you're presenting, but we haven't really stretched our solutions to the point where AOT was a problem for us. I look forward to further developments on thurber though!

On 22 Jan 2020, 18:22 +0200, Aaron D. , wrote:
>
> Hi Dominic, thank you!
>
> Are you a maintainer of, or contributor to, datasplash? I would be happy to swap notes and synthesize ideas.
>
> My org looked at datasplash. The biggest dealbreaker for us was datasplash's AOT orientation: its AOT packaging meant we couldn't float its dependencies, and we didn't like the requirement to AOT-compile our own code.
>
> With thurber I started with the goal of avoiding AOT and being highly dynamic in the REPL, and I was also able to focus on certain performance areas from the bottom up. thurber also eschews a sugared, DSL-ish API in favor of more direct, explicit interop with the Beam SDK, leaving that concern to layers above (though I may implement one if there is interest). It's just a different 'opinionated' take.
>
> My org had been using Onyx for streaming use cases, but its original developers have moved on and we were concerned about its long-term viability. Many of thurber's ideals are consistent with Onyx's, and reaching previous Onyx users like ourselves was another line of sight for thurber.
>
> We'd also looked at clj-headlights, the other Clojure Beam library we surveyed in this space.
>
> On Wednesday, January 22, 2020 at 7:30:17 AM UTC-6, Dominic Parry wrote:
> > Hi!
> >
> > Congratulations on the library! It makes me super happy when people build Clojure libraries for the Google Cloud ecosystem. I wanted to draw your attention to datasplash (https://github.com/ngrunwald/datasplash), which has made a start on this. I thought perhaps you could leverage some of it.
> >
> > Hope you have a great day!
> > On 21 Jan 2020, 23:10 +0200, atdixon , wrote:
> > > Here is thurber (https://github.com/atdixon/thurber) (at early alpha release), which enables Clojure on Apache Beam platforms like Google Dataflow.
> > >
> > > thurber's goals include:
> > >
> > > - Full support for Beam capabilities
> > > - AOT-less (AOT not required; full dynamic support for serializing functions, including inlined functions, and proxies)
> > > - Macro-less (very few, always optional, macros)
> > > - Performance focus (core optimized for large-volume data streaming)
> > > - Idiomatic Clojure focus (Clojure functions are automatically distributable functional transforms, lazy sequences over iterative output, ...)
> > >
> > > When coming to Apache Beam and wanting to use Clojure, there are a few hurdles to overcome, some discussed here in the past. Clojure's Java interop commonly falls short in the domain of distributed big data Java platforms (proxies and functions not serializable, no support for generating generic type signatures, minimal/insufficient support for method annotations, suboptimal dynamic binding performance, etc.).
> > >
> > > thurber bridges these issues internally, giving a full dynamic/Clojure experience on top of Apache Beam.
> > >
> > > (For Onyx users, thurber + Beam meet the same ideals as Onyx on a well-backed platform.)
> > >
> > > This is an early alpha release, and feedback on the API & facilities is welcome.
> > >
> > > For the curious, the walkthrough covers most of thurber's capabilities:
> > > https://github.com/atdixon/thurber/blob/master/demo/walkthrough.clj
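The first interop hurdle listed in the announcement, functions and proxies that are not serializable, comes from how JVM data platforms ship user code: the function object is typically written out with Java serialization and rebuilt on each worker. The plain-Java sketch below illustrates only that underlying constraint, assuming standard Java serialization; `SerializableFnDemo` and `SerFn` are hypothetical names, and nothing here reflects how thurber, datasplash, Spark or Beam actually solve it. A lambda can be serialized only when its target interface extends `java.io.Serializable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class SerializableFnDemo {

    // Hypothetical interface: a Function that is also Serializable.
    // This sketch depends on no Spark or Beam classes.
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    // Writes an object with plain Java serialization, roughly what a
    // framework does before shipping user code to remote workers.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Reads the object back, standing in for the worker side.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // True if the object can be serialized; false if something in its
    // object graph does not implement Serializable.
    static boolean canSerialize(Object o) {
        try {
            serialize(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Function<Integer, Integer> plain = x -> x + 1;   // target type is not Serializable
        SerFn<Integer, Integer> shippable = x -> x + 1;  // target type is Serializable

        System.out.println("plain lambda serializable? " + canSerialize(plain));
        System.out.println("SerFn lambda serializable? " + canSerialize(shippable));

        // Round-trip the serializable function and apply the rebuilt copy.
        @SuppressWarnings("unchecked")
        Function<Integer, Integer> copy =
                (Function<Integer, Integer>) deserialize(serialize(shippable));
        System.out.println("copy.apply(41) = " + copy.apply(41));
    }
}
```

Running `main` should report `false` for the plain lambda, `true` for the `SerFn` lambda, and `copy.apply(41) = 42` for the deserialized copy. Beam's `SerializableFunction` and Spark's Java function interfaces take the same approach of extending `Serializable`, so any Clojure wrapper has to hand those APIs function objects that meet an equivalent requirement.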
Re: [ANN] thurber: Clojure on Apache Beam (distributed batch/streaming)
Hi!

Congratulations on the library! It makes me super happy when people build Clojure libraries for the Google Cloud ecosystem. I wanted to draw your attention to datasplash (https://github.com/ngrunwald/datasplash), which has made a start on this. I thought perhaps you could leverage some of it.

Hope you have a great day!

On 21 Jan 2020, 23:10 +0200, atdixon , wrote:
> Here is thurber (https://github.com/atdixon/thurber) (at early alpha release), which enables Clojure on Apache Beam platforms like Google Dataflow.
>
> thurber's goals include:
>
> - Full support for Beam capabilities
> - AOT-less (AOT not required; full dynamic support for serializing functions, including inlined functions, and proxies)
> - Macro-less (very few, always optional, macros)
> - Performance focus (core optimized for large-volume data streaming)
> - Idiomatic Clojure focus (Clojure functions are automatically distributable functional transforms, lazy sequences over iterative output, ...)
>
> When coming to Apache Beam and wanting to use Clojure, there are a few hurdles to overcome, some discussed here in the past. Clojure's Java interop commonly falls short in the domain of distributed big data Java platforms (proxies and functions not serializable, no support for generating generic type signatures, minimal/insufficient support for method annotations, suboptimal dynamic binding performance, etc.).
>
> thurber bridges these issues internally, giving a full dynamic/Clojure experience on top of Apache Beam.
>
> (For Onyx users, thurber + Beam meet the same ideals as Onyx on a well-backed platform.)
>
> This is an early alpha release, and feedback on the API & facilities is welcome.
>
> For the curious, the walkthrough covers most of thurber's capabilities:
> https://github.com/atdixon/thurber/blob/master/demo/walkthrough.clj