Re: Commons sub project for parallel method execution

Bruno P. Kinoshita Mon, 12 Jun 2017 19:27:12 -0700

Interesting idea. And great discussion. Can't really say I'd have a use case 
for that right now, so abstaining from the discussion around the implementation.


I believe if we decide to explore this idea in Commons, we will probably move 
it to the sandbox? Even if we do not move it to Commons or to the sandbox, I 
intend to find some time in the next few days to try Apache Commons Javaflow 
with this library.

Jenkins implemented pipelines + continuations with code that, when it started, 
looked a lot like Javaflow. Parallel execution is taken care of by some 
internal modules in Jenkins, but I would like to see how a simpler 
implementation like this one would work.

Ideally, this utility would execute in parallel, say, 20 tasks each taking 5 
minutes (haven't looked if it supports fork/join). Then I would be able to have 
checkpoints during the execution and if the whole workflow fails, I would be 
able to restart it from the last checkpoint.
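
To make the checkpoint idea concrete, here is a minimal sketch, assuming 
step-level checkpoints kept in a plain text file. It is not Javaflow-style 
continuation capture and not something the library under discussion provides; 
CheckpointedWorkflow, step and run are hypothetical names used only for 
illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: coarse, step-level checkpoints stored in a text file.
final class CheckpointedWorkflow {
    private final Path checkpointFile; // one completed step name per line
    private final Map<String, Runnable> steps = new LinkedHashMap<>();

    CheckpointedWorkflow(Path checkpointFile) {
        this.checkpointFile = checkpointFile;
    }

    // Registers a named step; steps run in registration order.
    CheckpointedWorkflow step(String name, Runnable body) {
        steps.put(name, body);
        return this;
    }

    // Runs every step not recorded by an earlier run and returns how many
    // steps were executed this time.
    int run() {
        try {
            Set<String> done = new HashSet<>();
            if (Files.exists(checkpointFile)) {
                done.addAll(Files.readAllLines(checkpointFile, StandardCharsets.UTF_8));
            }
            int executed = 0;
            for (Map.Entry<String, Runnable> entry : steps.entrySet()) {
                if (done.contains(entry.getKey())) {
                    continue; // completed before the previous failure
                }
                // If the step throws, nothing is recorded for it, so the
                // next run starts again from this step.
                entry.getValue().run();
                Files.write(checkpointFile,
                        (entry.getKey() + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                executed++;
            }
            return executed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After a failed run, building the workflow again with the same checkpoint file 
and calling run() skips the steps that already completed. Each step could 
itself fan out to parallel tasks; that part is left out here.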


I use the Java 7+ concurrent classes when I need to execute tasks in parallel 
(though I'm adding a flag to Paul King's message in this thread to give GPars a 
try too!). However, I am unaware of any way to have persistable continuation 
workflows as in Jenkins, but with simple Java code.
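
For reference, a minimal sketch of the kind of plain java.util.concurrent 
code meant here, written with Java 8 lambdas for brevity. ParallelTasks and 
runAll are hypothetical names, not part of the proposed library or of any 
Commons component.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper built only on the standard concurrent classes.
final class ParallelTasks {

    // Runs all tasks on a fixed-size pool and returns their results in
    // submission order.
    static <T> List<T> runAll(Collection<? extends Callable<T>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has completed or failed.
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            final int n = i;
            tasks.add(() -> n * n);
        }
        System.out.println(runAll(tasks, 3)); // prints [1, 4, 9, 16, 25]
    }
}
```

This covers the parallel part only; nothing in it persists state between 
runs, which is the gap described above.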

Cheers
Bruno

________________________________
From: Gary Gregory <garydgreg...@gmail.com>
To: Commons Developers List <dev@commons.apache.org> 
Sent: Tuesday, 13 June 2017 2:08 PM
Subject: Re: Commons sub project for parallel method execution



On Mon, Jun 12, 2017 at 6:56 PM, Matt Sicker <boa...@gmail.com> wrote:

> So wouldn't something like ASM or Javassist or one of the zillion other
> bytecode libraries be a better alternative to using reflection for
> performance? Also, using the Java 7 reflections API improvements helps
> speed things up quite a bit.
>

IMO, unless you are doing scripting, reflection should be used as a
workaround, but that's just me. For example, like we do in Commons IO's
Java7Support class.

But I digress ;-)

This is clearly an interesting topic. My concern is that there is a LOT of
code out there that does stuff like this at the low and high level from the
JRE's fork/join to Apache Spark and so on as I've stated.

IMO something new would have to be both unique and, since this is Commons,
potentially pluggable into other frameworks.

Gary



> On 12 June 2017 at 20:37, Paul King <paul.king.as...@gmail.com> wrote:
>
> > My go-to library for such tasks would be GPars. It has both Java and
> > Groovy support for most things (actors/dataflow) but less so for
> > asynchronous task execution. It's one of the things that would be good
> > to explore in light of Java 8. Groovy is now Apache, GPars not at this
> > stage.
> >
> > So with adding two jars (GPars + Groovy), you can use Groovy like this:
> >
> > @Grab('org.codehaus.gpars:gpars:1.2.1')
> > import com.arun.student.StudentService
> > import groovyx.gpars.GParsExecutorsPool
> >
> > long startTime = System.nanoTime()
> > def service = new StudentService()
> > def bookSeries = ["A Song of Ice and Fire": 7, "Wheel of Time": 14,
> > "Harry Potter": 7]
> >
> > def tasks = [
> >         { println service.findStudent("j...@gmail.com", 11, false) },
> >         { println service.getStudentMarks(1L) },
> >         { println service.getStudentsByFirstNames(["John","Alice"]) },
> >         { println service.getRandomLastName() },
> >         { println service.findStudentIdByName("Kate", "Williams") },
> >         { service.printMapValues(bookSeries) }
> > ]
> >
> > GParsExecutorsPool.withPool {
> >     tasks.collect{ it.callAsync() }.collect{ it.get() }
> > //    tasks.eachParallel{ it() } // one of numerous alternatives
> > }
> >
> > long executionTime = (System.nanoTime() - startTime) / 1000000
> > println "\nTotal elapsed time is $executionTime\n\n"
> >
> >
> > Cheers, Paul.
> >
> >
> > On Tue, Jun 13, 2017 at 9:29 AM, Matt Sicker <boa...@gmail.com> wrote:
> > > I'd be interested to see where this leads to. It could end up as a sort
> > of
> > > Commons Parallel library. Besides providing an execution API, there
> could
> > > be plenty of support utilities that tend to be found in all the
> > > *Util(s)/*Helper classes in projects like all the ones I mentioned
> > earlier
> > > (basically all sorts of Hadoop-related projects and other distributed
> > > systems here).
> > >
> > > Really, there's so many ways that such a project could head, I'd like
> to
> > > hear more ideas on what to focus on.
> > >
> > > On 12 June 2017 at 18:19, Gary Gregory <garydgreg...@gmail.com> wrote:
> > >
> > >> The upshot is that there has to be a way to do this with some custom
> > code
> > >> to at least have the ability to 'fast path' the code without
> reflection.
> > >> Using lambdas should make this fairly syntactically unobtrusive.
> > >>
> > >> On Mon, Jun 12, 2017 at 4:02 PM, Arun Mohan <strider90a...@gmail.com>
> > >> wrote:
> > >>
> > >> > Yes, reflection is not very performant but I don't think I have any
> > other
> > >> > choice since the library has to inspect the object supplied by the
> > client
> > >> > at runtime to pick out the methods to be invoked using
> > CompletableFuture.
> > >> > But the performance penalty paid for using reflection will be more
> > than
> > >> > offset by the savings of parallel method execution, more so as the
> no
> > of
> > >> > methods executed in parallel increases.
> > >> >
> > >> > On Mon, Jun 12, 2017 at 3:21 PM, Gary Gregory <
> garydgreg...@gmail.com
> > >
> > >> > wrote:
> > >> >
> > >> > > On a lower level, if you want to use this for lower-level services
> > >> (where
> > >> > > there is no network latency for example), you will need to avoid
> > using
> > >> > > reflection to get the best performance.
> > >> > >
> > >> > > Gary
> > >> > >
> > >> > > On Mon, Jun 12, 2017 at 3:15 PM, Arun Mohan <
> > strider90a...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi Gary,
> > >> > > >
> > >> > > > Thanks for your response. You have some valid and interesting
> > points
> > >> > :-)
> > >> > > > Of course you are right that Spark is much more mature. Thanks
> for
> > >> your
> > >> > > > insight.
> > >> > > > It will be interesting indeed to find out if the core
> > parallelization
> > >> > > > engine of Spark can be isolated like you suggest.
> > >> > > >
> > >> > > > I started working on this project because I felt that there was
> no
> > >> good
> > >> > > > library for parallelizing method calls which can be plugged in
> > easily
> > >> > > into
> > >> > > > an existing java project. Ultimately, if such a solution can be
> > >> > > > incorporated in the Apache Commons, it would be a useful
> addition
> > to
> > >> > the
> > >> > > > Commons repository.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Arun
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Mon, Jun 12, 2017 at 3:01 PM, Gary Gregory <
> > >> garydgreg...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Arun,
> > >> > > > >
> > >> > > > > Sure, and that is to be expected, Spark is more mature than a
> > four
> > >> > > class
> > >> > > > > prototype. What I am trying to get to is that in order for the
> > >> > library
> > >> > > to
> > >> > > > > be useful, you will end up with more in a first release, and
> > after
> > >> a
> > >> > > > couple
> > >> > > > > more releases, there will be more and more. Would Spark not
> > have in
> > >> > its
> > >> > > > > guts the same kind of code you are proposing here? By
> > extension,
> > >> > will
> > >> > > > you
> > >> > > > > not end up with more framework-like (Spark-like) code and
> > solutions
> > >> > as
> > >> > > > > found in Spark? I am just playing devil's advocate here ;-)
> > >> > > > >
> > >> > > > >
> > >> > > > > What would be interesting would be to find out if there is a
> > core
> > >> > part
> > >> > > of
> > >> > > > > Spark that is separable and extractable into a Commons
> > component.
> > >> > > Since
> > >> > > > > Spark has a proven track record, it is more likely that such
> a
> > >> > library
> > >> > > > > would be generally useful than one created from scratch that
> > does
> > >> not
> > >> > > > > integrate with anything else. Again, please do not take any of
> > this
> > >> > > > > personally, I am just playing here :-)
> > >> > > > >
> > >> > > > > Gary
> > >> > > > >
> > >> > > > >
> > >> > > > > On Mon, Jun 12, 2017 at 2:29 PM, Matt Sicker <
> boa...@gmail.com>
> > >> > wrote:
> > >> > > > >
> > >> > > > > > I already see a huge difference here: Spark requires a bunch
> > of
> > >> > > > > > infrastructure to be set up, while this library is just a
> > >> library.
> > >> > > > > Similar
> > >> > > > > > to Kafka Streams versus Spark Streaming or Flink or Storm or
> > >> Samza
> > >> > or
> > >> > > > the
> > >> > > > > > others.
> > >> > > > > >
> > >> > > > > > On 12 June 2017 at 16:28, Gary Gregory <
> > garydgreg...@gmail.com>
> > >> > > wrote:
> > >> > > > > >
> > >> > > > > > > On Mon, Jun 12, 2017 at 2:26 PM, Arun Mohan <
> > >> > > strider90a...@gmail.com
> > >> > > > >
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi All,
> > >> > > > > > > >
> > >> > > > > > > > Good afternoon.
> > >> > > > > > > >
> > >> > > > > > > > I have been working on a java generic parallel execution
> > >> > library
> > >> > > > > which
> > >> > > > > > > will
> > >> > > > > > > > allow clients to execute methods in parallel
> irrespective
> > of
> > >> > the
> > >> > > > > number
> > >> > > > > > > of
> > >> > > > > > > > method arguments, type of method arguments, return type
> of
> > >> the
> > >> > > > method
> > >> > > > > > > etc.
> > >> > > > > > > >
> > >> > > > > > > > Here is the link to the source code:
> > >> > > > > > > > https://github.com/striderarun/parallel-execution-engine
> > >> > > > > > > >
> > >> > > > > > > > The project is in a nascent state and I am the only
> > >> contributor
> > >> > > so
> > >> > > > > > far. I
> > >> > > > > > > > am new to the Apache community and I would like to bring
> > this
> > >> > > > project
> > >> > > > > > > into
> > >> > > > > > > > Apache and improve, expand and build a developer
> community
> > >> > around
> > >> > > > it.
> > >> > > > > > > >
> > >> > > > > > > > I think this project can be a sub project of Apache
> > Commons
> > >> > since
> > >> > > > it
> > >> > > > > > > > provides generic components for parallelizing any kind
> of
> > >> > > methods.
> > >> > > > > > > >
> > >> > > > > > > > Can somebody please guide me or suggest what other
> > options I
> > >> > can
> > >> > > > > > explore
> > >> > > > > > > ?
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > > > Hi Arun,
> > >> > > > > > >
> > >> > > > > > > Thank you for your proposal.
> > >> > > > > > >
> > >> > > > > > > How would this be different from Apache Spark?
> > >> > > > > > >
> > >> > > > > > > Thank you,
> > >> > > > > > > Gary
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > Thanks,
> > >> > > > > > > > Arun
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Matt Sicker <boa...@gmail.com>
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Matt Sicker <boa...@gmail.com>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >
>
>
> --
> Matt Sicker <boa...@gmail.com>
>

