Interesting idea. And great discussion. Can't really say I'd have a use case for that right now, so abstaining from the discussion around the implementation.
I believe if we decide to explore this idea in Commons, we will probably move it to the sandbox? Even if we do not move it to Commons or to the sandbox, I intend to find some time in the next few days to try Apache Commons Javaflow with this library.

Jenkins implemented pipelines + continuations with code that, when it started, looked a lot like Javaflow. The parallel execution is taken care of in some internal modules in Jenkins, but I would like to see whether a simpler implementation like this one would work.

Ideally, this utility would execute in parallel, say, 20 tasks each taking 5 minutes (I haven't looked at whether it supports fork/join). Then I would be able to have checkpoints during the execution, and if the whole workflow fails, I would be able to restart it from the last checkpoint.

I use the Java 7+ concurrent classes when I need to execute tasks in parallel (though I'm adding a flag to Paul King's message in this thread to give GPars a try too!), but I am unaware of any way to have persistable (?) continuation workflows as in Jenkins, but with simple Java code.

Cheers
Bruno

________________________________
From: Gary Gregory <garydgreg...@gmail.com>
To: Commons Developers List <dev@commons.apache.org>
Sent: Tuesday, 13 June 2017 2:08 PM
Subject: Re: Commons sub project for parallel method execution

On Mon, Jun 12, 2017 at 6:56 PM, Matt Sicker <boa...@gmail.com> wrote:

> So wouldn't something like ASM or Javassist or one of the zillion other
> bytecode libraries be a better alternative to using reflection for
> performance? Also, using the Java 7 reflections API improvements helps
> speed things up quite a bit.
>

IMO, unless you are doing scripting, reflection should be used as a workaround, but that's just me. For example, like we do in Commons IO's Java7Support class. But I digress ;-)

This is clearly an interesting topic.
My concern is that there is a LOT of code out there that does stuff like this at the low and high level, from the JRE's fork/join to Apache Spark and so on, as I've stated. IMO something new would have to be both unique and, since this is Commons, potentially pluggable into other frameworks.

Gary

> On 12 June 2017 at 20:37, Paul King <paul.king.as...@gmail.com> wrote:
>
> > My goto library for such tasks would be GPars. It has both Java and
> > Groovy support for most things (actors/dataflow) but less so for
> > asynchronous task execution. It's one of the things that would be good
> > to explore in light of Java 8. Groovy is now Apache, GPars not at this
> > stage.
> >
> > So with adding two jars (GPars + Groovy), you can use Groovy like this:
> >
> > @Grab('org.codehaus.gpars:gpars:1.2.1')
> > import com.arun.student.StudentService
> > import groovyx.gpars.GParsExecutorsPool
> >
> > long startTime = System.nanoTime()
> > def service = new StudentService()
> > def bookSeries = ["A Song of Ice and Fire": 7, "Wheel of Time": 14,
> >                   "Harry Potter": 7]
> >
> > def tasks = [
> >   { println service.findStudent("j...@gmail.com", 11, false) },
> >   { println service.getStudentMarks(1L) },
> >   { println service.getStudentsByFirstNames(["John","Alice"]) },
> >   { println service.getRandomLastName() },
> >   { println service.findStudentIdByName("Kate", "Williams") },
> >   { service.printMapValues(bookSeries) }
> > ]
> >
> > GParsExecutorsPool.withPool {
> >   tasks.collect{ it.callAsync() }.collect{ it.get() }
> >   // tasks.eachParallel{ it() } // one of numerous alternatives
> > }
> >
> > long executionTime = (System.nanoTime() - startTime) / 1000000
> > println "\nTotal elapsed time is $executionTime\n\n"
> >
> >
> > Cheers, Paul.
> >
> >
> > On Tue, Jun 13, 2017 at 9:29 AM, Matt Sicker <boa...@gmail.com> wrote:
> > > I'd be interested to see where this leads to. It could end up as a sort
> > > of Commons Parallel library.
> > > Besides providing an execution API, there could be plenty of support
> > > utilities that tend to be found in all the *Util(s)/*Helper classes in
> > > projects like all the ones I mentioned earlier (basically all sorts of
> > > Hadoop-related projects and other distributed systems here).
> > >
> > > Really, there's so many ways that such a project could head, I'd like to
> > > hear more ideas on what to focus on.
> > >
> > > On 12 June 2017 at 18:19, Gary Gregory <garydgreg...@gmail.com> wrote:
> > >
> > >> The upshot is that there has to be a way to do this with some custom code
> > >> to at least have the ability to 'fast path' the code without reflection.
> > >> Using lambdas should make this fairly syntactically unobtrusive.
> > >>
> > >> On Mon, Jun 12, 2017 at 4:02 PM, Arun Mohan <strider90a...@gmail.com>
> > >> wrote:
> > >>
> > >> > Yes, reflection is not very performant but I don't think I have any other
> > >> > choice since the library has to inspect the object supplied by the client
> > >> > at runtime to pick out the methods to be invoked using CompletableFuture.
> > >> > But the performance penalty paid for using reflection will be more than
> > >> > offset by the savings of parallel method execution, more so as the no of
> > >> > methods executed in parallel increases.
> > >> >
> > >> > On Mon, Jun 12, 2017 at 3:21 PM, Gary Gregory <garydgreg...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > On a lower-level, if you want to use this for lower-level services (where
> > >> > > there is no network latency for example), you will need to avoid using
> > >> > > reflection to get the best performance.
> > >> > >
> > >> > > Gary
> > >> > >
> > >> > > On Mon, Jun 12, 2017 at 3:15 PM, Arun Mohan <strider90a...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi Gary,
> > >> > > >
> > >> > > > Thanks for your response.
> > >> > > > You have some valid and interesting points :-)
> > >> > > > Of course you are right that Spark is much more mature. Thanks for your
> > >> > > > insight.
> > >> > > > It will be interesting indeed to find out if the core parallelization
> > >> > > > engine of Spark can be isolated like you suggest.
> > >> > > >
> > >> > > > I started working on this project because I felt that there was no good
> > >> > > > library for parallelizing method calls which can be plugged in easily into
> > >> > > > an existing java project. Ultimately, if such a solution can be
> > >> > > > incorporated in the Apache Commons, it would be a useful addition to the
> > >> > > > Commons repository.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Arun
> > >> > > >
> > >> > > > On Mon, Jun 12, 2017 at 3:01 PM, Gary Gregory <garydgreg...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Arun,
> > >> > > > >
> > >> > > > > Sure, and that is to be expected, Spark is more mature than a four class
> > >> > > > > prototype. What I am trying to get to is that in order for the library to
> > >> > > > > be useful, you will end up with more in a first release, and after a couple
> > >> > > > > more releases, there will be more and more. Would Spark not have in its
> > >> > > > > guts the same kind of code you are proposing here? By extension, will you
> > >> > > > > not end up with more framework-like (Spark-like) code and solutions as
> > >> > > > > found in Spark? I am just playing devil's advocate here ;-)
> > >> > > > >
> > >> > > > >
> > >> > > > > What would be interesting would be to find out if there is a core part of
> > >> > > > > Spark that is separable and extractable into a Commons component.
> > >> > > > > Since Spark has a proven track record, it is more likely that such a library
> > >> > > > > would be generally useful than one created from scratch that does not
> > >> > > > > integrate with anything else. Again, please do not take any of this
> > >> > > > > personally, I am just playing here :-)
> > >> > > > >
> > >> > > > > Gary
> > >> > > > >
> > >> > > > >
> > >> > > > > On Mon, Jun 12, 2017 at 2:29 PM, Matt Sicker <boa...@gmail.com> wrote:
> > >> > > > >
> > >> > > > > > I already see a huge difference here: Spark requires a bunch of
> > >> > > > > > infrastructure to be set up, while this library is just a library. Similar
> > >> > > > > > to Kafka Streams versus Spark Streaming or Flink or Storm or Samza or the
> > >> > > > > > others.
> > >> > > > > >
> > >> > > > > > On 12 June 2017 at 16:28, Gary Gregory <garydgreg...@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > On Mon, Jun 12, 2017 at 2:26 PM, Arun Mohan <strider90a...@gmail.com>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi All,
> > >> > > > > > > >
> > >> > > > > > > > Good afternoon.
> > >> > > > > > > >
> > >> > > > > > > > I have been working on a java generic parallel execution library which will
> > >> > > > > > > > allow clients to execute methods in parallel irrespective of the number of
> > >> > > > > > > > method arguments, type of method arguments, return type of the method etc.
> > >> > > > > > > >
> > >> > > > > > > > Here is the link to the source code:
> > >> > > > > > > > https://github.com/striderarun/parallel-execution-engine
> > >> > > > > > > >
> > >> > > > > > > > The project is in a nascent state and I am the only contributor so far. I
> > >> > > > > > > > am new to the Apache community and I would like to bring this project into
> > >> > > > > > > > Apache and improve, expand and build a developer community around it.
> > >> > > > > > > >
> > >> > > > > > > > I think this project can be a sub project of Apache Commons since it
> > >> > > > > > > > provides generic components for parallelizing any kind of methods.
> > >> > > > > > > >
> > >> > > > > > > > Can somebody please guide me or suggest what other options I can explore?
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > > > Hi Arun,
> > >> > > > > > >
> > >> > > > > > > Thank you for your proposal.
> > >> > > > > > >
> > >> > > > > > > How would this be different from Apache Spark?
> > >> > > > > > >
> > >> > > > > > > Thank you,
> > >> > > > > > > Gary
> > >> > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > Thanks,
> > >> > > > > > > > Arun
> > >> > > > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Matt Sicker <boa...@gmail.com>
> > >
> > > --
> > > Matt Sicker <boa...@gmail.com>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
>
> --
> Matt Sicker <boa...@gmail.com>
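
For readers following the reflection versus 'fast path' exchange between Gary and Arun in this thread, here is a minimal Java 8 sketch of the lambda-based idea: each call is wrapped in a Supplier and handed to CompletableFuture.supplyAsync, so no reflection is involved. StudentService, its methods, runAll and demo are hypothetical names made up for illustration; this is not the API of the parallel-execution-engine project, only one way the approach might look.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class ParallelCalls {

    // Hypothetical stand-in for a client service; not taken from any real library.
    static class StudentService {
        String findStudent(String email, int age) {
            return "student:" + email + ":" + age;
        }

        int getStudentMarks(long id) {
            return (int) (id * 10);
        }

        List<String> getStudentsByFirstNames(List<String> names) {
            return names.stream().map(n -> n + " Doe").collect(Collectors.toList());
        }
    }

    // Submits every task to the pool first, then waits for each result.
    // Results come back in submission order, whatever order the tasks finish in.
    static List<Object> runAll(ExecutorService pool, List<Supplier<Object>> tasks) {
        List<CompletableFuture<Object>> futures = new ArrayList<>();
        for (Supplier<Object> task : tasks) {
            futures.add(CompletableFuture.supplyAsync(task, pool));
        }
        List<Object> results = new ArrayList<>();
        for (CompletableFuture<Object> future : futures) {
            results.add(future.join()); // rethrows task failures as CompletionException
        }
        return results;
    }

    // Wraps three calls with different argument and return types in lambdas.
    static List<Object> demo() {
        StudentService service = new StudentService();
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Supplier<Object>> tasks = new ArrayList<>();
            tasks.add(() -> service.findStudent("a@example.com", 11));
            tasks.add(() -> service.getStudentMarks(1L));
            tasks.add(() -> service.getStudentsByFirstNames(Arrays.asList("John", "Alice")));
            return runAll(pool, tasks);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off compared with a reflection-based engine is that the caller writes one lambda per call instead of passing method names, but the compiler checks argument types and there is no per-call reflective lookup.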