I found some more issues related to parfor and opened a couple of JIRAs. Someone can assign them to me; I will work on them!
Felix

On 22.11.2016 17:54, dusenberr...@gmail.com wrote:
> Also for some context, we're aiming to use this for remote hyperparameter
> tuning over a large dataset. Specifically, each remote process would train a
> separate model over the full dataset using a mini-batch SGD approach. Has
> the `parfor` construct been used for this purpose before?
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
> > On Nov 22, 2016, at 2:01 PM, Matthias Boehm <mboe...@googlemail.com> wrote:
> >
> > that's a good catch - thanks Felix. It would be great if you could modify
> > rewriteSetExecutionStrategy and rewriteSetFusedDataPartitioningExecution in
> > OptimizerConstrained to handle the respective Spark execution types. Thanks.
> >
> > Regards,
> > Matthias
> >
> >> On 11/22/2016 7:54 PM, fschue...@posteo.de wrote:
> >> The constrained optimizer doesn't seem to know about a REMOTE_SPARK
> >> execution mode and either sets CP or REMOTE_MR. I can open a jira for
> >> that and provide a fix.
> >>
> >> Felix
> >>
> >> On 22.11.2016 02:07, Matthias Boehm wrote:
> >>> yes, this came up several times - initially we only supported opt=NONE
> >>> where users had to specify all other parameters. Meanwhile, there is a
> >>> so-called "constrained optimizer" that does the same as the rule-based
> >>> optimizer but respects any given parameters. Please try something like
> >>> this:
> >>>
> >>> parfor (i in 1:10, opt=CONSTRAINED, par=10, mode=REMOTE_SPARK) {
> >>>   // some code here
> >>> }
> >>>
> >>> Regards,
> >>> Matthias
> >>>
> >>>> On 11/22/2016 12:33 AM, fschue...@posteo.de wrote:
> >>>> While debugging some ParFor code it became clear that the parameters for
> >>>> parfor can be easily overwritten by the optimizer.
> >>>> One example is when I write:
> >>>>
> >>>> ```
> >>>> parfor (i in 1:10, par=10, mode=REMOTE_SPARK) {
> >>>>   // some code here
> >>>> }
> >>>> ```
> >>>>
> >>>> Depending on the data size and cluster resources, the optimizer
> >>>> (OptimizerRuleBased.java, line 844) will recognize that the work can be
> >>>> done locally and overwrite it to local execution. This might be valid
> >>>> and definitely works (in my case) but kind of contradicts what I want
> >>>> SystemML to do.
> >>>> I wonder if we should disable this optimization in case a concrete
> >>>> execution mode is given and go with the mode that is provided.
> >>>>
> >>>> Felix
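To connect the two threads above: the constrained optimizer can be combined with the hyperparameter-tuning use case Mike describes. The following DML is a minimal sketch only, assuming a hypothetical grid of learning rates `lrs` and an elided mini-batch SGD training body; it is not code from the thread.

```
# Sketch (hypothetical): tune the learning rate with one remote
# Spark worker per candidate value; opt=CONSTRAINED keeps the
# user-specified par and mode instead of letting the rule-based
# optimizer rewrite them to local execution.
lrs = matrix("0.001 0.01 0.1 1.0", rows=4, cols=1)
parfor (i in 1:nrow(lrs), opt=CONSTRAINED, par=4, mode=REMOTE_SPARK) {
  lr = as.scalar(lrs[i, 1])
  # each worker would train a separate model over the full
  # dataset using mini-batch SGD with learning rate lr
  # (training body elided)
}
```

Note that, per the earlier messages, REMOTE_SPARK handling in OptimizerConstrained still needs the fixes discussed above for this mode to take effect.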