[
https://issues.apache.org/jira/browse/MAHOUT-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034845#comment-14034845
]
ASF GitHub Bot commented on MAHOUT-1573:
----------------------------------------
Github user dlyubimov commented on the pull request:
https://github.com/apache/mahout/pull/13#issuecomment-46398143
Ted, are you ready to help with a concrete alternative? This is a very
small issue, even compared to the patch. Let's build a list of alternatives
and vote. But let's get it done.
My additional variants
minSplits,...
minPar, exactPar, autoPar (consistent with Scala's collection.par())
To give Ted something to vote down:
>=|| :=||
:||=
Not ok with me
minParts
minParallelism
minPartitions
repartition
reshuffle
and other do-something kinds of names
Your variants?
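To make the choice concrete, here is a rough sketch of how the alphabetic names and the symbolic ones could sit side by side. Everything in it is hypothetical: ToyDrm is a stand-in case class, not Mahout's DRM type, and the semantics given to minPar and exactPar (raise to at least n, set to exactly n) are my assumption about what the names would mean.

```scala
// Hypothetical stand-in for a distributed matrix handle; NOT Mahout's API.
// It only tracks a partition count so the naming options can be compared.
final case class ToyDrm(partitions: Int) {
  require(partitions > 0, "partitions must be positive")

  // Alphabetic variants (assumed semantics).
  // minPar: make sure there are at least n partitions.
  def minPar(n: Int): ToyDrm = copy(partitions = math.max(partitions, n))

  // exactPar: set the partition count to exactly n.
  def exactPar(n: Int): ToyDrm = copy(partitions = n)

  // Symbolic variants, defined as plain aliases of the methods above.
  def >=||(n: Int): ToyDrm = minPar(n)
  def :=||(n: Int): ToyDrm = exactPar(n)
}

object NamingDemo {
  def main(args: Array[String]): Unit = {
    val a = ToyDrm(4)
    println(a.minPar(100))   // alphabetic form
    println(a >=|| 100)      // symbolic form, same result
    println(a.exactPar(16))
    println(a :=|| 16)
  }
}
```

Both forms compile to the same call, so the vote is purely about readability at the call site.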
On Jun 17, 2014 9:59 PM, "Ted Dunning" <[email protected]> wrote:
> Yes.
>
> But I was talking about the gratuitous use of non-alpha characters.
> Excessive use of operator overloading is also a bit of a problem.
>
> Just because you can doesn't mean you should.
>
> Sent from my iPhone
>
> > On Jun 17, 2014, at 17:58, Dmitriy Lyubimov <[email protected]>
> wrote:
> >
> > e.g. one can write things like A.t.%*%(A).exact_||(100)
> >
> > —
> > Reply to this email directly or view it on GitHub.
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/mahout/pull/13#issuecomment-46396097>.
>
> More explicit parallelism adjustments in math-scala DRM apis; elements of
> automatic re-adjustments
> --------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-1573
> URL: https://issues.apache.org/jira/browse/MAHOUT-1573
> Project: Mahout
> Issue Type: Task
> Affects Versions: 0.9
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Fix For: 1.0
>
>
> (1) add minSplit parameter pass-thru to drmFromHDFS to be able to explicitly
> increase parallelism.
> (2) add parallelism readjustment parameter to a checkpoint() call. This
> implies shuffle-less coalesce() translation to the data set before it is
> requested to be cached (if specified).
> Going forward, we probably should try and figure out how we can automate it, at
> least a little bit. For example, the simplest automatic adjustment might
> include re-adjusting parallelism on load to simply fit cluster size (95% or 180%
> of cluster size, for example), with some rule-of-thumb safeguards here, e.g.
> we cannot exceed a factor of say 8 (or whatever we configure) in splitting
> each original hdfs split. We should be able to get a reasonable parallelism
> performance out of the box on simple heuristics like that.
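The load-time heuristic described in the issue could be sketched roughly as follows. This is only an illustration under stated assumptions: targetPartitions and all of its parameter names are hypothetical, "cluster size" is taken to mean the number of available task slots, and the result is never allowed to drop below the original number of HDFS splits (consistent with minSplit being a lower bound).

```scala
// Hypothetical helper illustrating the heuristic; not part of Mahout.
object ParallelismHeuristic {

  /**
   * @param originalSplits number of HDFS splits the input has on load
   * @param clusterSlots   assumed measure of cluster size (task slots)
   * @param loadFactor     fraction or multiple of cluster size to aim for
   *                       (e.g. 0.95 or 1.8, as in the issue text)
   * @param maxSplitFactor safeguard: never cut one original split into
   *                       more than this many pieces (e.g. 8)
   */
  def targetPartitions(
      originalSplits: Int,
      clusterSlots: Int,
      loadFactor: Double = 0.95,
      maxSplitFactor: Int = 8): Int = {
    require(originalSplits > 0, "originalSplits must be positive")
    require(clusterSlots > 0, "clusterSlots must be positive")
    require(loadFactor > 0, "loadFactor must be positive")
    require(maxSplitFactor >= 1, "maxSplitFactor must be at least 1")

    // Parallelism we would like to have, sized to the cluster.
    val desired: Long = math.max(1L, math.round(clusterSlots * loadFactor))

    // Upper bound imposed by the rule-of-thumb safeguard.
    val ceiling: Long = originalSplits.toLong * maxSplitFactor

    // Cap by the safeguard, but never go below the original split count.
    math.max(originalSplits.toLong, math.min(desired, ceiling)).toInt
  }
}
```

For instance, with 10 original splits and 100 slots the safeguard is what limits the result (10 * 8 = 80 rather than 95), while with 50 original splits the cluster-size target of 95 applies directly.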