[jira] [Commented] (SDAP-151) Determine parallelism automatically for Spark analytics

ASF GitHub Bot (JIRA) Sun, 04 Nov 2018 18:27:08 -0800


    [ 
https://issues.apache.org/jira/browse/SDAP-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674604#comment-16674604
 ]


ASF GitHub Bot commented on SDAP-151:
-------------------------------------

jjacob7734 opened a new pull request #50: SDAP-151 Determine parallelism 
automatically for Spark analytics
URL: https://github.com/apache/incubator-sdap-nexus/pull/50
 
 
   Previously, the built-in NEXUS analytics timeSeriesSpark, timeAvgMapSpark, 
corrMapSpark, and climMapSpark read the desired parallelism from a job request 
parameter like "spark=mesos,16,32".  If that parameter was omitted, they 
defaulted to "spark=local,1,1", which runs on a single core.  With this change, 
the algorithms automatically determine the appropriate level of parallelism 
from the job's Spark cluster configuration.  The job parameter called "spark" 
is no longer supported.  A new optional job parameter called "nparts" can be 
used to explicitly set the number of data partitions (e.g., "nparts=16").
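
   As a rough illustration of the behavior described above, the following 
Python sketch picks a partition count from cluster settings unless "nparts" is 
given.  It is not the code from this pull request: the function name 
choose_num_partitions, the plain-dict representation of the configuration, and 
the fallback order are assumptions made for the example.  The property names 
spark.default.parallelism, spark.executor.instances, and spark.executor.cores 
are standard Spark configuration keys.

```python
def choose_num_partitions(conf, nparts=None):
    """Pick a number of data partitions for a job (illustrative sketch only).

    conf   -- mapping of Spark property names to string values (hypothetical
              input; a real job would read these from its Spark configuration)
    nparts -- optional explicit override taken from the job request
    """
    # An explicit "nparts" request parameter always wins.
    if nparts is not None:
        nparts = int(nparts)
        if nparts < 1:
            raise ValueError("nparts must be a positive integer")
        return nparts

    # Assumption: prefer a cluster-wide default if one is configured.
    if "spark.default.parallelism" in conf:
        return max(1, int(conf["spark.default.parallelism"]))

    # Assumption: otherwise use the total number of executor cores,
    # falling back to a single core when nothing is configured.
    executors = int(conf.get("spark.executor.instances", 1))
    cores = int(conf.get("spark.executor.cores", 1))
    return max(1, executors * cores)


if __name__ == "__main__":
    cluster = {"spark.executor.instances": "4", "spark.executor.cores": "8"}
    print(choose_num_partitions(cluster))               # 32
    print(choose_num_partitions(cluster, nparts="16"))  # 16
```

   With no "nparts" and no cluster settings, the sketch falls back to one 
partition, which mirrors the old single-core default.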

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 


> Determine parallelism automatically for Spark analytics
> -------------------------------------------------------
>
>                 Key: SDAP-151
>                 URL: https://issues.apache.org/jira/browse/SDAP-151
>             Project: Apache Science Data Analytics Platform
>          Issue Type: Improvement
>            Reporter: Joseph Jacob
>            Assignee: Joseph Jacob
>            Priority: Major
>
> Some of the built-in NEXUS analytics like TimeSeries and TimeAvgMap currently 
> get the desired parallelism from a job request parameter like 
> "spark=mesos,16,32".  If that is omitted, we currently default to 
> "spark=local,1,1", which runs on a single core.  Instead we would like to 
> automatically determine the appropriate level of parallelism based on the 
> job's input data size.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
