[ 
https://issues.apache.org/jira/browse/SPARK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749559#comment-15749559
 ] 

Felix Cheung edited comment on SPARK-18862 at 12/14/16 9:38 PM:
----------------------------------------------------------------

AFAIK, an R package has the constraint that its R code must be in a flat 
structure, so I don't think subdirectories would work. (Search for "directory" in 
http://r-pkgs.had.co.nz/r.html.)

My preference would be an ml- or ml_ prefix.
I think we should call it ml instead of mllib to match spark.ml.

Also, in some cases it may make sense to group by algorithm or algorithm family 
(e.g. trees for random forest and GBT) instead of splitting into classification 
and regression, since those are so similar and also share helper functions.



> Split SparkR mllib.R into multiple files
> ----------------------------------------
>
>                 Key: SPARK-18862
>                 URL: https://issues.apache.org/jira/browse/SPARK-18862
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, SparkR
>            Reporter: Yanbo Liang
>
> SparkR mllib.R is getting bigger as we add more ML wrappers. I'd like to 
> split it into multiple files to make it easier to maintain:
> * mllibClassification.R
> * mllibRegression.R
> * mllibClustering.R
> * mllibFeature.R
> or:
> * mllib/classification.R
> * mllib/regression.R
> * mllib/clustering.R
> * mllib/features.R
> By R convention, the first way is preferred, and I'm not sure whether R 
> supports the second layout (I will check later). Please let me know your 
> preference. I think the start of a new release cycle is a good opportunity to 
> do this, since it will involve fewer conflicts. If this proposal is 
> approved, I can work on it.
> cc [~felixcheung] [~josephkb] [~mengxr] 


