[ https://issues.apache.org/jira/browse/SPARK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750435#comment-15750435 ]
Yanbo Liang commented on SPARK-18862:
-------------------------------------

Great! I found that other R packages organize their source files in a flat structure, so I was a bit worried that R might not support subdirectories. Thanks for the reference, it's very helpful.

As for naming, I think {{ml}} is not an official name; we still use {{mllib}} publicly, see [here|https://github.com/apache/spark/pull/16241/files]. Grouping by algorithm or family seems very reasonable to me, so I would like to use names such as {{mllib-glm.R}}, {{mllib-gbt.R}}, {{mllib-randomForest.R}}, etc. What do you think? Thanks.

> Split SparkR mllib.R into multiple files
> ----------------------------------------
>
>                 Key: SPARK-18862
>                 URL: https://issues.apache.org/jira/browse/SPARK-18862
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, SparkR
>            Reporter: Yanbo Liang
>
> SparkR mllib.R is getting bigger as we add more ML wrappers, so I'd like to
> split it into multiple files to make it easier to maintain:
> * mllibClassification.R
> * mllibRegression.R
> * mllibClustering.R
> * mllibFeature.R
> or:
> * mllib/classification.R
> * mllib/regression.R
> * mllib/clustering.R
> * mllib/features.R
> By R convention, the first way is preferred, and I'm not sure whether R
> supports the second layout (will check later). Please let me know your
> preference. I think the start of a new release cycle is a good opportunity to
> do this, since it will involve fewer conflicts. If this proposal is
> approved, I can work on it.
> cc [~felixcheung] [~josephkb] [~mengxr]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)