[ https://issues.apache.org/jira/browse/SPARK-12208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-12208:
---------------------------------
    Labels: bulk-closed  (was: )

> Abstract the examples into a common place
> -----------------------------------------
>
>                 Key: SPARK-12208
>                 URL: https://issues.apache.org/jira/browse/SPARK-12208
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, MLlib
>    Affects Versions: 1.5.2
>            Reporter: Timothy Hunter
>            Priority: Minor
>              Labels: bulk-closed
>
> When we write examples in the code, we put the generation of the data along
> with the example itself. We typically have either:
> {code}
> val data =
>   sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
> ...
> {code}
> or some more esoteric stuff such as:
> {code}
> val data = Array(
>   (0, 0.1),
>   (1, 0.8),
>   (2, 0.2)
> )
> val dataFrame: DataFrame =
>   sqlContext.createDataFrame(data).toDF("label", "feature")
> {code}
> {code}
> val data = Array(
>   Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
>   Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
>   Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
> )
> val df = sqlContext.createDataFrame(data.map(Tuple1.apply)).toDF("features")
> {code}
> I suggest we follow the example of sklearn and standardize all the generation
> of example data inside a few methods, for example in
> {{org.apache.spark.ml.examples.ExampleData}}. One reason is that just reading
> the code is sometimes not enough to figure out what the data is supposed to
> be. For example, when using {{libsvm_data}}, it is unclear what the DataFrame
> columns are. This is something we should document somewhere.
> Also, it would help to explain in one place all the Scala idiosyncrasies,
> such as the use of {{Tuple1.apply}}.
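
For illustration only, the proposal in the issue description could be sketched roughly as below. This is not code from the ticket or from Spark: the object name echoes the one suggested in the description, while the method names (sampleLibsvm, labeledScalars, featureVectors) are hypothetical. It assumes the Spark 1.x API used in the quoted snippets (SQLContext, org.apache.spark.mllib.linalg) and a Spark version that ships the "libsvm" data source.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper: neither this object nor its methods exist in Spark.
// Each method documents the columns of the DataFrame it returns, which is
// the information the issue says is hard to recover from the examples.
object ExampleData {

  // Bundled LIBSVM sample file.
  // Columns: "label" (Double), "features" (Vector).
  def sampleLibsvm(sqlContext: SQLContext): DataFrame =
    sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

  // Three (label, feature) rows.
  // Columns: "label" (Int), "feature" (Double).
  def labeledScalars(sqlContext: SQLContext): DataFrame = {
    val data = Seq((0, 0.1), (1, 0.8), (2, 0.2))
    sqlContext.createDataFrame(data).toDF("label", "feature")
  }

  // Three vectors of size 5, one sparse and two dense.
  // Column: "features" (Vector).
  def featureVectors(sqlContext: SQLContext): DataFrame = {
    val data: Seq[Vector] = Seq(
      Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
      Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
      Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
    )
    // createDataFrame expects rows that are Products (tuples or case classes).
    // A bare Vector is not a Product, so each one is wrapped in a Tuple1.
    sqlContext.createDataFrame(data.map(Tuple1.apply)).toDF("features")
  }
}
```

With something like this in place, an example would start with a single call such as ExampleData.featureVectors(sqlContext), and the Tuple1.apply explanation would live in one place instead of being repeated in every example.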