MLLib sample data format

2014-06-22 Thread Justin Yip
Hello, I am looking into a couple of MLLib data files in https://github.com/apache/spark/tree/master/data/mllib, but I cannot find any explanation of these files. Does anyone know if they are documented? Thanks. Justin

Re: MLLib sample data format

2014-06-22 Thread Justin Yip
Hi Shuo, Yes. I was reading the guide as well as the sample code. For example, in http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machine-svm, the sample code calls sc.textFile("mllib/data/ridge-data/lpsa.data"), but nowhere in the GitHub repository can I find that file. Thanks. Justin

Re: MLLib sample data format

2014-06-22 Thread Evan Sparks
These files follow the libsvm format: each line is a record, the first column is the label, and every field after that is offset:value, where offset is the offset into the feature vector and value is the value of that input feature. This is a fairly efficient representation for sparse
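To make the layout described above concrete, here is a minimal sketch in plain Python that parses one record of the form "label offset:value offset:value ...". The function name parse_libsvm_line and the sample record are hypothetical, made up for illustration, and are not taken from the repository. In the usual libsvm convention the offsets start at 1, and the sketch keeps them exactly as written in the file.

```python
def parse_libsvm_line(line):
    """Parse one 'label offset:value ...' record into (label, {offset: value})."""
    parts = line.split()
    label = float(parts[0])          # first column is the label
    features = {}
    for item in parts[1:]:           # remaining columns are offset:value pairs
        offset, value = item.split(":")
        features[int(offset)] = float(value)
    return label, features


# Hypothetical record: label 1, with non-zero features at offsets 3 and 10.
label, features = parse_libsvm_line("1 3:0.5 10:2.0")
```

Only the non-zero entries are stored, which is why the format suits sparse data. For real use in Spark, MLUtils.loadLibSVMFile reads files in this format directly.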

Re: MLLib sample data format

2014-06-22 Thread Justin Yip
I see. That's good. Thanks. Justin

On Sun, Jun 22, 2014 at 4:59 PM, Evan Sparks evan.spa...@gmail.com wrote:
Oh, and the movie lens one is userid::movieid::rating - Evan

On Jun 22, 2014, at 3:35 PM, Justin Yip yipjus...@gmail.com wrote:
Hello, I am looking into a couple of MLLib data
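For the movie lens file mentioned in this message, a minimal sketch of splitting one userid::movieid::rating line in plain Python could look like the following. The function name parse_rating and the sample line are hypothetical, and the sketch assumes the ids are integers and the rating is numeric, which the thread does not state.

```python
def parse_rating(line):
    """Split one 'userid::movieid::rating' line into a typed tuple."""
    user_id, movie_id, rating = line.strip().split("::")
    # Assumption: ids are integers and the rating is a number.
    return int(user_id), int(movie_id), float(rating)


# Hypothetical line, not copied from the repository.
record = parse_rating("0::2::3")
```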