Hello,
I am looking into a couple of the MLlib data files in
https://github.com/apache/spark/tree/master/data/mllib, but I cannot find
any explanation of these files. Does anyone know if they are documented?
Thanks.
Justin
Hi Shuo,
Yes. I was reading the guide as well as the sample code.
For example, for
http://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machine-svm,
I cannot find the file used in sc.textFile(
mllib/data/ridge-data/lpsa.data) anywhere in the GitHub repository.
Thanks.
Justin
These files follow the libsvm format, where each line is a record: the first
column is the label, and each field after that is offset:value, where offset
is the offset into the feature vector and value is the value of that input
feature.
This is a fairly efficient representation for sparse
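
To make the layout described above concrete, here is a minimal sketch in plain Python (no Spark needed) that reads one such line into a label and a sparse dict of features. The function name parse_libsvm_line and the sample line are made up for illustration, and the sketch assumes whitespace-separated fields. It keeps each offset exactly as written in the file; LIBSVM files conventionally count offsets from 1, so a loader may need to shift them.

```python
def parse_libsvm_line(line):
    """Parse 'label offset:value offset:value ...' into (label, {offset: value}).

    Hypothetical helper for illustration; offsets are kept as written in the file.
    """
    parts = line.strip().split()
    label = float(parts[0])          # first column is the label
    features = {}
    for field in parts[1:]:          # remaining columns are offset:value pairs
        offset, value = field.split(":", 1)
        features[int(offset)] = float(value)
    return label, features


# Made-up sample record: label 1, with non-zero features at offsets 3 and 7.
label, features = parse_libsvm_line("1 3:0.5 7:2.0")
print(label, features)
```

Only the non-zero entries are stored, which is why the format suits sparse inputs.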
I see. That's good. Thanks.
Justin
On Sun, Jun 22, 2014 at 4:59 PM, Evan Sparks evan.spa...@gmail.com wrote:
Oh, and the MovieLens one is userid::movieid::rating
- Evan
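
A line in the userid::movieid::rating layout can be split in much the same way. This is only a sketch in plain Python: the function name parse_rating and the sample line are hypothetical, and it assumes the ids are integers and the rating is numeric. Any extra fields after the rating are ignored.

```python
def parse_rating(line):
    """Parse 'userid::movieid::rating' into (user_id, movie_id, rating).

    Hypothetical helper for illustration; assumes integer ids and a numeric rating.
    """
    user_id, movie_id, rating = line.strip().split("::")[:3]
    return int(user_id), int(movie_id), float(rating)


# Made-up sample record: user 1 gave movie 42 a rating of 4.5.
print(parse_rating("1::42::4.5"))
```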
On Jun 22, 2014, at 3:35 PM, Justin Yip yipjus...@gmail.com wrote:
Hello,
I am looking into a couple of MLLib data