RE: MLLib libsvm format
Great, I will sort them.

Original message
From: Xiangrui Meng
Date: 10/21/2014 3:29 PM (GMT-08:00)
To: Sameer Tilak
Cc: user@spark.apache.org
Subject: Re: MLLib libsvm format
> Yes. "where the indices are one-based and **in ascending order**".
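Sorting the index:value pairs on each line, as proposed above, could look like the following minimal Python sketch. `sort_libsvm_line` is a hypothetical helper, not part of MLlib. It assumes whitespace-separated tokens with the label first, and it sorts numerically on the index, because a plain string sort would put `110` before `80`.

```python
def sort_libsvm_line(line: str) -> str:
    """Return a LIBSVM line with its index:value pairs sorted by ascending index.

    Hypothetical helper; assumes whitespace-separated tokens with the label first.
    """
    label, *features = line.split()
    pairs = []
    for token in features:
        index, value = token.split(":", 1)
        # Keep the value as the original string so its formatting is preserved.
        pairs.append((int(index), value))
    # Sort numerically on the index so "80" comes before "110".
    pairs.sort(key=lambda pair: pair[0])
    return " ".join([label] + [f"{i}:{v}" for i, v in pairs])


print(sort_libsvm_line("1 110:1.0 80:0.5 310:0.0"))
print(sort_libsvm_line("0 890:0.5 20:0.0 200:0.5 400:1.0 82:0.0"))
```

Applied to the two example rows from the question, this produces `1 80:0.5 110:1.0 310:0.0` and `0 20:0.0 82:0.0 200:0.5 400:1.0 890:0.5`. For a whole file, the same function would simply be mapped over every line.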
Re: MLLib libsvm format
Yes. "where the indices are one-based and **in ascending order**".

-Xiangrui

On Tue, Oct 21, 2014 at 1:10 PM, Sameer Tilak wrote:
> However, do the indices within a row need to be sorted in ascending order?
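A quick pre-check for the requirement quoted above could look like this sketch. `indices_ok` is a hypothetical helper written for illustration; it does not call MLlib. It treats "ascending" as strictly increasing, so it also rejects duplicate indices. That is an assumption, not something the guide states.

```python
def indices_ok(line: str) -> bool:
    """Check that a LIBSVM line's indices are one-based and strictly ascending.

    Hypothetical helper for pre-checking input files; it does not call MLlib.
    """
    tokens = line.split()
    previous = 0  # any valid one-based index must be greater than this
    for token in tokens[1:]:  # tokens[0] is the label
        index = int(token.split(":", 1)[0])
        # Rejects index 0, negative indices, duplicates, and out-of-order indices.
        if index <= previous:
            return False
        previous = index
    return True


print(indices_ok("1 110:1.0 80:0.5 310:0.0"))  # unsorted row from the question
print(indices_ok("1 80:0.5 110:1.0 310:0.0"))  # sorted row
```

Running this over a file before loading it flags rows like the first example, which would need to be sorted.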
MLLib libsvm format
Hi All,

I have a question regarding the ordering of indices. The documentation says that the indices are one-based and in ascending order. However, do the indices within a row need to be sorted in ascending order?

Sparse data

It is very common in practice to have sparse training data. MLlib supports reading training examples stored in LIBSVM format, which is the default format used by LIBSVM and LIBLINEAR. It is a text format in which each line represents a labeled sparse feature vector using the following format:

label index1:value1 index2:value2 ...

where the indices are one-based and in ascending order. After loading, the feature indices are converted to zero-based.

For example, I have indices ranging from 1 to 1000. Is this OK as a LIBSVM data file?

1 110:1.0 80:0.5 310:0.0
0 890:0.5 20:0.0 200:0.5 400:1.0 82:0.0
and so on.

Or do I need to sort them as:

1 80:0.5 110:1.0 310:0.0
0 20:0.0 82:0.0 200:0.5 400:1.0 890:0.5
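The zero-based conversion mentioned in the quoted guide can be illustrated with a small parser. This is a hypothetical stdlib sketch of what the conversion means for a single line, not MLlib's own loading code (in Scala, MLlib provides `MLUtils.loadLibSVMFile` for reading such files). `parse_libsvm_line` is an illustrative name.

```python
from typing import List, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, List[int], List[float]]:
    """Parse one LIBSVM line into (label, zero-based indices, values).

    Hypothetical sketch of the conversion the guide describes; not MLlib's code.
    """
    label, *features = line.split()
    indices: List[int] = []
    values: List[float] = []
    for token in features:
        index, value = token.split(":", 1)
        indices.append(int(index) - 1)  # one-based in the file -> zero-based in memory
        values.append(float(value))
    return float(label), indices, values


print(parse_libsvm_line("1 80:0.5 110:1.0 310:0.0"))
```

For the sorted example row this yields label `1.0`, indices `[79, 109, 309]`, and values `[0.5, 1.0, 0.0]`: each stored index is one less than the index written in the file.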