RE: MLLib libsvm format

2014-10-21 Thread Sameer Tilak
Great, I will sort them.



Original message
From: Xiangrui Meng
Date: 10/21/2014 3:29 PM (GMT-08:00)
To: Sameer Tilak
Cc: user@spark.apache.org
Subject: Re: MLLib libsvm format

Yes. "where the indices are one-based and **in ascending order**". -Xiangrui



Re: MLLib libsvm format

2014-10-21 Thread Xiangrui Meng
Yes. "where the indices are one-based and **in ascending order**". -Xiangrui

On Tue, Oct 21, 2014 at 1:10 PM, Sameer Tilak  wrote:
> Hi All,
>
> I have a question regarding the ordering of indices. The document says that
> the indices are one-based and in ascending order. However, do the
> indices within a row need to be sorted in ascending order?
>
>
>
>
> Sparse data
>
> It is very common in practice to have sparse training data. MLlib supports
> reading training examples stored in LIBSVM format, which is the default
> format used by LIBSVM and LIBLINEAR. It is a text format in which each line
> represents a labeled sparse feature vector using the following format:
>
> label index1:value1 index2:value2 ...
>
> where the indices are one-based and in ascending order. After loading, the
> feature indices are converted to zero-based.
>
>
>
> For example, I have indices ranging from 1 to 1000. Is this OK as a libsvm
> data file?
>
>
> 1 110:1.0   80:0.5   310:0.0
>
> 0 890:0.5  20:0.0   200:0.5   400:1.0  82:0.0
>
> and so on.
>
>
> OR do I need to sort them as:
>
>
> 1  80:0.5   110:1.0   310:0.0
>
> 0  20:0.0   82:0.0   200:0.5   400:1.0  890:0.5
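
Since the answer is that indices must be in ascending order within each row, an existing unsorted file has to be rewritten before loading. The following is a minimal sketch in plain Python of sorting the index:value pairs of one line. The function name sort_libsvm_line is hypothetical and not part of MLlib; the sketch assumes each line is non-empty, starts with the label, and has whitespace-separated index:value tokens.

```python
def sort_libsvm_line(line: str) -> str:
    """Return the line with its index:value pairs sorted by ascending index.

    Hypothetical helper (not an MLlib API). Assumes a non-empty line whose
    first token is the label and whose remaining tokens are index:value.
    """
    label, *tokens = line.split()
    pairs = []
    for token in tokens:
        # Keep the value as text so its formatting is left untouched.
        index_text, value_text = token.split(":", 1)
        pairs.append((int(index_text), value_text))
    pairs.sort(key=lambda pair: pair[0])
    return " ".join([label] + [f"{index}:{value}" for index, value in pairs])


# The two unsorted rows from the question.
print(sort_libsvm_line("1 110:1.0   80:0.5   310:0.0"))
# prints: 1 80:0.5 110:1.0 310:0.0
print(sort_libsvm_line("0 890:0.5  20:0.0   200:0.5   400:1.0  82:0.0"))
# prints: 0 20:0.0 82:0.0 200:0.5 400:1.0 890:0.5
```

Applying the function to every line of the file and writing the results out would produce the sorted layout shown in the second example of the question.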




MLLib libsvm format

2014-10-21 Thread Sameer Tilak
Hi All,

I have a question regarding the ordering of indices. The document says that
the indices are one-based and in ascending order. However, do the indices
within a row need to be sorted in ascending order?

Sparse data

It is very common in practice to have sparse training data. MLlib supports
reading training examples stored in LIBSVM format, which is the default
format used by LIBSVM and LIBLINEAR. It is a text format in which each line
represents a labeled sparse feature vector using the following format:

label index1:value1 index2:value2 ...

where the indices are one-based and in ascending order. After loading, the
feature indices are converted to zero-based.

For example, I have indices ranging from 1 to 1000. Is this OK as a libsvm
data file?

1 110:1.0   80:0.5   310:0.0
0 890:0.5  20:0.0   200:0.5   400:1.0  82:0.0

and so on.

OR do I need to sort them as:

1  80:0.5   110:1.0   310:0.0
0  20:0.0   82:0.0   200:0.5   400:1.0  890:0.5
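
The rules quoted from the documentation (one-based indices, ascending order, conversion to zero-based after loading) can be illustrated with a small plain-Python check. This is only a sketch under stated assumptions, not MLlib's actual loader: the function name parse_libsvm_line is hypothetical, and it treats a repeated index as an error as well as a descending one.

```python
def parse_libsvm_line(line: str):
    """Parse one LIBSVM-style line into (label, zero_based_indices, values).

    Hypothetical helper (not an MLlib API). Raises ValueError when the
    indices are not one-based and strictly ascending.
    """
    tokens = line.split()
    label = float(tokens[0])
    indices, values = [], []
    previous = 0  # one-based indices must be at least 1
    for token in tokens[1:]:
        index_text, value_text = token.split(":", 1)
        index = int(index_text)
        if index <= previous:
            raise ValueError(
                f"indices must be one-based and ascending; got {index} after {previous}"
            )
        indices.append(index - 1)  # convert to zero-based, as described above
        values.append(float(value_text))
        previous = index
    return label, indices, values


# The sorted row parses; its indices come back zero-based.
print(parse_libsvm_line("1  80:0.5   110:1.0   310:0.0"))
# prints: (1.0, [79, 109, 309], [0.5, 1.0, 0.0])

# The unsorted row is rejected.
try:
    parse_libsvm_line("1 110:1.0   80:0.5   310:0.0")
except ValueError as error:
    print("rejected:", error)
```

Running such a check over a file before loading it gives an early, line-level answer to whether the data is in the expected order.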