Re: Interpreting MLLib's linear regression o/p

2014-12-23 Thread Sean Owen
(In your libsvm example, your indices are not ascending.)

The first weight corresponds to the first feature, of course. An
indexing scheme doesn't change that or somehow make the first feature
map to the second (where would the last one go then?). You'll find the
first weight at offset 0 in an array, for example, but it corresponds to
the feature you called F1 in the input.
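
To make the mapping concrete, here is a minimal sketch in plain Python (not
Spark code). The `master` dictionary and the `label_weights` helper are
hypothetical names used only for illustration, and the weights are the sample
values from the question, truncated to three features. It assumes the master
file uses the same one-based indices as the LIBSVM input.

```python
# Hypothetical master lookup: feature name -> one-based LIBSVM index.
master = {"F1": 1, "F2": 2, "F3": 3}

# Invert it so a one-based index can be mapped back to a feature name.
name_by_index = {index: name for name, index in master.items()}


def label_weights(weights):
    """Pair each weight with its feature name.

    The weight at zero-based offset i belongs to the feature whose
    one-based LIBSVM index is i + 1.
    """
    return [(name_by_index[i + 1], w) for i, w in enumerate(weights)]


# Sample weights from the question, truncated to three features.
labeled = label_weights([0.11, 0.3445, 0.5])
print(labeled)
```

Under these assumptions the only offset needed when going from a weight's
array position back to the master file is 1.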

On Tue, Dec 23, 2014 at 12:50 AM, Sameer Tilak ssti...@live.com wrote:
 Hi,

 It is a text format in which each line represents a labeled sparse feature
 vector using the following format:

 label index1:value1 index2:value2 ...

 This was the confusing part in the documentation:


 where the indices are one-based and in ascending order. After loading, the
 feature indices are converted to zero-based.


 Let us say that I have 40 features, so I create an index file like this:

 Feature, index number:

 F1   1
 F2   2
 F3   3
 ...
 F40  40


 I then create my feature vectors in the libsvm format, something like:

 1 10:1 20:0 8:1 4:0 24:1
 1 1:1 40:0 2:1 8:0 9:1 23:1
 0 23:1 18:0 13:1
 ...

 I run the regression and get back models.weights, which holds 40 weights.

 Say I get:

 0.11
 0.3445
 0.5
 ...

 In that case, does the first weight (0.11) correspond to index 1/F1, or does
 it correspond to index 2/F2, since the input is 1-based and the o/p is
 0-based? Or is 0-based indexing only for the internal representation, and
 what you get back at the end of regression is essentially 1-based like your
 input, so 0.11 maps onto F1 and so on?






-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Interpreting MLLib's linear regression o/p

2014-12-22 Thread Xiangrui Meng
Did you check the indices in the LIBSVM data and the master file? Do
they match? -Xiangrui
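
One way to run that check, as a minimal sketch in plain Python rather than
Spark code: parse each LIBSVM line and report indices that fall outside the
range covered by the master file, or that are not in ascending order (the
LIBSVM format expects one-based, ascending indices). The function names and
the `num_features` parameter are assumptions for illustration, not part of
MLlib.

```python
def parse_libsvm_line(line):
    """Return (label, [(one_based_index, value), ...]) for one LIBSVM line."""
    parts = line.split()
    label = float(parts[0])
    pairs = []
    for item in parts[1:]:
        index, value = item.split(":")
        pairs.append((int(index), float(value)))
    return label, pairs


def check_line(line, num_features):
    """List the problems found in one line (empty list means it looks fine)."""
    _, pairs = parse_libsvm_line(line)
    indices = [index for index, _ in pairs]
    problems = []
    # One-based indices must lie in 1..num_features (the master file range).
    if any(i < 1 or i > num_features for i in indices):
        problems.append("index out of range")
    # Each index must be strictly greater than the one before it.
    if any(a >= b for a, b in zip(indices, indices[1:])):
        problems.append("indices not ascending")
    return problems


print(check_line("1 4:0 8:1 10:1 20:0 24:1", 40))
print(check_line("1 10:1 20:0 8:1 4:0 24:1", 40))
```

Running this over every line of the input file, with `num_features` set to
the largest index in the master file, shows whether the two files agree
before any offset is applied to the regression output.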

On Sat, Dec 20, 2014 at 8:13 AM, Sameer Tilak ssti...@live.com wrote:
 Hi All,
 I use the LIBSVM format to specify my input feature vectors, which uses
 1-based indices. When I run regression, the o/p is 0-based. I have a master
 lookup file that maps these indices back to what they stand for. However, I
 need to add an offset of 2, not 1, to the regression output during the
 mapping. For example, to map index 800 from the regression output file, I
 look for 802 in my master lookup file, and then things make sense. I can
 understand adding an offset of 1, but I am not sure why adding an offset of
 2 works. Have others seen something like this as well?





RE: Interpreting MLLib's linear regression o/p

2014-12-22 Thread Sameer Tilak
Hi,

It is a text format in which each line represents a labeled sparse feature
vector using the following format:

label index1:value1 index2:value2 ...

This was the confusing part in the documentation:

where the indices are one-based and in ascending order. After loading, the
feature indices are converted to zero-based.

Let us say that I have 40 features, so I create an index file like this:

Feature, index number:

F1   1
F2   2
F3   3
...
F40  40

I then create my feature vectors in the libsvm format, something like:

1 10:1 20:0 8:1 4:0 24:1
1 1:1 40:0 2:1 8:0 9:1 23:1
0 23:1 18:0 13:1
...

I run the regression and get back models.weights, which holds 40 weights.

Say I get:

0.11
0.3445
0.5
...

In that case, does the first weight (0.11) correspond to index 1/F1, or does
it correspond to index 2/F2, since the input is 1-based and the o/p is
0-based? Or is 0-based indexing only for the internal representation, and
what you get back at the end of regression is essentially 1-based like your
input, so 0.11 maps onto F1 and so on?

