Re: Interpreting MLLib's linear regression o/p
(In your libsvm example, your indices are not in ascending order.)

The first weight corresponds to the first feature, of course. An indexing scheme doesn't change that or somehow make the first feature map to the second (where would the last one go then?). You'll find the first weight at offset 0 in an array, for example, but it corresponds to the feature you called F1 in the input.

On Tue, Dec 23, 2014 at 12:50 AM, Sameer Tilak ssti...@live.com wrote:

Hi,

It is a text format in which each line represents a labeled sparse feature vector using the following format:

    label index1:value1 index2:value2 ...

This was the confusing part in the documentation: "where the indices are one-based and in ascending order. After loading, the feature indices are converted to zero-based."

Let us say that I have 40 features, so I create an index file like this:

    Feature, index number:
    F1 1
    F2 2
    F3 3
    ...
    F40 40

I then create my feature vectors in the libsvm format, something like:

    1 10:1 20:0 8:1 4:0 24:1
    1 1:1 40:0 2:1 8:0 9:1 23:1
    0 23:1 18:0 13:1

I run regression and get back models.weights, which are 40 weights. Say I get:

    0.11
    0.3445
    0.5
    ...

In that case, does the first weight (0.11) correspond to index 1/F1, or does it correspond to index 2/F2, since the input is 1-based and the o/p is 0-based? Or is 0-based indexing only for the internal representation, and what you get back at the end of regression is essentially 1-based like your input, so that 0.11 maps onto F1 and so on?

Date: Mon, 22 Dec 2014 16:31:57 -0800
Subject: Re: Interpreting MLLib's linear regression o/p
From: men...@gmail.com
To: ssti...@live.com
CC: user@spark.apache.org

Did you check the indices in the LIBSVM data and the master file? Do they match?

-Xiangrui

On Sat, Dec 20, 2014 at 8:13 AM, Sameer Tilak ssti...@live.com wrote:

Hi All,

I use LIBSVM format to specify my input feature vector, which uses 1-based indices. When I run regression, the o/p is 0-indexed. I have a master lookup file that maps these indices back to what they stand for. However, I need to add an offset of 2, not 1, to the regression outcome during the mapping. So, for example, to map index 800 from the regression output file, I look for 802 in my master lookup file, and then things make sense. I can understand adding an offset of 1, but I am not sure why adding an offset of 2 works. Have others seen something like this as well?

- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
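The mapping described in the reply above (the weight at array offset 0 belongs to the feature written with index 1 in the LIBSVM input) can be sketched in plain Python. This is only an illustration and does not call Spark; the function name weights_by_feature and the in-memory index file are hypothetical, and the weights are the sample values from the question.

```python
# Illustrative sketch only: plain Python, no Spark required.
# weights_by_feature and index_file are hypothetical names.

def weights_by_feature(weights, index_file_rows):
    """Pair each weight with the feature name it belongs to.

    weights: list of floats in the order the model returns them (offset 0 first).
    index_file_rows: (feature_name, one_based_index) pairs from the index file.
    """
    name_of = {index: name for name, index in index_file_rows}
    # Offset i in the weights array corresponds to LIBSVM index i + 1.
    return [(name_of[i + 1], w) for i, w in enumerate(weights)]


index_file = [("F1", 1), ("F2", 2), ("F3", 3)]
weights = [0.11, 0.3445, 0.5]

pairs = weights_by_feature(weights, index_file)
for name, weight in pairs:
    print(name, weight)
```

Under this mapping the offset between the output position and a one-based lookup file is exactly 1. If an offset of 2 is what makes the names line up, that would suggest the lookup file and the LIBSVM data are not numbered the same way, which is worth checking directly.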
Re: Interpreting MLLib's linear regression o/p
Did you check the indices in the LIBSVM data and the master file? Do they match?

-Xiangrui

On Sat, Dec 20, 2014 at 8:13 AM, Sameer Tilak ssti...@live.com wrote:

Hi All,

I use LIBSVM format to specify my input feature vector, which uses 1-based indices. When I run regression, the o/p is 0-indexed. I have a master lookup file that maps these indices back to what they stand for. However, I need to add an offset of 2, not 1, to the regression outcome during the mapping. So, for example, to map index 800 from the regression output file, I look for 802 in my master lookup file, and then things make sense. I can understand adding an offset of 1, but I am not sure why adding an offset of 2 works. Have others seen something like this as well?
RE: Interpreting MLLib's linear regression o/p
Hi,

It is a text format in which each line represents a labeled sparse feature vector using the following format:

    label index1:value1 index2:value2 ...

This was the confusing part in the documentation: "where the indices are one-based and in ascending order. After loading, the feature indices are converted to zero-based."

Let us say that I have 40 features, so I create an index file like this:

    Feature, index number:
    F1 1
    F2 2
    F3 3
    ...
    F40 40

I then create my feature vectors in the libsvm format, something like:

    1 10:1 20:0 8:1 4:0 24:1
    1 1:1 40:0 2:1 8:0 9:1 23:1
    0 23:1 18:0 13:1

I run regression and get back models.weights, which are 40 weights. Say I get:

    0.11
    0.3445
    0.5
    ...

In that case, does the first weight (0.11) correspond to index 1/F1, or does it correspond to index 2/F2, since the input is 1-based and the o/p is 0-based? Or is 0-based indexing only for the internal representation, and what you get back at the end of regression is essentially 1-based like your input, so that 0.11 maps onto F1 and so on?

Date: Mon, 22 Dec 2014 16:31:57 -0800
Subject: Re: Interpreting MLLib's linear regression o/p
From: men...@gmail.com
To: ssti...@live.com
CC: user@spark.apache.org

Did you check the indices in the LIBSVM data and the master file? Do they match?

-Xiangrui

On Sat, Dec 20, 2014 at 8:13 AM, Sameer Tilak ssti...@live.com wrote:

Hi All,

I use LIBSVM format to specify my input feature vector, which uses 1-based indices. When I run regression, the o/p is 0-indexed. I have a master lookup file that maps these indices back to what they stand for. However, I need to add an offset of 2, not 1, to the regression outcome during the mapping. So, for example, to map index 800 from the regression output file, I look for 802 in my master lookup file, and then things make sense. I can understand adding an offset of 1, but I am not sure why adding an offset of 2 works. Have others seen something like this as well?
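To make the format rule quoted from the documentation concrete, here is a small plain-Python sketch of reading one LIBSVM line. It is not MLlib's loader: parse_libsvm_line is a hypothetical helper, and rejecting out-of-order indices is an assumption based only on the "one-based and in ascending order" wording quoted above.

```python
# Illustrative sketch only: a hypothetical helper, not MLlib's own loader.

def parse_libsvm_line(line):
    """Parse 'label index1:value1 index2:value2 ...' into (label, features).

    Returns the label as a float and a list of (zero_based_index, value)
    pairs. Indices in the input must be one-based and strictly ascending.
    """
    parts = line.split()
    label = float(parts[0])
    features = []
    previous = 0  # any valid one-based index is greater than 0
    for item in parts[1:]:
        index_text, value_text = item.split(":")
        one_based = int(index_text)
        if one_based <= previous:
            raise ValueError(
                "indices must be one-based and ascending, got %r" % item
            )
        previous = one_based
        # Convert to the zero-based index used after loading.
        features.append((one_based - 1, float(value_text)))
    return label, features


# The first example line from the message, with its indices sorted.
label, features = parse_libsvm_line("1 4:0 8:1 10:1 20:0 24:1")
print(label, features)
```

The example line as originally written (10, 20, 8, 4, 24) is not in ascending order, so this sketch would raise a ValueError on it; sorting the index:value pairs before writing the file avoids that.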