On 28 March 2014 08:59, Lars Buitinck <[email protected]> wrote:
> 2014-03-19 1:15 GMT+01:00 Anitha Gollamudi <[email protected]>:
>> Looks like the value error is haunting me still. I am trying to load a
>> multi-label libSVM format data file (sample pasted below) as:
>>
>> X_train, y_train = load_svmlight_file("testtrain.txt", dtype=np.int32,
>> multilabel=True)
>>
>> which gives me at least 2 issues that confuse me. Please help me here.
>>
>> [1].  ValueError: empty string for float()
>>
>> I did specify "np.int32". [Note that numpy is imported as 'np']. Yet,
>> it seemed to expect float for feature values. The error comes even if
>> I remove the dtype altogether.
>
> Yes, it uses float to load, then converts afterward. But this is not
> the problem you're encountering: you have spaces in between the
> labels, which is not allowed at present.
>
> I'm not sure if other packages allow these spaces; if either LibSVM or
> SVMlight does, we might have to extend the loader to deal with them.
>

Yes, the spaces between the labels were the issue. I removed them and it worked.
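For anyone hitting the same error, here is a minimal sketch of that cleanup. It assumes the labels are comma-separated and come before the first "index:value" token on each line; the function name and the sample line are made up for illustration and are not part of scikit-learn or of the LSHTC data.

```python
import re


def strip_label_spaces(line):
    """Remove whitespace around the commas in the label field of one line.

    Assumption: everything before the first "index:value" token is the
    (comma-separated) label field.
    """
    line = line.rstrip("\n")
    first_feature = re.search(r"\S+:\S+", line)
    if first_feature is None:
        # No features on this line; treat the whole line as labels.
        return re.sub(r"\s*,\s*", ",", line.strip())
    head = line[:first_feature.start()].strip()
    tail = line[first_feature.start():]
    head = re.sub(r"\s*,\s*", ",", head)
    return head + " " + tail


# Hypothetical sample line, not taken from the real dataset.
print(strip_label_spaces("1, 2 ,3 5:1 7:2"))
```

Writing the cleaned lines to a new file and passing that file to load_svmlight_file(..., multilabel=True) should then avoid the "empty string for float()" error, provided the spaces were the only problem.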


>> [2].  ValueError: Feature indices in SVMlight/LibSVM data file should
>> be sorted and unique.
>>
>> Is there a limitation as such? Because my dataset is from LSHTC
>> (http://lshtc.iit.demokritos.gr/LSHTC4_GUIDELINES) and the website
>> specifically mentions it to be in libSVM format.
>
> There is, AFAIK, no formal definition of the LibSVM file format, so
> the loader emulates what LibSVM and SVMlight do, and I'm guessing this
> was implemented for compatibility reasons.


I am not sure, but I had to sort the feature indices in the input to make it work :-(
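A rough sketch of that sorting step, in case it helps someone else. It assumes the label field contains no spaces, that every feature token has the form "index:value" with an integer index, and that there are no "qid:" tokens or trailing comments. The function name and sample line are hypothetical. Duplicate indices are reported rather than merged, since how to merge them is a judgment call.

```python
def sort_feature_indices(line):
    """Return the line with its index:value pairs sorted by index."""
    tokens = line.split()
    # Tokens without a colon are taken to be the label field.
    labels = [t for t in tokens if ":" not in t]
    pairs = {}
    for token in tokens:
        if ":" not in token:
            continue
        index, value = token.split(":", 1)
        index = int(index)
        if index in pairs:
            # The loader also requires unique indices; do not guess a fix.
            raise ValueError("duplicate feature index %d" % index)
        pairs[index] = value
    features = ["%d:%s" % (i, pairs[i]) for i in sorted(pairs)]
    return " ".join(labels + features)


# Hypothetical sample line, not taken from the real dataset.
print(sort_feature_indices("3,7 10:1 2:4 5:1"))
```

Values are kept as strings so that the original number formatting is not changed by the rewrite.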

Thanks!

-Anitha

------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general