2014-03-19 1:15 GMT+01:00 Anitha Gollamudi <[email protected]>:
> Looks like the value error is haunting me still. I am trying to load a
> multi-label libSVM format data file (sample pasted below) as:
>
> X_train, y_train = load_svmlight_file("testtrain.txt", dtype=np.int32,
> multilabel=True)
>
> which gives me at least 2 issues which confuse me. Help me here.
>
> [1].  ValueError: empty string for float()
>
> I did specify "np.int32". [Note that numpy is imported as 'np']. Yet,
> it seemed to expect float for feature values. The error comes even if
> I remove the dtype altogether.

Yes, the loader parses values as floats and converts them to the
requested dtype afterward. But that is not the problem you're
encountering: you have spaces between the labels, which the loader
does not currently allow.

I'm not sure if other packages allow these spaces; if either LibSVM or
SVMlight does, we might have to extend the loader to deal with them.
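
Until then, one workaround is to strip the spaces between labels before
loading. Below is a minimal sketch, not part of scikit-learn: the helper
name normalize_labels is made up for illustration, and it assumes each
line looks like "label, label feat:val feat:val" with at least one label,
no comment lines, and no ":" inside a label.

```python
def normalize_labels(line):
    """Remove whitespace inside the comma-separated label list of one line.

    Hypothetical helper: assumes every feature token contains ":" and
    that no label token does.
    """
    tokens = line.split()
    # Index of the first feature token; everything before it is labels.
    first_feat = next(
        (i for i, tok in enumerate(tokens) if ":" in tok), len(tokens)
    )
    # Gluing the label tokens back together drops the spaces after commas:
    # ["1,", "2,", "3"] becomes "1,2,3".
    labels = "".join(tokens[:first_feat])
    parts = ([labels] if labels else []) + tokens[first_feat:]
    return " ".join(parts)


def normalize_file(src_path, dst_path):
    """Write a copy of src_path with label spaces removed, line by line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(normalize_labels(line) + "\n")
```

The cleaned copy can then be passed to load_svmlight_file with
multilabel=True, as in your original call.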

> [2].  ValueError: Feature indices in SVMlight/LibSVM data file should
> be sorted and unique.
>
> Is there a limitation as such? Because my dataset is from LSHTC
> (http://lshtc.iit.demokritos.gr/LSHTC4_GUIDELINES) and the website
> specifically mentions it to be in libSVM format.

There is, AFAIK, no formal definition of the LibSVM file format, so
the loader emulates what LibSVM and SVMlight do. I'm guessing the
sorted-and-unique check on feature indices was implemented for
compatibility with them.
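
If the file cannot be regenerated, the feature indices can be sorted and
deduplicated in a preprocessing pass. The following is only a sketch
under assumptions: the helper name sort_features is hypothetical,
duplicate indices are summed (whether that is right depends on what the
values mean in your data), and lines are assumed to carry no "qid:"
tokens or comments.

```python
def sort_features(line):
    """Return the line with feature indices sorted ascending and unique.

    Hypothetical helper: duplicate indices are merged by summing their
    values, which is an assumption and not something the format defines.
    """
    tokens = line.split()
    # Everything before the first "index:value" token is the label part.
    first_feat = next(
        (i for i, tok in enumerate(tokens) if ":" in tok), len(tokens)
    )
    merged = {}
    for tok in tokens[first_feat:]:
        idx, val = tok.split(":", 1)
        merged[int(idx)] = merged.get(int(idx), 0.0) + float(val)
    # Values are written back as Python floats, so "1" becomes "1.0".
    feats = ["%d:%r" % (i, merged[i]) for i in sorted(merged)]
    return " ".join(tokens[:first_feat] + feats)
```

Apply it to each line of the file before loading. It does not touch the
label part, so spaces between labels still need to be removed separately.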

------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
