On 28 March 2014 08:59, Lars Buitinck <[email protected]> wrote:
> 2014-03-19 1:15 GMT+01:00 Anitha Gollamudi <[email protected]>:
>> Looks like the value error is haunting me still. I am trying to load a
>> multi-label libSVM format data file (sample pasted below) as:
>>
>> X_train, y_train = load_svmlight_file("testtrain.txt", dtype=np.int32,
>> multilabel=True)
>>
>> which gives me at-least 2 issues which confuse me. Help me here.
>>
>> [1]. ValueError: empty string for float()
>>
>> I did specify "np.int32". [Note that numpy is imported as 'np']. Yet,
>> it seemed to expect float for feature values. The error comes even if
>> I remove the dtype altogether.
>
> Yes, it uses float to load, then converts afterward. But this is not
> the problem you're encountering: you have spaces in between the
> labels, which is not allowed at present.
>
> I'm not sure if other packages allow these spaces; if either LibSVM or
> SVMlight does, we might have to extend the loader to deal with them.
>
Yes, the spaces were the issue. I removed them and it worked.

>> [2]. ValueError: Feature indices in SVMlight/LibSVM data file should
>> be sorted and unique.
>>
>> Is there a limitation as such? Because my dataset is from LSHTC
>> (http://lshtc.iit.demokritos.gr/LSHTC4_GUIDELINES) and the website
>> specifically mentions it to be in libSVM format.
>
> There is, AFAIK, no formal definition of the LibSVM file format, so
> the loader emulates what LibSVM and SVMlight do, and I'm guessing this
> was implemented for compatibility reasons.

I am not sure either. But I had to sort the feature indices in the input
to make it work :-(

Thanks!
-Anitha
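
For anyone hitting the same two errors, here is a minimal sketch of the two
workarounds described above: dropping the spaces between labels and sorting
the feature indices on each line. The helper names normalize_svmlight_line
and normalize_svmlight_file are hypothetical (they are not part of
scikit-learn), and the sketch assumes comma-separated labels, no "qid:"
field and no "#" comments. Duplicate feature indices are left in place, so
the loader may still reject a file that contains them.

```python
def normalize_svmlight_line(line):
    """Return one multilabel line with spaces removed from the label
    list and the index:value pairs sorted by index.

    Assumes comma-separated labels, no qid field, no '#' comment.
    """
    tokens = line.split()
    # The first token containing ':' marks the start of the features;
    # everything before it belongs to the label list.
    first_feat = next(
        (i for i, tok in enumerate(tokens) if ":" in tok), len(tokens)
    )
    # "1," "5," "7" -> "1,5,7"
    labels = "".join(tokens[:first_feat])
    feats = []
    for tok in tokens[first_feat:]:
        idx, val = tok.split(":", 1)
        feats.append((int(idx), val))
    # Sort numerically by feature index; values are kept as text.
    feats.sort(key=lambda pair: pair[0])
    return " ".join([labels] + ["%d:%s" % (i, v) for i, v in feats])


def normalize_svmlight_file(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path, skipping blank lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(normalize_svmlight_line(line) + "\n")
```

The cleaned copy can then be passed to load_svmlight_file(...,
multilabel=True) in place of the original file.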
