Johannes created FLINK-2984:
-------------------------------

             Summary: Support lenient parsing of SVMLight input files
                 Key: FLINK-2984
                 URL: https://issues.apache.org/jira/browse/FLINK-2984
             Project: Flink
          Issue Type: Improvement
          Components: Machine Learning Library
    Affects Versions: 0.9.1
            Reporter: Johannes
            Priority: Trivial

The current implementation of the reader assumes that the input follows the exact format specification. The [splice-site dataset|https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#splice-site] is formatted slightly differently.

Example
{noformat}
-1  1:0.381846 2:0.163648 3:0.245472 4:0.627318
{noformat}
Note the two spaces after the label. Currently MLUtils.scala splits on single spaces.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
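
For illustration, a minimal sketch of what lenient parsing could look like. It assumes each line is a label followed by index:value pairs, and splits on runs of whitespace rather than on a single space. The object and method names (LenientSvmLightLine, parse) are hypothetical and are not part of MLUtils.scala; this is not the actual Flink implementation. Splitting the example line on a single space would yield an empty token between the label and the first feature, which this approach avoids.

```scala
// Hypothetical helper for illustration only; not part of Flink's MLUtils.
object LenientSvmLightLine {

  /** Parses one SVMLight-style line into (label, Seq((index, value))).
    *
    * Assumes the line is non-empty and every feature token has the
    * form "index:value"; malformed input throws an exception.
    */
  def parse(line: String): (Double, Seq[(Int, Double)]) = {
    // Trim the ends, then split on one or more whitespace characters,
    // so repeated spaces or tabs do not produce empty tokens.
    val tokens = line.trim.split("\\s+")

    val label = tokens.head.toDouble

    val features = tokens.tail.map { token =>
      // Each remaining token is expected to be "index:value".
      val Array(index, value) = token.split(':')
      (index.toInt, value.toDouble)
    }

    (label, features.toSeq)
  }
}
```

With the line from the example above, LenientSvmLightLine.parse("-1  1:0.381846 2:0.163648 3:0.245472 4:0.627318") gives the label -1.0 and four (index, value) pairs, regardless of how many spaces follow the label.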