[ https://issues.apache.org/jira/browse/FLINK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chiwan Park reassigned FLINK-2984:
----------------------------------

    Assignee: Chiwan Park

> Support lenient parsing of SVMLight input files
> -----------------------------------------------
>
>                 Key: FLINK-2984
>                 URL: https://issues.apache.org/jira/browse/FLINK-2984
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>    Affects Versions: 0.9.1
>            Reporter: Johannes
>            Assignee: Chiwan Park
>            Priority: Trivial
>              Labels: easyfix
>
> The current implementation of the reader assumes that the input follows the
> exact specification.
> The [splice-site dataset|https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#splice-site]
> is formatted slightly differently.
> Example:
> {noformat}
> -1  1:0.381846 2:0.163648 3:0.245472 4:0.627318
> {noformat}
> Note the two spaces after the label.
> Currently MLUtils.scala splits on single spaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
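
For illustration, one way the lenient parsing requested in this issue could look is to trim each line and split on runs of whitespace instead of on a single space. This is a minimal sketch, not the actual code in MLUtils.scala and not necessarily the fix that was applied: the object name `LenientSvmLightParser` and the method `parseLine` are hypothetical, and the sketch assumes every feature token has the form `index:value`.

```scala
// Hypothetical helper for illustration only; not part of Flink's MLUtils.scala.
object LenientSvmLightParser {

  /** Parses one SVMLight-style line into (label, Seq((featureIndex, featureValue))). */
  def parseLine(line: String): (Double, Seq[(Int, Double)]) = {
    // Trim first: a leading separator would otherwise produce an empty first token.
    // Splitting on "\\s+" treats any run of spaces or tabs as one separator.
    val tokens = line.trim.split("\\s+")

    val label = tokens.head.toDouble

    // Every remaining token is assumed to look like "index:value".
    val features = tokens.tail.toSeq.map { token =>
      val Array(index, value) = token.split(':')
      (index.toInt, value.toDouble)
    }

    (label, features)
  }
}
```

With this approach the example line from the issue, with two spaces after the label, parses to the same result as a line with single spaces throughout. A malformed feature token still fails with an exception, so the sketch is lenient only about whitespace.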