[ https://issues.apache.org/jira/browse/IGNITE-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksey Zinoviev updated IGNITE-7328:
-------------------------------------
    Affects Version/s: 3.0

> Improve Labeled Dataset loading from txt file
> ---------------------------------------------
>
>                 Key: IGNITE-7328
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7328
>             Project: Ignite
>          Issue Type: New Feature
>          Components: ml
>    Affects Versions: 3.0
>            Reporter: Aleksey Zinoviev
>            Assignee: Aleksey Zinoviev
>            Priority: Trivial
>
> 1. Wouldn't it be better to parse rows in place, rather than saving them as
> strings first? The current implementation has to keep the dataset in memory
> twice, which might be a problem for big datasets.
> 2. What about the case when a dataset contains non-numerical data as well?
> Do we consider this case, or will some other "DatasetLoader" be used for
> such purposes?
> 3. Just an idea: in case we don't want to fail on bad data (99% of cases),
> it would be great to understand the quality of the loaded dataset, such as
> the number of missed rows/values.
> 4. Should a row that doesn't contain the required number of columns be
> considered "bad data" instead of breaking parsing with an
> IndexOutOfBoundsException?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
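
The four points in the description can be illustrated with a small sketch. This is not the Ignite loader and does not use any Ignite API: the class name InPlaceRowParser, the Result holder, and the choice to record unparseable cells as NaN are all assumptions made for illustration. It converts each line to a double[] as soon as it is read (point 1), counts non-numeric or empty cells as missing values instead of failing (points 2 and 3), and skips rows with the wrong number of columns instead of throwing (point 4).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Apache Ignite. */
final class InPlaceRowParser {
    /** Parsed rows plus simple data-quality counters. */
    static final class Result {
        final List<double[]> rows = new ArrayList<>();
        int skippedRows;   // rows whose column count did not match
        int missingValues; // cells that could not be parsed as a number
    }

    /**
     * Reads the input line by line and converts each line to a double[]
     * immediately, so the raw strings are never held for the whole dataset.
     */
    static Result parse(BufferedReader reader, String separator, int expectedCols)
            throws IOException {
        Result res = new Result();
        Pattern sep = Pattern.compile(Pattern.quote(separator));
        String line;

        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty())
                continue; // blank lines are ignored, not counted as bad rows

            // Limit -1 keeps trailing empty cells, e.g. "1,2," yields 3 cells.
            String[] cells = sep.split(line, -1);

            if (cells.length != expectedCols) {
                res.skippedRows++; // treated as bad data instead of throwing
                continue;
            }

            double[] row = new double[expectedCols];
            for (int i = 0; i < expectedCols; i++) {
                try {
                    row[i] = Double.parseDouble(cells[i].trim());
                }
                catch (NumberFormatException e) {
                    // Assumption: non-numeric or empty cells become NaN.
                    row[i] = Double.NaN;
                    res.missingValues++;
                }
            }
            res.rows.add(row);
        }
        return res;
    }
}
```

A caller would wrap the file in a BufferedReader, call InPlaceRowParser.parse(reader, ",", expectedCols), and then inspect skippedRows and missingValues to judge the quality of the loaded data. Whether bad rows should be skipped, or whether categorical columns need a separate loader, remains the open design question raised in the issue.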