[
https://issues.apache.org/jira/browse/MAHOUT-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022121#comment-13022121
]
Sean Owen commented on MAHOUT-677:
----------------------------------
I can optimize the parsing about 10x more: don't parse from text! My question
is whether the use cases this approach targets are simply better suited to
binary input, since it is even faster, results in less I/O, and has no gotchas.
Is the idea that you are accepting CSV from an external source or system? If
so, I wonder whether it's a good idea to silently mis-parse numbers that
aren't ints.
I was also just wondering out loud whether it's better to remove this example
rather than try to fix forward, as it's not an example of using Mahout per se.
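
To make the binary-input suggestion concrete, here is a minimal sketch using
only the JDK's DataOutputStream and DataInputStream. The class name
BinaryRecords and the record layout (an int count followed by that many
doubles) are assumptions made for illustration; they are not taken from
SimpleCsvExamples or from Mahout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Mahout: stores doubles in binary form.
public class BinaryRecords {

  // Writes an int count followed by each value as an 8-byte IEEE 754 double.
  static byte[] write(double[] values) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(values.length);
      for (double v : values) {
        out.writeDouble(v);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads the values back; no text parsing is involved at any point.
  static double[] read(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      double[] values = new double[in.readInt()];
      for (int i = 0; i < values.length; i++) {
        values[i] = in.readDouble();
      }
      return values;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Each double takes a fixed 8 bytes and round-trips bit for bit, so there is
nothing to mis-parse; the cost is that the file is no longer human-readable.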
> SimpleCsvExamples doesn't parse doubles correctly with FastLine and
> FastLineReader
> ----------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-677
> URL: https://issues.apache.org/jira/browse/MAHOUT-677
> Project: Mahout
> Issue Type: Bug
> Components: Examples
> Affects Versions: 0.5
> Reporter: Stanley Xu
> Priority: Minor
> Fix For: 0.5
>
> Attachments: simplecsvexamplebugfix.diff
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> The FastLineReader in SimpleCsvExamples.java tries to parse each line quickly
> by reading the bytes directly from the stream, avoiding the cost of copying
> Strings. But it doesn't parse the line correctly: in fast parsing mode all
> double values come back as zero.
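
For readers unfamiliar with the technique described in the issue, the
following is a rough sketch of parsing a decimal number straight from a byte
range without creating a String. It is not the FastLineReader code and not the
attached patch; the class name ByteDoubleParser and its method are
hypothetical. It assumes ASCII input of the form [-]digits[.digits] with at
most 18 digits in total, and it rejects exponents and anything else rather
than silently returning zero.

```java
// Hypothetical illustration, not the code from SimpleCsvExamples.java.
public class ByteDoubleParser {

  // Parses [-]digits[.digits] from buf[start, end). No exponent, NaN or
  // infinity support; more than 18 digits would overflow the long mantissa.
  static double parseDouble(byte[] buf, int start, int end) {
    int i = start;
    boolean negative = false;
    if (i < end && buf[i] == '-') {
      negative = true;
      i++;
    }
    long mantissa = 0;      // all digits, with the decimal point ignored
    double scale = 1.0;     // 10^(number of digits after the point)
    boolean seenPoint = false;
    boolean seenDigit = false;
    for (; i < end; i++) {
      byte b = buf[i];
      if (b >= '0' && b <= '9') {
        mantissa = mantissa * 10 + (b - '0');
        seenDigit = true;
        if (seenPoint) {
          scale *= 10.0;
        }
      } else if (b == '.' && !seenPoint) {
        seenPoint = true;
      } else {
        // Fail loudly instead of mis-parsing the field.
        throw new NumberFormatException("Unexpected byte " + b + " at offset " + i);
      }
    }
    if (!seenDigit) {
      throw new NumberFormatException("No digits in field");
    }
    double value = mantissa / scale;
    return negative ? -value : value;
  }
}
```

The fractional digits are folded into the same integer mantissa and divided
out once at the end; a parser that stops at the decimal point, or that only
handles integer fields, is one way values can end up wrong without any error
being reported.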