[ https://issues.apache.org/jira/browse/MAHOUT-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022096#comment-13022096 ]
Sean Owen commented on MAHOUT-677:
----------------------------------
Yes, that seems like a bug. I also wonder why the code is parsing doubles when
the generated inputs are ints.
The fast version won't fail if the input is malformed (values like "foo" or
"2.3"), and it does make an assumption about character encoding, but I suppose
that's the point of this optimization. But if you're assuming that you know
exactly what the input looks like, can you assume a binary input format and
avoid parsing altogether?
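To make the binary-format idea concrete, here is a minimal sketch, assuming each generated row is just a list of ints. The class `BinaryRows` and its layout (an int count followed by that many ints) are hypothetical and not part of Mahout; it relies only on `java.io.DataOutputStream` and `DataInputStream`, so reading involves no text parsing and no character-encoding assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical row store: each row is an int count followed by that many ints. */
final class BinaryRows {

  private BinaryRows() {}

  /** Serializes rows with DataOutputStream (big-endian, 4 bytes per int). */
  static byte[] write(List<int[]> rows) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (int[] row : rows) {
        out.writeInt(row.length);
        for (int value : row) {
          out.writeInt(value);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for an in-memory buffer
    }
    return bytes.toByteArray();
  }

  /** Reads rows back; no character decoding or number parsing is involved. */
  static List<int[]> read(byte[] data) {
    List<int[]> rows = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      // available() is exact for a ByteArrayInputStream; a file-backed reader
      // would instead stop on EOFException at a row boundary.
      while (in.available() > 0) {
        int count = in.readInt();
        if (count < 0) {
          throw new IOException("corrupt row length: " + count);
        }
        int[] row = new int[count];
        for (int i = 0; i < count; i++) {
          row[i] = in.readInt(); // throws EOFException if the row is truncated
        }
        rows.add(row);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return rows;
  }
}
```

At 4 bytes per int this can be larger than short decimal text, so any speedup depends on the data; the gain is that reading becomes fixed-width decoding rather than parsing.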
Or... is this really just a proof of concept that could just as well be
removed? I don't see any usages.
> The SimpleCsvExamples didn't really parse the doubles correctly with the
> FastLine and FastLineReader
> ----------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-677
> URL: https://issues.apache.org/jira/browse/MAHOUT-677
> Project: Mahout
> Issue Type: Bug
> Components: Examples
> Affects Versions: 0.5
> Reporter: Stanley Xu
> Priority: Minor
> Fix For: 0.5
>
> Attachments: simplecsvexamplebugfix.diff
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> The FastLineReader in SimpleCsvExamples.java tries to parse each line quickly
> by parsing the bytes directly from the stream, avoiding the cost of copying
> Strings. However, it does not parse the line correctly, and in fast parsing
> mode every double value comes out as zero.
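For comparison, a byte-level parser that does produce the right values might look like the following minimal sketch. It is not the code from the attached simplecsvexamplebugfix.diff; `ByteLineParser` is a hypothetical name, and it assumes ASCII digits, ',' as the separator, no exponent notation, and at most 15 digits per field. Unlike the lenient fast path discussed above, it throws `NumberFormatException` on a field such as "foo" instead of silently returning a value.

```java
/**
 * Hypothetical byte-level CSV number parser. Assumes ASCII digits, ',' as the
 * separator, no exponent notation, and at most 15 digits per field (longer
 * fields may lose precision or overflow).
 */
final class ByteLineParser {

  private ByteLineParser() {}

  /** Parses bytes[start, end) as a plain decimal such as "-12.75" or "3". */
  static double parseDouble(byte[] bytes, int start, int end) {
    int i = start;
    boolean negative = false;
    if (i < end && (bytes[i] == '-' || bytes[i] == '+')) {
      negative = bytes[i] == '-';
      i++;
    }
    long digits = 0;        // every digit seen, with the decimal point ignored
    int fractionDigits = 0; // how many of those digits came after the point
    boolean seenPoint = false;
    boolean seenDigit = false;
    for (; i < end; i++) {
      byte b = bytes[i];
      if (b >= '0' && b <= '9') {
        digits = digits * 10 + (b - '0');
        if (seenPoint) {
          fractionDigits++;
        }
        seenDigit = true;
      } else if (b == '.' && !seenPoint) {
        seenPoint = true;
      } else {
        throw new NumberFormatException("unexpected byte " + b + " in numeric field");
      }
    }
    if (!seenDigit) {
      throw new NumberFormatException("field has no digits");
    }
    // Both operands are exact doubles while digits < 2^53 and fractionDigits <= 22,
    // so this single division is correctly rounded.
    double value = digits / Math.pow(10, fractionDigits);
    return negative ? -value : value;
  }

  /** Splits line[0, length) on ',' and parses each field without creating Strings. */
  static double[] parseLine(byte[] line, int length) {
    int fieldCount = 1;
    for (int i = 0; i < length; i++) {
      if (line[i] == ',') {
        fieldCount++;
      }
    }
    double[] values = new double[fieldCount];
    int field = 0;
    int start = 0;
    for (int i = 0; i <= length; i++) {
      if (i == length || line[i] == ',') {
        values[field++] = parseDouble(line, start, i);
        start = i + 1;
      }
    }
    return values;
  }
}
```

Because `parseLine` takes an explicit `length`, it can run over a reused read buffer, which is the copy-avoiding idea the issue describes.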
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira