[ https://issues.apache.org/jira/browse/MAHOUT-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022121#comment-13022121 ]

Sean Owen commented on MAHOUT-677:
----------------------------------

I can optimize the parsing about 10x more: don't parse from text! My question
is whether the use cases for this approach are simply better served by binary
input, since it is even faster, results in less I/O, and has no gotchas.
Is the idea that you are accepting CSV from an external source or system? If
so, I wonder whether it's a good idea to silently mis-parse numbers that
aren't ints.
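
A minimal sketch of the binary-input alternative, assuming rows of doubles are
written with plain java.io streams (not any particular Mahout or Hadoop file
format) and that each row is prefixed with its length. BinaryRows, write and
read are hypothetical names chosen for illustration. Reading the data back
involves no text parsing at all, and every value round-trips bit-for-bit, so a
fractional value cannot be silently truncated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical binary row format: an int length, then that many raw doubles. */
public class BinaryRows {

  /** Writes each row as its length followed by its values as 8-byte IEEE 754 doubles. */
  static byte[] write(List<double[]> rows) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (double[] row : rows) {
        out.writeInt(row.length);
        for (double value : row) {
          out.writeDouble(value);
        }
      }
    } catch (IOException e) {
      // In-memory streams should not fail; rethrow unchecked if one somehow does.
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reads rows until the data runs out; no text is parsed, so values come back bit-for-bit. */
  static List<double[]> read(byte[] data) {
    List<double[]> rows = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      // ByteArrayInputStream reports exactly how many bytes remain.
      while (in.available() > 0) {
        int length = in.readInt();
        double[] row = new double[length];
        for (int i = 0; i < length; i++) {
          row[i] = in.readDouble();
        }
        rows.add(row);
      }
    } catch (IOException e) {
      // Truncated data surfaces here (EOFException) instead of being ignored.
      throw new UncheckedIOException(e);
    }
    return rows;
  }

  public static void main(String[] args) {
    List<double[]> rows = List.of(new double[] {1.5, -2.25}, new double[] {0.1});
    List<double[]> back = read(write(rows));
    System.out.println(back.size() + " rows; first row starts with " + back.get(0)[0]);
  }
}
```

The trade-off is that the file is no longer human-readable, and data arriving
as CSV from an external system would still need one conversion pass.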

I was also just wondering out loud whether it's better to remove this example 
rather than try to fix forward, as it's not an example of using Mahout per se.

> The SimpleCsvExamples didn't really parse doubles correctly with the
> FastLine and FastLineReader
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-677
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-677
>             Project: Mahout
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 0.5
>            Reporter: Stanley Xu
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: simplecsvexamplebugfix.diff
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The FastLineReader in SimpleCsvExamples.java tries to parse each line quickly
> by parsing the bytes directly from the stream, avoiding the cost of copying
> Strings. But it doesn't parse the line correctly: in fast parsing mode, all
> double values come out as zero.
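
For the fast-parsing bug described above, a sketch of one way to parse doubles
straight from bytes while keeping the fractional part. It is hypothetical
(ByteCsvParser, parseLine and parseField are not names from SimpleCsvExamples or
the attached patch) and assumes ASCII input with plain decimals such as -12.375.
Digits are accumulated into a long with the decimal point dropped, then divided
by an exact power of ten; while the digits fit in 53 bits and there are at most
22 fractional digits, that single division gives the same correctly rounded
result as Double.parseDouble. Anything else (exponents, NaN, stray characters,
empty fields) falls back to building a String and calling Double.parseDouble,
which throws NumberFormatException instead of silently returning zero.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical byte-level parser for one line of comma-separated decimal numbers. */
public class ByteCsvParser {

  // Largest long that converts to a double exactly.
  private static final long MAX_EXACT = 1L << 53;
  // 10^0 .. 10^22 are all exactly representable as doubles.
  private static final double[] POW10 = new double[23];

  static {
    POW10[0] = 1.0;
    for (int i = 1; i < POW10.length; i++) {
      POW10[i] = POW10[i - 1] * 10.0;
    }
  }

  /** Splits bytes[start, end) on commas and parses every field as a double. */
  static double[] parseLine(byte[] bytes, int start, int end) {
    double[] values = new double[8];
    int count = 0;
    int fieldStart = start;
    for (int i = start; i <= end; i++) {
      if (i == end || bytes[i] == ',') {
        if (count == values.length) {
          values = Arrays.copyOf(values, count * 2);
        }
        values[count++] = parseField(bytes, fieldStart, i);
        fieldStart = i + 1;
      }
    }
    return Arrays.copyOf(values, count);
  }

  /** Parses bytes[start, end) without allocating when the field is a plain decimal. */
  static double parseField(byte[] bytes, int start, int end) {
    int i = start;
    boolean negative = false;
    if (i < end && (bytes[i] == '-' || bytes[i] == '+')) {
      negative = bytes[i] == '-';
      i++;
    }
    long mantissa = 0;      // every digit, with the decimal point dropped
    int fractionDigits = 0; // digits seen after the decimal point
    boolean seenPoint = false;
    boolean seenDigit = false;
    for (; i < end; i++) {
      byte b = bytes[i];
      if (b >= '0' && b <= '9') {
        int digit = b - '0';
        if (mantissa > (MAX_EXACT - digit) / 10) {
          return slowPath(bytes, start, end); // too many digits to stay exact
        }
        mantissa = mantissa * 10 + digit;
        seenDigit = true;
        if (seenPoint) {
          fractionDigits++;
        }
      } else if (b == '.' && !seenPoint) {
        seenPoint = true;
      } else {
        return slowPath(bytes, start, end); // exponent, whitespace, NaN, junk, ...
      }
    }
    if (!seenDigit || fractionDigits >= POW10.length) {
      return slowPath(bytes, start, end);
    }
    // Exact integer divided by an exact power of ten: one correctly rounded division.
    double value = mantissa / POW10[fractionDigits];
    return negative ? -value : value;
  }

  /** Rare path: copy to a String and let the JDK parse (or reject) it. */
  private static double slowPath(byte[] bytes, int start, int end) {
    String field = new String(bytes, start, end - start, StandardCharsets.US_ASCII);
    return Double.parseDouble(field); // throws NumberFormatException instead of returning 0
  }

  public static void main(String[] args) {
    byte[] line = "1.5,-2.25,3,0.1,6.02e23".getBytes(StandardCharsets.US_ASCII);
    System.out.println(Arrays.toString(parseLine(line, 0, line.length)));
  }
}
```

The design keeps the allocation-free path for ordinary fields while refusing
to guess on anything it does not fully understand.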

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
