[ 
https://issues.apache.org/jira/browse/MAHOUT-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022166#comment-13022166
 ] 

Sean Owen commented on MAHOUT-677:
----------------------------------

(If it's just reading ints, I was thinking just write 4-byte ints. That's got 
to be the fastest of all.)
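
To make the 4-byte idea concrete, here is a minimal sketch using 
java.nio.ByteBuffer. The class and method names (RawIntCodec, encode, decode) 
are made up for illustration and are not part of Mahout or the example; it 
just shows that each int becomes exactly four big-endian bytes, with no text 
parsing on the way back in.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not Mahout code: stores ints as raw 4-byte values.
final class RawIntCodec {

  // Writes each int as 4 bytes (ByteBuffer defaults to big-endian order).
  static byte[] encode(int[] values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
    for (int v : values) {
      buf.putInt(v);
    }
    return buf.array();
  }

  // Reads the ints back; no digit-by-digit parsing is involved.
  static int[] decode(byte[] data) {
    ByteBuffer buf = ByteBuffer.wrap(data);
    int[] values = new int[data.length / Integer.BYTES];
    for (int i = 0; i < values.length; i++) {
      values[i] = buf.getInt();
    }
    return values;
  }
}
```

The trade-off is that the file is no longer human-readable CSV, so this only 
fits if the benchmark is about raw read speed rather than text parsing.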

The author would be Ted, really his call. If it's more of an example for the 
book, it could be attached to the book. If it stays that's cool too, just need 
to have a think about fixing/documenting the issues raised here.

You've raised a different and interesting point about performance, though. 
You find that the slow-down is actually in addToVector, where it converts a 
String to byte[]? The thing is, the corresponding line in the "fast" version 
skips this conversion entirely and passes null instead.

Indeed, also passing null in the "normal" version makes it twice as fast for 
me. It's still twice as slow as the "fast" version though. But I do wonder 
whether the example deserves a bit more attention. I may not know what I'm 
doing. Is that a difference that shouldn't exist between the two benchmarks?
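
One way to isolate that cost is to run the same loop with and without the 
conversion. The following is only a rough sketch under my own assumptions: 
ConversionCost, consume and run are hypothetical names, and consume merely 
stands in for the encoder call so that the two variants differ in nothing but 
the String-to-byte[] step. It is not the code in SimpleCsvExamples.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical micro-benchmark, not Mahout code.
final class ConversionCost {

  // Stand-in for the encoder call: uses the bytes (or null) so the work is not optimized away.
  static int consume(byte[] original) {
    return original == null ? 0 : original.length;
  }

  // Runs the loop; when convert is false, null is passed, as in the "fast" version.
  static long run(String[] words, int passes, boolean convert) {
    long total = 0;
    for (int p = 0; p < passes; p++) {
      for (String w : words) {
        byte[] original = convert ? w.getBytes(StandardCharsets.UTF_8) : null;
        total += consume(original);
      }
    }
    return total;
  }

  public static void main(String[] args) {
    String[] words = {"alpha", "beta", "gamma", "delta"};
    for (boolean convert : new boolean[] {true, false}) {
      run(words, 100_000, convert); // warm-up so the JIT has compiled the loop
      long start = System.nanoTime();
      long total = run(words, 1_000_000, convert);
      long micros = (System.nanoTime() - start) / 1_000;
      System.out.println("convert=" + convert + " total=" + total + " micros=" + micros);
    }
  }
}
```

Timings from a loop like this are only indicative; a fair comparison of the 
two benchmarks would need both to do the same conversion work, or both to 
skip it.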

> The SimpleCsvExamples didn't really parsed the double correctly with the 
> FastLine and FastLineReader
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-677
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-677
>             Project: Mahout
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 0.5
>            Reporter: Stanley Xu
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: simplecsvexamplebugfix.diff
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The FastLineReader in SimpleCsvExamples.java tries to parse each line 
> quickly by parsing the bytes directly from the stream, without the cost of 
> copying Strings. But it does not parse the line correctly, and all double 
> values come out as zero in fast parsing mode.
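
For reference, parsing a decimal value straight from ASCII bytes, with no 
intermediate String, can be sketched as below. This is an illustration under 
stated assumptions, not the FastLineReader code and not the attached patch: 
AsciiDoubleParser is a hypothetical name, and the sketch handles only an 
optional minus sign, digits and a single decimal point (no exponents, no 
whitespace, no overflow checks).

```java
// Hypothetical sketch, not Mahout code.
final class AsciiDoubleParser {

  // Parses buf[start, end) as [-]digits[.digits] without creating a String.
  static double parse(byte[] buf, int start, int end) {
    int i = start;
    boolean negative = i < end && buf[i] == '-';
    if (negative) {
      i++;
    }
    long mantissa = 0;       // all digits, with the decimal point ignored
    int digits = 0;          // total number of digits seen
    int fractionDigits = 0;  // digits seen after the decimal point
    boolean seenPoint = false;
    for (; i < end; i++) {
      byte b = buf[i];
      if (b == '.' && !seenPoint) {
        seenPoint = true;
      } else if (b >= '0' && b <= '9') {
        mantissa = mantissa * 10 + (b - '0');
        digits++;
        if (seenPoint) {
          fractionDigits++;
        }
      } else {
        throw new NumberFormatException("unexpected byte at offset " + i);
      }
    }
    if (digits == 0) {
      throw new NumberFormatException("no digits in field");
    }
    // Scale down by 10^fractionDigits to place the decimal point.
    double scale = 1.0;
    for (int k = 0; k < fractionDigits; k++) {
      scale *= 10.0;
    }
    double value = mantissa / scale;
    return negative ? -value : value;
  }
}
```

The fractional digits have to be accumulated and scaled like this; dropping 
that step is one way a byte-level parser can end up with wrong values.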

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
