[ 
https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012735#comment-13012735
 ] 

Dmitriy Lyubimov commented on MAHOUT-633:
-----------------------------------------

bq. In the tests I ran, not significant. But I was using a preprocessing handler 
on VectorWritable (the patch you guys were reluctant to accept) which did not 
create intermediate vector storage at all. All matrix elements were passed in 
on the stack. Mahout's patch doesn't have this code, but I will be happy to put 
it in JIRA for discussion. Also, I did not run close to memory limits on the 
tasks I ran with SSVD. I just don't have datasets that big. 

Actually, a correction: I do think I saw a difference in running time between 
code running with the existing VectorWritable, which creates interim vector 
instances for each iteration, and code that just passes matrix elements on the 
stack. But like I said, I did not measure the difference, as that was not my 
goal. And it's not the evidence I'd like to appeal to anyway, since there 
wasn't much side info. But I have other situations (outside the Mahout realm, 
but also iterative batches with big side info) that I'd like to appeal to. 
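
To make the two styles concrete, here is a rough sketch of the difference. 
It is not the code from the patch: ElementHandlerSketch, ElementHandler, 
onElement and the plain double[][] input are all hypothetical stand-ins, and 
the array copy only imitates deserializing each record into a fresh vector 
instance. 

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Mahout's API: contrasts materializing a row
 * per record with streaming its elements to a handler.
 */
public class ElementHandlerSketch {

  /** Hypothetical callback; receives one matrix element at a time. */
  interface ElementHandler {
    void onElement(int row, int column, double value);
  }

  /** Variant 1: allocates an interim array for every row it reads. */
  static double sumWithInterimVectors(double[][] source) {
    double sum = 0.0;
    for (double[] row : source) {
      // Stands in for deserializing into a fresh vector instance.
      double[] interim = Arrays.copyOf(row, row.length);
      for (double v : interim) {
        sum += v;
      }
    }
    return sum;
  }

  /** Variant 2: hands each element to the handler; no per-row allocation. */
  static void stream(double[][] source, ElementHandler handler) {
    for (int r = 0; r < source.length; r++) {
      for (int c = 0; c < source[r].length; c++) {
        handler.onElement(r, c, source[r][c]);
      }
    }
  }

  /** Sums all elements through the handler, reusing one accumulator. */
  static double sumWithHandler(double[][] source) {
    double[] acc = new double[1]; // single accumulator reused for all rows
    stream(source, (row, column, value) -> acc[0] += value);
    return acc[0];
  }

  public static void main(String[] args) {
    double[][] matrix = {{1.0, 2.0}, {3.0, 4.0}};
    System.out.println(sumWithInterimVectors(matrix));
    System.out.println(sumWithHandler(matrix));
  }
}
```

Both variants compute the same sum; the only difference is whether a new 
array is allocated per row. Whether that shows up in running time depends on 
the data size and the JVM, which is exactly what I have not measured. 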

> Add SequenceFileIterable; put Iterable stuff in one place
> ---------------------------------------------------------
>
>                 Key: MAHOUT-633
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-633
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering, Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>              Labels: iterable, iterator, sequence-file
>             Fix For: 0.5
>
>         Attachments: MAHOUT-633.patch, MAHOUT-633.patch, MAHOUT-633.patch
>
>
> In another project I have a useful little class, SequenceFileIterable, which 
> simplifies iterating over a sequence file. It's like FileLineIterable. I'd 
> like to add it, then use it throughout the code. See patch, which for now 
> merely has the proposed new classes. 
> Well it also moves some other iterator-related classes that seemed to be 
> outside their rightful home in common.iterator.

