[jira] [Commented] (CRUNCH-57) Add a length function to PCollection

Gabriel Reid (JIRA) Fri, 14 Sep 2012 01:05:15 -0700

    [ https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455657#comment-13455657 ]


Gabriel Reid commented on CRUNCH-57:
------------------------------------

@Rahul, I don't really think this is a situation we need to over-optimize for 
right now, and I definitely wouldn't say that we're losing what Hadoop can do 
for us out of the box. I think that making use of WritableComparable would 
lower CPU usage only slightly; the main objective here is to minimize the 
amount of data sent to the reducer. Additionally, the current implementation 
uses a Combiner, so it's likely to be even more effective at minimizing the 
size of the shuffle.
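To make the Combiner point concrete, here is a small plain-Java simulation (not Crunch or Hadoop API code) of counting elements across two map-side partitions. It assumes the count is computed by emitting the value 1 per element and summing; the class `ShuffleSizeSketch` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a count job; NOT Crunch or Hadoop API code.
public class ShuffleSizeSketch {

    // Map phase: emit the value 1 for every element in a partition.
    static List<Long> mapPartition(List<String> partition) {
        List<Long> ones = new ArrayList<>();
        for (String ignored : partition) {
            ones.add(1L);
        }
        return ones;
    }

    // Combiner: sum one mapper's output locally, before the shuffle.
    static List<Long> combine(List<Long> mapped) {
        long sum = 0;
        for (long v : mapped) {
            sum += v;
        }
        List<Long> out = new ArrayList<>();
        out.add(sum);
        return out;
    }

    // Returns {total count, number of records shuffled to the reducer}.
    static long[] count(List<List<String>> partitions, boolean useCombiner) {
        List<Long> shuffled = new ArrayList<>();
        for (List<String> partition : partitions) {
            List<Long> mapped = mapPartition(partition);
            shuffled.addAll(useCombiner ? combine(mapped) : mapped);
        }
        // Reduce phase: sum whatever arrived at the reducer.
        long total = 0;
        for (long v : shuffled) {
            total += v;
        }
        return new long[] {total, shuffled.size()};
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("a", "b", "c", "d"),
                List.of("e", "f", "g"));
        long[] plain = count(partitions, false);
        long[] combined = count(partitions, true);
        System.out.println("no combiner:   count=" + plain[0] + ", shuffled=" + plain[1]);
        System.out.println("with combiner: count=" + combined[0] + ", shuffled=" + combined[1]);
    }
}
```

Running `main` reports a count of 7 in both cases, but 7 records cross the shuffle without the combiner versus 2 with it: one partial sum per partition instead of one record per element.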

In any case, I think we're going pretty far outside the scope of this 
particular JIRA issue, so it's probably best to continue this discussion on 
crunch-dev or a different JIRA issue.


> Add a length function to PCollection
> ------------------------------------
>
>                 Key: CRUNCH-57
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-57
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Josh Wills
>         Attachments: CRUNCH-57.patch, CRUNCH-57.patch, MinMaxFn.patch, minver2.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a 
> PCollection.
>  
> For example, suppose an initial PCollection is filtered into another. If I'm 
> interested in how many elements of the original PCollection matched the 
> filter, I have to write extra code to compute this. PCollections should have 
> a length method that, when called, computes the number of elements in the 
> PCollection and returns the result.
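The use case in the description can be illustrated with an in-memory stand-in. The sketch below uses a plain `java.util.List` in place of a PCollection, together with a hypothetical `length` helper that counts by iterating, since a distributed collection generally has to be scanned to be counted. It shows only the intended semantics (count the original, count the filtered result), not how Crunch would execute it.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// In-memory stand-in for the requested feature; NOT the Crunch API.
public class LengthSketch {

    // Hypothetical helper: count elements by scanning, as a distributed
    // collection would require, rather than asking for a cached size.
    static <T> long length(Iterable<T> collection) {
        long n = 0;
        for (T ignored : collection) {
            n++;
        }
        return n;
    }

    // Filter a collection, mirroring the "filtered into another" step.
    static <T> List<T> filter(List<T> collection, Predicate<T> keep) {
        return collection.stream().filter(keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> original = List.of("apple", "banana", "avocado", "cherry", "apricot");
        List<String> matched = filter(original, s -> s.startsWith("a"));
        // Prints: 3 of 5 elements matched the filter
        System.out.println(length(matched) + " of " + length(original) + " elements matched the filter");
    }
}
```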

--
This message is automatically generated by JIRA.
