[ https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rahul Sharma updated CRUNCH-57:
-------------------------------

    Attachment: minver2.patch

@Gabriel, yes, you are right that the approach outside the MR context will be
faster, but in MR a few things come into play. For example, when we use a
reducer after groupByKey, MR puts a sort in place; whether we make use of it or
not is secondary, but the data on the reducer side will always be sorted.
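To illustrate the point about sorting, here is a minimal plain-Java simulation of the MapReduce shuffle. It does not use Hadoop or Crunch classes, and `ShuffleSketch`, `Pair` and `shuffle` are hypothetical names: mapper output is grouped by key, and a TreeMap hands the groups to the "reducer" in sorted key order, regardless of the order in which the pairs arrived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the MapReduce shuffle (not Hadoop or Crunch API):
// pairs are grouped by key and the groups reach the "reducer" in key order.
public class ShuffleSketch {

    // Simple immutable key/value pair emitted by a hypothetical mapper.
    static final class Pair {
        final String key;
        final int value;
        Pair(String key, int value) { this.key = key; this.value = value; }
    }

    // Groups values by key. TreeMap keeps the keys sorted, mimicking the sort
    // MR always performs before the reduce phase. Values keep their arrival
    // order here; real MR makes no such guarantee for values within a key.
    static Map<String, List<Integer>> shuffle(List<Pair> mapOutput) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Pair p : mapOutput) {
            groups.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = List.of(
            new Pair("pear", 3), new Pair("apple", 7),
            new Pair("fig", 1), new Pair("apple", 2));
        // The reducer sees apple, fig, pear in that order, whatever the input order.
        shuffle(mapOutput).forEach((key, values) ->
            System.out.println(key + " -> " + values));
    }
}
```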

I have created a version of the min function that tries to reuse what MR
already provides, following the same principle. I tested it against the Avro
data in the aggregate test. It is somewhat faster than the current min
function: the best run clocked 9% faster, and in the worst run it was the same.
Another important aspect is that it doesn't rely on user classes being
Comparable.
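The point about not relying on Comparable can be sketched in plain Java: the caller supplies a Comparator, so the element type needs no natural ordering. This is an illustrative assumption, not the code in minver2.patch; `MinSketch`, `Event` and `min` are hypothetical names.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch (not the code in minver2.patch): finding a minimum with a
// caller-supplied Comparator, so the element type need not implement Comparable.
public class MinSketch {

    // A user class that deliberately does NOT implement Comparable.
    static final class Event {
        final String id;
        final long timestamp;
        Event(String id, long timestamp) { this.id = id; this.timestamp = timestamp; }
    }

    // Returns the smallest element according to the given ordering.
    static <T> T min(List<T> values, Comparator<? super T> order) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("min of an empty collection is undefined");
        }
        return Collections.min(values, order);
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("b", 42L), new Event("a", 17L), new Event("c", 99L));
        // The ordering lives outside the class, in an explicit Comparator.
        Comparator<Event> byTimestamp = Comparator.comparingLong(e -> e.timestamp);
        Event earliest = min(events, byTimestamp);
        System.out.println(earliest.id); // prints "a"
    }
}
```

Keeping the ordering in a Comparator means the same class can be minimized by different fields without changing its definition.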
                
> Add a length function to PCollection
> ------------------------------------
>
>                 Key: CRUNCH-57
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-57
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.3.0
>            Reporter: Kiyan Ahmadizadeh
>            Assignee: Josh Wills
>         Attachments: CRUNCH-57.patch, minver2.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a 
> PCollection.
>  
> For example, suppose there was an initial PCollection that was then filtered 
> into another.  If I'm interested in how many elements of the original 
> PCollection matched the filter, I'll have to write extra code to compute this.
> PCollections should have a length method that, when called, computes the 
> number of elements in the PCollection and returns the result. 
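A minimal in-memory analogue of the requested feature, assuming a java.util.List stands in for a PCollection. `LengthSketch`, `length` and `filter` are hypothetical helpers, not Crunch methods; `length` counts by mapping each element to 1 and summing, which is how a count is commonly expressed in a distributed setting.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// In-memory analogue of the proposed length method; a List stands in for a
// PCollection. These helpers are hypothetical, not the Crunch API.
public class LengthSketch {

    // Counts elements by mapping each one to 1 and summing the ones.
    static long length(List<?> collection) {
        return collection.stream().mapToLong(x -> 1L).sum();
    }

    // Keeps only the elements that satisfy the filter.
    static <T> List<T> filter(List<T> input, Predicate<? super T> keep) {
        return input.stream().filter(keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> original = List.of("alpha", "beta", "gamma", "delta", "epsilon");
        List<String> matched = filter(original, s -> s.length() > 4);
        // How many elements of the original collection passed the filter.
        System.out.println(length(matched) + " of " + length(original) + " matched");
    }
}
```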

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
