[
https://issues.apache.org/jira/browse/CRUNCH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449999#comment-13449999
]
Kiyan Ahmadizadeh commented on CRUNCH-57:
-----------------------------------------
I'm up for taking on an implementation of PObject and incorporating it into
this change. I've created a ticket CRUNCH-58 for this. Josh, please check
that ticket for some discussion on the implementation of PObject.
+1 for using decorators to achieve the fluent pattern without crowding the
methods in the PCollection interface. This would work well in Java and Scala.
I think Gabriel's geometry example highlights the issue that you may want
special operations on PCollections holding objects of a specific type. Another
example would be PCollections of numeric data. It would make sense for such
collections to have special operations like average, sum, etc.
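To make the decorator idea concrete, here is a minimal in-memory sketch. It does not use the Crunch API: ToyCollection, InMemoryCollection, and NumericCollection are hypothetical names invented for illustration, and a real implementation would run as a distributed job rather than over a List.

```java
import java.util.List;

// Hypothetical stand-in for PCollection; NOT the real Crunch interface.
interface ToyCollection<T> {
    List<T> materialize();
}

// Simple list-backed implementation used only for this illustration.
final class InMemoryCollection<T> implements ToyCollection<T> {
    private final List<T> items;

    InMemoryCollection(List<T> items) {
        this.items = items;
    }

    @Override
    public List<T> materialize() {
        return items;
    }
}

// Decorator: wraps any collection of numbers and adds numeric-only
// operations, leaving the base interface untouched.
final class NumericCollection<N extends Number> implements ToyCollection<N> {
    private final ToyCollection<N> delegate;

    NumericCollection(ToyCollection<N> delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<N> materialize() {
        return delegate.materialize();
    }

    double sum() {
        double total = 0.0;
        for (N n : delegate.materialize()) {
            total += n.doubleValue();
        }
        return total;
    }

    double average() {
        List<N> values = delegate.materialize();
        if (values.isEmpty()) {
            throw new IllegalStateException("average of an empty collection");
        }
        return sum() / values.size();
    }
}
```

The point of the sketch is that sum() and average() only make sense when the element type is a Number, so they live on the wrapper rather than on every collection.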
-1 on leaving length() out of the base PCollection interface, however. I
think decorators are great for the case outlined above, where the functionality
applies only to PCollections holding objects of a specific type. Counting the
number of elements in a PCollection, however, is applicable to all PCollections
regardless of the type of object they contain. I think operations that can
apply to any and all PCollections belong in the PCollection interface, and
operations applicable to a specific kind of PCollection belong in decorators.
For this reason I argue that length() goes in the PCollection interface.
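As a rough sketch of what that could look like, the toy model below puts length() on the base interface and uses it for the filter case from the issue description. CountableCollection and ListCollection are hypothetical names, not Crunch classes, and the default method is just a convenience for this in-memory illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical toy model of a collection interface; NOT the Crunch API.
interface CountableCollection<T> {
    List<T> materialize();

    CountableCollection<T> filter(Predicate<T> keep);

    // Counting does not depend on the element type, so it sits on the
    // base interface and every collection gets it.
    default long length() {
        return materialize().size();
    }
}

// List-backed implementation used only for this illustration.
final class ListCollection<T> implements CountableCollection<T> {
    private final List<T> items;

    ListCollection(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    @Override
    public List<T> materialize() {
        return items;
    }

    @Override
    public CountableCollection<T> filter(Predicate<T> keep) {
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            if (keep.test(item)) {
                kept.add(item);
            }
        }
        return new ListCollection<>(kept);
    }
}
```

With this shape, asking how many elements matched a filter is just original.filter(...).length(), with no extra counting code in the caller.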
> Add a length function to PCollection
> ------------------------------------
>
> Key: CRUNCH-57
> URL: https://issues.apache.org/jira/browse/CRUNCH-57
> Project: Crunch
> Issue Type: New Feature
> Components: Core
> Affects Versions: 0.3.0
> Reporter: Kiyan Ahmadizadeh
> Assignee: Josh Wills
> Attachments: CRUNCH-57.patch
>
>
> Sometimes it's useful and interesting to compute the number of elements in a
> PCollection.
>
> For example, suppose there was an initial PCollection that was then filtered
> into another. If I'm interested in how many elements of the original
> PCollection matched the filter, I'll have to write extra code to compute this.
> PCollections should have a length method that, when called, computes the
> number of elements in the PCollection and returns the result.