[
https://issues.apache.org/jira/browse/CRUNCH-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453655#comment-13453655
]
Kiyan Ahmadizadeh commented on CRUNCH-58:
-----------------------------------------
I included these methods mostly because the backing PCollection exposes them.
All three seemed like useful things to expose to the client (although this is
debatable, and I could be convinced to remove some or all of them).
I didn't want to expose a getter for the backing PCollection for a couple of
reasons:
1. I wanted the PObject interface to be agnostic regarding what actually backed
the implementation. Including a method in the interface that returned the
backing PCollection would make this impossible. The importance of this in the
context of Crunch is debatable, since a PCollection is the mechanism through
which all distributed computation has to happen. Still, a PObject acts as a
lazy Future, and later we might want to use that concept more generally.
2. A getter would hurt the PObject abstraction by exposing implementation
details. It would give the client a means to initiate further distributed
computation on the data backing the PObject, which encourages bad practice
with PObjects. PObjects should be used for values small enough to fit
into memory so they can be worked with locally or shipped around with do
functions to act as side data for jobs. I think hiding the underlying
PCollection enforces this.
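To make the two points above concrete, here is a rough sketch of the shape I have in mind. It is not the code from CRUNCH-58.patch: the names PObjectSketch, CollectionBackedPObject, ConstantPObject and max are made up for illustration, and a plain Iterable stands in for the backing PCollection. The interface only lets the client read the value, so one implementation can be backed by a collection and another by nothing at all.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical sketch only; names and shapes are assumptions, not the patch.
class PObjectSketch {
    /** The client-facing interface: the only operation is reading the value. */
    interface PObject<T> {
        T getValue();
    }

    /** Backed by a collection (a plain Iterable stands in for a PCollection). */
    static final class CollectionBackedPObject<S, T> implements PObject<T> {
        private final Iterable<S> backing; // private, and intentionally no getter
        private final Function<Iterable<S>, T> reducer;

        CollectionBackedPObject(Iterable<S> backing, Function<Iterable<S>, T> reducer) {
            this.backing = backing;
            this.reducer = reducer;
        }

        @Override
        public T getValue() {
            // The reduction runs only when the value is requested.
            return reducer.apply(backing);
        }
    }

    /** Backed by nothing but an in-memory value. */
    static final class ConstantPObject<T> implements PObject<T> {
        private final T value;

        ConstantPObject(T value) {
            this.value = value;
        }

        @Override
        public T getValue() {
            return value;
        }
    }

    /** Example reduction: largest element, or null for an empty input. */
    static Integer max(Iterable<Integer> xs) {
        Integer best = null;
        for (Integer x : xs) {
            if (best == null || x > best) {
                best = x;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        PObject<Integer> fromCollection = new CollectionBackedPObject<Integer, Integer>(
                Arrays.asList(3, 9, 4), PObjectSketch::max);
        PObject<Integer> constant = new ConstantPObject<>(9);
        // Client code cannot tell the two apart, and cannot reach the backing data.
        System.out.println(fromCollection.getValue().equals(constant.getValue())); // prints "true"
    }
}
```

With this shape the client has no path back to the backing data, so any further distributed computation has to start from a PCollection rather than from the PObject.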
Thoughts?
> Implement PObject in Crunch/Scrunch
> -----------------------------------
>
> Key: CRUNCH-58
> URL: https://issues.apache.org/jira/browse/CRUNCH-58
> Project: Crunch
> Issue Type: New Feature
> Affects Versions: 0.3.0
> Reporter: Kiyan Ahmadizadeh
> Assignee: Kiyan Ahmadizadeh
> Attachments: CRUNCH-58.patch
>
>
> FlumeJava has the concept of a PObject<T>, a container for a singleton of
> type T. It is meant to represent the result of a distributed computation that
> yields a singleton value (for example max, min, and length methods on
> PCollection<T>). Generally speaking, the result of any computation that
> combines/reduces a PCollection into a singleton value could be represented by
> a PObject.
> Like PCollection, a PObject defers distributed computation until its value is
> actually used.
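
The deferral described above can be illustrated with a minimal sketch. It assumes the value is computed on first use and then cached, which is an assumption of this example and not something the issue specifies. LazyPObjectSketch and LazyValue are hypothetical names, and a Supplier stands in for the distributed job.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch only; caching after first use is an assumption.
class LazyPObjectSketch {
    /** Runs the computation on the first getValue() call, then reuses the result. */
    static final class LazyValue<T> {
        private final Supplier<T> computation;
        private T value;
        private boolean computed;

        LazyValue(Supplier<T> computation) {
            this.computation = computation;
        }

        synchronized T getValue() {
            if (!computed) {
                value = computation.get();
                computed = true;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        AtomicInteger runs = new AtomicInteger();
        // Stand-in for a job computing the length of a collection.
        LazyValue<Long> length = new LazyValue<>(() -> {
            runs.incrementAndGet();
            return 3L;
        });
        System.out.println(runs.get());        // prints "0": nothing has run yet
        System.out.println(length.getValue()); // prints "3": first use triggers the work
        length.getValue();
        System.out.println(runs.get());        // prints "1": the second read used the cache
    }
}
```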