[jira] [Commented] (CRUNCH-400) Materialized jobs should have stage in PipelineResult

Josh Wills (JIRA) Fri, 06 Jun 2014 16:40:15 -0700

    [ 
https://issues.apache.org/jira/browse/CRUNCH-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020539#comment-14020539
 ]


Josh Wills commented on CRUNCH-400:
-----------------------------------

No, I understand the issue; I've just been busy with the 0.10.0 and 
0.8.3 releases. I'll pick this up again next week. In the meantime, the 
workaround seems simple enough, modulo my comment above:

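// Request materialization, but do not call iterator() yet.
Iterable<String> preMaterialized = dataToBeMaterialized.materialize();
PipelineResult res = pipeline.run();
// Read the Iterable only after run() has returned.
Set<String> materializedData = Sets.newHashSet(preMaterialized);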

i.e., if you call materialize(), but don't call iterator() on the returned 
Iterable object, then call run(), and only _then_ read the data from the 
Iterable by calling iterator(), you will get the counter stats for the 
materialized object via the PipelineResult that is returned by pipeline.run().
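
To make the ordering point concrete, here is a toy model, not Crunch code. ToyPipeline and OrderingDemo are made-up names, and the model assumes that reading a materialized Iterable before run() triggers a run of its own whose result is never handed back to the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for a pipeline; none of these names are Crunch classes.
class ToyPipeline {
    private final List<String> pending = new ArrayList<>();

    // Registers a stage and returns an Iterable that is only read lazily.
    Iterable<String> materialize(final String stage) {
        pending.add(stage);
        return new Iterable<String>() {
            @Override
            public Iterator<String> iterator() {
                // If the stage has not run yet, trigger a private run
                // whose result the caller never sees.
                if (pending.contains(stage)) {
                    run();
                }
                return Collections.singletonList(stage + "-output").iterator();
            }
        };
    }

    // Runs every pending stage and reports which stages this call executed.
    List<String> run() {
        List<String> executed = new ArrayList<>(pending);
        pending.clear();
        return executed;
    }
}

class OrderingDemo {
    public static void main(String[] args) {
        // Reading before run(): the stage is missing from the visible result.
        ToyPipeline early = new ToyPipeline();
        Iterable<String> a = early.materialize("counts");
        a.iterator();
        System.out.println(early.run());   // prints []

        // Reading after run(): the stage shows up in the visible result.
        ToyPipeline late = new ToyPipeline();
        Iterable<String> b = late.materialize("counts");
        System.out.println(late.run());    // prints [counts]
        b.iterator();
    }
}
```

In the toy, the list returned by run() plays the role of the stage results in a PipelineResult: the stage is reported only when run() is the call that executes it.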

That said, it seems reasonable for the underlying MaterializableIterable 
(the object returned by materialize()) to hold on to the PipelineResult 
returned when it calls run() and to let the client access it, but even 
that solution would require some modification to your existing pipeline 
code.
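
A rough sketch of that idea, independent of Crunch: ResultHoldingIterable, its two Supplier arguments, and getResult() are all hypothetical names for illustration, and R stands in for whatever result type run() returns. It is not how MaterializableIterable is implemented today.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch; ResultHoldingIterable is not a Crunch class.
class ResultHoldingIterable<T, R> implements Iterable<T> {
    private final Supplier<R> runner;            // triggers the run on demand
    private final Supplier<Iterable<T>> reader;  // reads the output after the run
    private boolean ran = false;
    private R result;                            // kept for the client to inspect

    ResultHoldingIterable(Supplier<R> runner, Supplier<Iterable<T>> reader) {
        this.runner = runner;
        this.reader = reader;
    }

    @Override
    public synchronized Iterator<T> iterator() {
        // Run at most once, and remember the result instead of discarding it.
        if (!ran) {
            result = runner.get();
            ran = true;
        }
        return reader.get().iterator();
    }

    // Empty until iterator() has triggered the run.
    synchronized Optional<R> getResult() {
        return Optional.ofNullable(result);
    }
}

class HolderDemo {
    public static void main(String[] args) {
        ResultHoldingIterable<String, String> data = new ResultHoldingIterable<>(
            () -> "stage-stats",
            () -> Arrays.asList("a", "b"));
        System.out.println(data.getResult().isPresent()); // prints false
        for (String s : data) {
            System.out.println(s);
        }
        System.out.println(data.getResult().get());       // prints stage-stats
    }
}
```

The client-side change is the part mentioned above: code that currently treats the return value of materialize() as a plain Iterable would have to keep a reference to the richer type in order to ask it for the result.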



> Materialized jobs should have stage in PipelineResult
> -----------------------------------------------------
>
>                 Key: CRUNCH-400
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-400
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Micah Whitacre
>
> Brought up as part of the proposed fix for CRUNCH-272 and on the mailing 
> list[1], a set of jobs kicked off due to a materialize() call will not be 
> tracked as part of the Pipeline's stage results returned by the 
> PipelineResult.
> [1] - 
> http://mail-archives.apache.org/mod_mbox/crunch-dev/201405.mbox/%3CCANFazTUAffvTctK5%3DWvW4KyBLSqLCNcke7ZMWwgASu%2BEtkDmyQ%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)
