[
https://issues.apache.org/jira/browse/CRUNCH-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020539#comment-14020539
]
Josh Wills commented on CRUNCH-400:
-----------------------------------
No, I understand the issue; I've just been busy with the 0.10.0 and
0.8.3 releases. I'll pick this up again next week. In the meantime, the
workaround seems simple enough, modulo my comment above:

    Iterable<String> preMaterialized = dataToBeMaterialized.materialize();
    PipelineResult res = pipeline.run();
    Set<String> materializedData = Sets.newHashSet(preMaterialized);

That is, if you call materialize() but don't call iterator() on the returned
Iterable, then call run(), and only _then_ read the data from the Iterable
by calling iterator(), the PipelineResult returned by pipeline.run() will
include the counter stats for the materialized object.
That said, it seems reasonable that the underlying MaterializableIterable
(the object returned by materialize()) would hold on to the PipelineResult
returned when it calls run() and allow the client to access it, but even
that solution will require some modification to your existing pipeline code.
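
To make the second idea concrete, here is a minimal, self-contained sketch of an
Iterable that triggers a run lazily on the first iterator() call and keeps the
result of that run for the client to read afterwards. It does not use Crunch
itself: RunResult, RunOutput, ResultTrackingIterable and getResult() are all
hypothetical stand-ins invented for illustration, not the actual
MaterializableIterable or PipelineResult API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class ResultTrackingIterableSketch {

    /** Hypothetical stand-in for a pipeline result; it only carries stage names. */
    static final class RunResult {
        final List<String> stageNames;

        RunResult(List<String> stageNames) {
            this.stageNames = stageNames;
        }
    }

    /** Hypothetical pairing of the materialized data with the result of the run. */
    static final class RunOutput<T> {
        final List<T> data;
        final RunResult result;

        RunOutput(List<T> data, RunResult result) {
            this.data = data;
            this.result = result;
        }
    }

    /** Runs at most once, on the first iterator() call, and remembers the result. */
    static final class ResultTrackingIterable<T> implements Iterable<T> {
        private final Supplier<RunOutput<T>> runner;
        private RunOutput<T> output; // null until the first iterator() call

        ResultTrackingIterable(Supplier<RunOutput<T>> runner) {
            this.runner = runner;
        }

        @Override
        public synchronized Iterator<T> iterator() {
            if (output == null) {
                output = runner.get(); // stands in for the implicit run()
            }
            return output.data.iterator();
        }

        /** Result of the run triggered by iteration, or null if nothing has run yet. */
        synchronized RunResult getResult() {
            return output == null ? null : output.result;
        }
    }

    public static void main(String[] args) {
        ResultTrackingIterable<String> materialized = new ResultTrackingIterable<>(
            () -> new RunOutput<>(
                Arrays.asList("a", "b"),
                new RunResult(Arrays.asList("materialize-stage"))));

        System.out.println(materialized.getResult() == null); // nothing has run yet

        List<String> seen = new ArrayList<>();
        for (String s : materialized) { // first iteration triggers the run
            seen.add(s);
        }
        System.out.println(seen + " " + materialized.getResult().stageNames);
    }
}
```

As noted above, client code would still have to change under this approach,
since it needs to ask the Iterable for the result instead of relying only on
the value returned by pipeline.run().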
> Materialized jobs should have stage in PipelineResult
> -----------------------------------------------------
>
> Key: CRUNCH-400
> URL: https://issues.apache.org/jira/browse/CRUNCH-400
> Project: Crunch
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.9.0, 0.8.2
> Reporter: Micah Whitacre
>
> Brought up as part of the proposed fix for CRUNCH-272 and on the mailing
> list[1], a set of jobs kicked off due to a materialize() call will not be
> tracked as part of the Pipeline's stage results returned by the
> PipelineResult.
> [1] -
> http://mail-archives.apache.org/mod_mbox/crunch-dev/201405.mbox/%3CCANFazTUAffvTctK5%3DWvW4KyBLSqLCNcke7ZMWwgASu%2BEtkDmyQ%40mail.gmail.com%3E