Micah Whitacre created CRUNCH-479:
-------------------------------------
Summary: Writing to target with WriteMode.APPEND merges values into PCollection
Key: CRUNCH-479
URL: https://issues.apache.org/jira/browse/CRUNCH-479
Project: Crunch
Issue Type: Bug
Components: Core
Reporter: Micah Whitacre
Assignee: Josh Wills
This was mentioned as part of CDK-617[1]. When a PCollection containing a set of
values is written to a target with WriteMode.APPEND and then materialized,
iterating over it returns not only the new values that were appended but also
the values that already existed in the target. This is surprising, as most
users would expect the PCollection to contain only its original values. One
use case that depends on this is a solution that processes only the new values
rather than all of the existing data.
[1] - https://issues.cloudera.org/browse/CDK-671
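
To make the difference concrete, here is a toy model in plain Java of the behavior described above. It is not Crunch code: AppendModel, Target, materializeReported and materializeExpected are hypothetical names invented for illustration. The model assumes that the reported behavior amounts to reading the whole target back after the append, which the report does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are Crunch classes or methods.
public class AppendModel {

    // Stands in for a target (for example an output path) that already holds data.
    static final class Target {
        final List<String> stored = new ArrayList<>();

        Target(List<String> existing) {
            stored.addAll(existing);
        }

        // APPEND semantics: keep what is already stored and add the new values.
        void append(List<String> values) {
            stored.addAll(values);
        }
    }

    // Reported behavior (assumed mechanism): after the write, iterating the
    // collection yields everything in the target, old values included.
    static List<String> materializeReported(List<String> collection, Target target) {
        target.append(collection);
        return new ArrayList<>(target.stored);
    }

    // Expected behavior: writing does not change what the collection contains.
    static List<String> materializeExpected(List<String> collection, Target target) {
        target.append(collection);
        return new ArrayList<>(collection);
    }
}
```

With a target that already holds "a" and "b" and a collection holding only "c", materializeReported returns [a, b, c] while materializeExpected returns [c]. In both cases the target itself ends up with all three values; only the contents seen by the caller differ.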
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)