[
https://issues.apache.org/jira/browse/CRUNCH-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451698#comment-13451698
]
Josh Wills commented on CRUNCH-59:
----------------------------------
The two methods are designed to be called at different points in a job's lifecycle.
configure() is called during the job construction process on the client side,
and provides a mechanism for a DoFn to alter the configuration of an MR job
before it is submitted.
initialize() is called on the DoFn when it is executed on Hadoop, at the start
of a map or reduce task.
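
To make the ordering concrete, here is a minimal, self-contained sketch of the
two hooks. It does not use the real Crunch or Hadoop classes: SketchConfiguration,
SketchDoFn, PrefixFn, LifecycleSketch and the "sketch.prefix" key are all
hypothetical stand-ins invented for illustration, and the run() method only
imitates what the client and the task would each do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration: plain string key/value pairs.
class SketchConfiguration {
    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) { values.put(key, value); }

    String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}

// Hypothetical stand-in for a DoFn exposing the two lifecycle hooks.
abstract class SketchDoFn {
    private SketchConfiguration conf;

    // Client side: called while the job is being built, before submission.
    void configure(SketchConfiguration conf) { }

    // Task side: called at the start of a map or reduce task.
    void initialize() { }

    void setConfiguration(SketchConfiguration conf) { this.conf = conf; }

    SketchConfiguration getConfiguration() { return conf; }
}

// Example function: writes a setting on the client, reads it back in the task.
class PrefixFn extends SketchDoFn {
    private String prefix;

    @Override
    void configure(SketchConfiguration conf) {
        // The function modifies the job configuration, not itself.
        conf.set("sketch.prefix", "crunch-");
    }

    @Override
    void initialize() {
        // Per-task state is set up from the configuration the task received.
        prefix = getConfiguration().get("sketch.prefix", "");
    }

    String process(String input) { return prefix + input; }
}

public class LifecycleSketch {
    static String run(String input) {
        // Client side: job construction.
        SketchConfiguration jobConf = new SketchConfiguration();
        new PrefixFn().configure(jobConf);

        // Task side: a separate instance is handed the job configuration.
        PrefixFn taskFn = new PrefixFn();
        taskFn.setConfiguration(jobConf);
        taskFn.initialize();
        return taskFn.process(input);
    }

    public static void main(String[] args) {
        System.out.println(run("record")); // prints "crunch-record"
    }
}
```

The sketch keeps the two hooks separate because they run in different
processes: the instance that writes the setting is not the instance that reads
it back.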
I can see how the naming is confusing: it sounds as though the DoFn is the
thing being configured, when in fact the DoFn is modifying the Configuration
object. I am certainly open to a clearer name.
> DoFn doesn't need both configure and initialize methods
> -------------------------------------------------------
>
> Key: CRUNCH-59
> URL: https://issues.apache.org/jira/browse/CRUNCH-59
> Project: Crunch
> Issue Type: Bug
> Reporter: Vinod Kumar Vavilapalli
>
> DoFn doesn't seem to need both {{public void configure(Configuration conf)}}
> and {{public void initialize()}}. We can do with a single API like
> {{initialize(Configuration)}}.