Matt, 
It's always better to have a join for each fork. I think it would help to
clarify your workflow design in the question, along with the requirement
for asynchronous spikes.

Thanks,
Virag
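Keeping a join for every fork does not necessarily rule out the flow Matt describes. One possible layout is to nest the forks: each Sqoop import forks into its own Hive step and the next import, and the joins close back up in reverse order. The script below is a minimal sketch that generates such a workflow.xml. The schema versions, node names, table names, `transform_<table>.q` scripts and the `${jobTracker}`/`${nameNode}` parameters are all assumptions for illustration, so check them against the Fork and Join section of the workflow spec for your Oozie version.

```python
import xml.etree.ElementTree as ET

# Assumed schema versions; match them to your Oozie installation.
WF_NS = "uri:oozie:workflow:0.2"
SQOOP_NS = "uri:oozie:sqoop-action:0.2"
HIVE_NS = "uri:oozie:hive-action:0.2"


def add_action(app, name, kind, ns, body_tag, body_text, ok_to):
    """Append an <action> holding a Sqoop or Hive element plus ok/error arcs."""
    action = ET.SubElement(app, "action", name=name)
    node = ET.SubElement(action, kind, xmlns=ns)
    ET.SubElement(node, "job-tracker").text = "${jobTracker}"
    ET.SubElement(node, "name-node").text = "${nameNode}"
    ET.SubElement(node, body_tag).text = body_text
    ET.SubElement(action, "ok", to=ok_to)
    ET.SubElement(action, "error", to="fail")


def build_workflow(sources):
    """Serial Sqoop imports, each Hive step forked off as a side branch.

    fork-<src> starts hive-<src> and the next Sqoop import in parallel;
    join-<src> waits for both, then hands control to the enclosing join,
    so every fork still has a matching join.
    """
    app = ET.Element("workflow-app", name="serial-sqoop-async-hive", xmlns=WF_NS)
    ET.SubElement(app, "start", to="sqoop-" + sources[0])
    last = len(sources) - 1
    for i, src in enumerate(sources):
        # Where this nesting level hands control when done: the enclosing
        # join, or the end node for the outermost level.
        outer = "join-" + sources[i - 1] if i > 0 else "end"
        if i < last:
            add_action(app, "sqoop-" + src, "sqoop", SQOOP_NS, "command",
                       "import --table " + src, "fork-" + src)
            fork = ET.SubElement(app, "fork", name="fork-" + src)
            ET.SubElement(fork, "path", start="hive-" + src)
            ET.SubElement(fork, "path", start="sqoop-" + sources[i + 1])
            add_action(app, "hive-" + src, "hive", HIVE_NS, "script",
                       "transform_" + src + ".q", "join-" + src)
            ET.SubElement(app, "join", name="join-" + src, to=outer)
        else:
            # Last source: nothing left to import, so run its Hive step inline.
            add_action(app, "sqoop-" + src, "sqoop", SQOOP_NS, "command",
                       "import --table " + src, "hive-" + src)
            add_action(app, "hive-" + src, "hive", HIVE_NS, "script",
                       "transform_" + src + ".q", outer)
    kill = ET.SubElement(app, "kill", name="fail")
    ET.SubElement(kill, "message").text = (
        "ETL failed: ${wf:errorMessage(wf:lastErrorNode())}")
    ET.SubElement(app, "end", name="end")
    return app


if __name__ == "__main__":
    wf = build_workflow(["orders", "customers", "items"])
    print(ET.tostring(wf, encoding="unicode"))
```

With this layout the workflow as a whole only finishes once every Hive step is done, but each Sqoop import starts as soon as the previous import ends, without waiting for that source's Hive transformation.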


On 7/17/12 2:30 PM, "Matt Goeke" <[email protected]> wrote:

> Virag,
> 
> Thanks for the response. I have read the workflow spec, and while I realize
> it is possible to fork within a workflow, my issue is that every fork must
> be paired with a join. What I was looking for was a way to fork without
> requiring all of the forked nodes to rejoin the primary workflow (hence
> some of the nodes becoming asynchronous spikes). I feel like this
> capability might already exist, and this might just be an issue of
> workflow/subworkflow composition.
> 
> --
> Matt Goeke
> 
> On Tue, Jul 17, 2012 at 2:00 PM, Virag Kothari <[email protected]> wrote:
> 
>> Hi Matt,
>> I think you can fork the hive actions using the fork/join control nodes in
>> Oozie.
>> 
>> http://incubator.apache.org/oozie/docs/3.2.0-incubating/docs/WorkflowFunctionalSpec.html#a3.1.5_Fork_and_Join_Control_Nodes
>> 
>> I have no idea why the attachment doesn't work.
>> 
>> Thanks,
>> Virag
>> 
>> 
>> On 7/17/12 12:13 PM, "Matt Goeke" <[email protected]> wrote:
>> 
>>> Apparently when I put an imgur link in the reply, the spam score gets high
>>> enough that delivery is denied. Is there any way to link an image? If not,
>>> is there anything I can clarify in the question to make it more
>>> straightforward?
>>> 
>>> --
>>> Matt Goeke
>>> 
>>> On Tue, Jul 17, 2012 at 11:22 AM, Mona Chitnis <[email protected]> wrote:
>>> 
>>>> The attachment hasn't come through. The same thing happened with an
>>>> earlier email that had the Oozie Meetup slides attached. Any solutions?
>>>> 
>>>> --
>>>> Mona Chitnis
>>>> 
>>>> From: Matt Goeke <[email protected]>
>>>> Reply-To: "[email protected]" <[email protected]>
>>>> To: "[email protected]" <[email protected]>
>>>> Subject: Oozie: asynchronous forking
>>>> 
>>>> All,
>>>> 
>>>> Does anyone know if it is possible to do asynchronous forking in Oozie?
>>>> We are currently running a set of ETL extractions that are pairs of
>>>> actions (a Sqoop action followed by a Hive transformation), but we would
>>>> like the Sqoop actions to run serially and each Hive action to be called
>>>> asynchronously when its paired Sqoop job finishes. The Sqoop actions are
>>>> serial because we would like to limit the number of concurrent mappers
>>>> hitting the data source. We could do this through the Fair Scheduler, but
>>>> that would require a pool per data source. Attached is a picture of the
>>>> suggested ETL flow.
>>>> 
>>>> If anyone has suggestions on best practices around this, I would love to
>>>> hear them.
>>>> 
>>>> Thanks,
>>>> Matt
>>>> 
>> 
>> 
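The scheduling the original question asks for (Sqoop imports one at a time, each Hive step starting as soon as its own import finishes) can be modelled outside Oozie to make the intended semantics explicit. This is a plain-Python sketch using `concurrent.futures`, not an Oozie API; `run_etl`, the `sqoop`/`hive` callables and `max_hive_workers` are hypothetical names standing in for the real jobs.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_etl(sources, sqoop, hive, max_hive_workers=4):
    """Run sqoop(src) serially; submit hive(src) as soon as its import returns.

    sqoop and hive are caller-supplied callables standing in for the real
    jobs. Returns the Hive results in source order once all have finished.
    """
    futures = []
    with ThreadPoolExecutor(max_workers=max_hive_workers) as pool:
        for src in sources:
            # Serial: the next import starts only after this call returns,
            # which caps load on the data source at one import at a time.
            sqoop(src)
            # Asynchronous spike: the Hive step runs on the pool while the
            # loop moves on to the next import.
            futures.append(pool.submit(hive, src))
        # The final "join": block until every Hive step has completed.
        wait(futures)
    return [f.result() for f in futures]
```

The single final wait plays the role of the outermost join: imports never wait on Hive, but the run as a whole does not finish until every transformation has.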
