[ https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005240#comment-13005240 ]

Alex Rovner commented on PIG-1891:
----------------------------------

Alan,

After a bit of investigation: even though what you have described can be 
achieved through the OutputCommitter, it still seems it would be much 
easier if the StoreFunc had a "commit" method that is called once the 
job is final. This would significantly simplify writing a store func.
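
A rough sketch of what I have in mind -- the method name, signature, and
default behavior here are hypothetical, not existing Pig API:

    // Hypothetical addition to org.apache.pig.StoreFunc
    public abstract class StoreFunc {
        // ... existing methods: getOutputFormat(), setStoreLocation(),
        // prepareToWrite(), putNext(), ...

        /**
         * Hypothetical hook: called exactly once, on the front end,
         * after the whole job reaches a final state.
         *
         * @param jobSucceeded true if the overall job succeeded
         */
        public void commit(boolean jobSucceeded) {
            // default no-op, so existing StoreFuncs keep working unchanged
        }
    }

A DB store could then, say, flush a staging table into the live table in
commit(true) and drop it in commit(false), without any extra classes.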

Currently, if you take the OutputCommitter approach, you have to somehow 
make the committer aware of what you want to do upon commit. This means 
that to create, say, an SFTP store or a DB store, you would need to write 
your own StoreFunc, OutputFormat, RecordWriter, and OutputCommitter. Seems 
a bit of overkill for such simple tasks?
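
To make the amount of plumbing concrete, here is roughly what just the
committer piece looks like. The class name and the "sftp.target.uri" key
are made up, and the commitJob/abortJob overrides assume a Hadoop version
that exposes the job-level hooks on
org.apache.hadoop.mapreduce.OutputCommitter:

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.JobStatus;
    import org.apache.hadoop.mapreduce.OutputCommitter;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    public class SftpOutputCommitter extends OutputCommitter {
        // Required per-task methods; nothing to do for this store.
        @Override public void setupJob(JobContext ctx) { }
        @Override public void setupTask(TaskAttemptContext ctx) { }
        @Override public boolean needsTaskCommit(TaskAttemptContext ctx) { return false; }
        @Override public void commitTask(TaskAttemptContext ctx) { }
        @Override public void abortTask(TaskAttemptContext ctx) { }

        @Override
        public void commitJob(JobContext ctx) throws IOException {
            // The job configuration is the only channel the StoreFunc has
            // to tell the committer what to upload and where.
            String target = ctx.getConfiguration().get("sftp.target.uri");
            // ... connect and push the staged output over SFTP ...
        }

        @Override
        public void abortJob(JobContext ctx, JobStatus.State state) {
            // Job failed: leave the remote side untouched, clean up staging.
        }
    }

And that is only one of the four classes: you still need a custom
OutputFormat whose getOutputCommitter() returns it, a RecordWriter, and a
StoreFunc to wire them all together.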

Alex

> Enable StoreFunc to make intelligent decision based on job success or failure
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1891
>                 URL: https://issues.apache.org/jira/browse/PIG-1891
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Alex Rovner
>
> We are in the process of using Pig for various data processing and component 
> integration. Here is where we feel Pig storage funcs fall short: 
> they are not aware of whether the overall job has succeeded. This creates a 
> problem for storage funcs which need to "upload" results into another system: 
> a DB, FTP, another file system, etc.
> I looked at the DBStorage func in the Piggybank 
> (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup)
> and what I see is essentially a mechanism which, for each task, does the 
> following (condensed in the sketch after this list):
> 1. Creates a RecordWriter (in this case, opens a connection to the DB).
> 2. Opens a transaction.
> 3. Writes records into a batch.
> 4. Executes a commit or rollback depending on whether the task was successful.
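>
> A condensed sketch of that per-task pattern (jdbcUrl, insertSql, and
> taskSucceeded stand in for what the real DBStorage wires in from its
> constructor and the task context):
>
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.PreparedStatement;
>     import java.sql.SQLException;
>
>     class TaskBatchWriter {
>         void writeTaskBatch(String jdbcUrl, String insertSql,
>                             Iterable<String> rows, boolean taskSucceeded)
>                 throws SQLException {
>             Connection con = DriverManager.getConnection(jdbcUrl); // 1. per-task connection
>             try {
>                 con.setAutoCommit(false);                          // 2. open transaction
>                 PreparedStatement ps = con.prepareStatement(insertSql);
>                 for (String row : rows) {                          // 3. batch the records
>                     ps.setString(1, row);
>                     ps.addBatch();
>                 }
>                 if (taskSucceeded) {                               // 4. commit or roll back
>                     ps.executeBatch();
>                     con.commit();
>                 } else {
>                     con.rollback();
>                 }
>             } finally {
>                 con.close();
>             }
>         }
>     }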
> While this approach works well at the task level, it does not work at all at 
> the job level. 
> If some tasks succeed but the overall job fails, partial records are 
> going to get uploaded into the DB.
> Any ideas for a workaround? 
> Our current workaround is fairly ugly: we created a Java wrapper that 
> launches Pig jobs and then uploads to the DB once the Pig job is successful. 
> While the approach works, it's not really integrated into Pig.
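>
> A stripped-down version of that wrapper, assuming PigServer's batch API;
> "etl.pig" and uploadToDb() are placeholders:
>
>     import java.util.List;
>     import org.apache.pig.ExecType;
>     import org.apache.pig.PigServer;
>     import org.apache.pig.backend.executionengine.ExecJob;
>
>     public class PigThenUpload {
>         public static void main(String[] args) throws Exception {
>             PigServer pig = new PigServer(ExecType.MAPREDUCE);
>             pig.setBatchOn();
>             pig.registerScript("etl.pig");          // placeholder script
>             List<ExecJob> jobs = pig.executeBatch();
>             for (ExecJob job : jobs) {
>                 if (job.getStatus() != ExecJob.JOB_STATUS.COMPLETED) {
>                     System.err.println("Pig job failed; skipping upload");
>                     return;                         // any failure: no upload
>                 }
>             }
>             uploadToDb();                           // hypothetical upload step
>         }
>
>         private static void uploadToDb() {
>             // read the part files from HDFS and insert them into the DB
>         }
>     }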

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
