[ https://issues.apache.org/jira/browse/PIG-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433927#comment-13433927 ]

Rohini Palaniswamy commented on PIG-2870:
-----------------------------------------

bq. In case of PIG-2821, what happens if you need to store to multiple HBase 
clusters? The 2 instances will clobber each other, won't they?

Currently I don't think it is possible to store to multiple clusters, because 
hbase-site.xml is picked up from the classpath. I guess that would need an 
option in HBaseStorage to specify the conf location. Storing to multiple tables 
in one HBase cluster should work, though, as HBaseStorage keeps a local copy of 
the Configuration and uses it along with the UDFContext properties. Anyway, I am 
planning to rework PIG-2821 to store HBase properties in UDFContext instead of 
the JobConf.

While trying to figure out how HCatStorer was working with secure HCatalog even 
with PIG-2578, I also found that the credentials get added to the JobConf when 
PigOutputFormat.checkOutputSpecs calls setStoreLocation. Ideally that is not how 
it should work, but at least the credentials are getting passed to the job 
somehow, so I can get away without reverting PIG-2578 for PIG-2821. But I am 
concerned about other StoreFunc implementations, and about Dmitriy's statement 
that there are "many StoreFunc implementations that rely on being able to mess 
with the JobConf".
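To make the clobbering concern concrete, here is a minimal Java model. It 
assumes a plain Map stands in for the JobConf shared by every store in a job and 
one Properties object per StoreFunc signature stands in for UDFContext; the 
class and method names are hypothetical (not Pig's API), and the key 
`hbase.mapred.outputtable` is used only as an illustrative setting. It models 
the multiple-tables-in-one-cluster case, not the multiple-clusters case:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical model, not Pig's API: a Map stands in for the JobConf shared by
// every store in a job, and one Properties object per StoreFunc signature
// stands in for that store's UDFContext properties.
public class StoreConfIsolation {
    // Illustrative setting; a real StoreFunc may write many such keys.
    static final String TABLE_KEY = "hbase.mapred.outputtable";

    // Shared by all stores in the job (models JobConf).
    static final Map<String, String> jobConf = new HashMap<>();
    // Keyed by StoreFunc signature (models UDFContext properties).
    static final Map<String, Properties> udfProps = new HashMap<>();

    static Properties propsFor(String signature) {
        return udfProps.computeIfAbsent(signature, s -> new Properties());
    }

    // Frontend call made once per store; records the target table both ways.
    static void configureStore(String signature, String table) {
        jobConf.put(TABLE_KEY, table);                     // last writer wins
        propsFor(signature).setProperty(TABLE_KEY, table); // private per store
    }

    public static void main(String[] args) {
        configureStore("store_1", "table_a");
        configureStore("store_2", "table_b");
        // The shared conf only remembers the second store's table.
        System.out.println("jobConf: " + jobConf.get(TABLE_KEY));
        // Each store still sees its own table through its own properties.
        System.out.println("store_1: " + propsFor("store_1").getProperty(TABLE_KEY));
        System.out.println("store_2: " + propsFor("store_2").getProperty(TABLE_KEY));
    }
}
```

Running it prints `jobConf: table_b`, then `store_1: table_a` and 
`store_2: table_b`: the shared conf silently keeps only the last store's value, 
while the per-signature properties keep both.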

bq. I don't think it's as simple as reverting PIG-2578.
  Agreed, we need to fix it correctly. In HCatalog we did some work on using 
multiple output formats to write to multiple HCat tables at once without them 
stepping on each other, by manipulating and merging the configurations:
http://svn.apache.org/viewvc/incubator/hcatalog/branches/branch-0.4/src/java/org/apache/hcatalog/mapreduce/MultiOutputFormat.java?revision=1351510&view=markup
  Something like that in Pig would help. A simpler approach in Pig's case would 
be a wrapper that serializes the non-JT (non-JobTracker) settings into a single 
UDFContext property and merges JT-specific settings, such as the DistributedCache 
settings, into the job. In the backend, the wrapper would copy the settings from 
UDFContext back into the job passed to setStoreLocation. 
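A minimal Java sketch of such a wrapper, under stated assumptions: a plain Map 
stands in for the job configuration and a Properties object for one store's 
UDFContext properties; the class, the property name `store.wrapper.conf`, the 
key `out.table`, and the choice of `mapred.cache.files`/`mapred.cache.archives` 
as the JT-specific keys are illustrative, not Pig's actual design. Changed 
non-JT settings are serialized into one property, JT-level file lists are merged 
into the job, and the backend step copies the private settings back:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch of the proposed wrapper, not an existing Pig class.
// Map<String, String> models a Hadoop job configuration; Properties models the
// UDFContext properties of one store.
public class StoreConfWrapper {
    // Illustrative JobTracker-level keys (DistributedCache-style file lists).
    static final Set<String> JT_KEYS = Set.of("mapred.cache.files", "mapred.cache.archives");
    // Hypothetical name of the single UDFContext property holding everything else.
    static final String SERIALIZED_KEY = "store.wrapper.conf";

    // Frontend: after the StoreFunc has modified its private copy of the conf,
    // merge JT-level changes into the job and serialize the rest.
    static void frontend(Map<String, String> jobConf, Map<String, String> storeCopy,
                         Properties udfProps) {
        Properties privateSettings = new Properties();
        for (Map.Entry<String, String> e : storeCopy.entrySet()) {
            String key = e.getKey();
            String value = e.getValue();
            if (value.equals(jobConf.get(key))) {
                continue; // not changed by this StoreFunc
            }
            if (JT_KEYS.contains(key)) {
                jobConf.merge(key, value, StoreConfWrapper::mergeLists); // union, no clobbering
            } else {
                privateSettings.setProperty(key, value); // kept out of the shared conf
            }
        }
        StringWriter out = new StringWriter();
        try {
            privateSettings.store(out, null);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        udfProps.setProperty(SERIALIZED_KEY, out.toString());
    }

    // Backend: copy the store's private settings into the conf that would be
    // passed to setStoreLocation.
    static void backend(Map<String, String> taskConf, Properties udfProps) {
        String serialized = udfProps.getProperty(SERIALIZED_KEY);
        if (serialized == null) {
            return;
        }
        Properties privateSettings = new Properties();
        try {
            privateSettings.load(new StringReader(serialized));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        for (String key : privateSettings.stringPropertyNames()) {
            taskConf.put(key, privateSettings.getProperty(key));
        }
    }

    // Union of two comma-separated lists, keeping first-seen order.
    static String mergeLists(String a, String b) {
        Set<String> items = new LinkedHashSet<>();
        for (String item : (a + "," + b).split(",")) {
            if (!item.isEmpty()) items.add(item);
        }
        return String.join(",", items);
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put("mapred.cache.files", "a.jar");
        // Two stores each modify a private copy of the job conf.
        Map<String, String> copy1 = new HashMap<>(jobConf);
        copy1.put("out.table", "table_a");
        copy1.put("mapred.cache.files", "a.jar,hbase.jar");
        Map<String, String> copy2 = new HashMap<>(jobConf);
        copy2.put("out.table", "table_b");
        copy2.put("mapred.cache.files", "a.jar,hcat.jar");
        Properties props1 = new Properties();
        Properties props2 = new Properties();
        frontend(jobConf, copy1, props1);
        frontend(jobConf, copy2, props2);
        System.out.println(jobConf.get("mapred.cache.files"));
        Map<String, String> task1 = new HashMap<>(jobConf);
        backend(task1, props1);
        System.out.println(task1.get("out.table"));
    }
}
```

Running `main` prints `a.jar,hbase.jar,hcat.jar` and then `table_a`. Taking the 
union of the cache lists keeps one store from dropping another store's 
DistributedCache entries; a store that needs to remove an entry would need 
different handling.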

bq. I am aware of many StoreFunc implementations that rely on being able to 
mess with the JobConf. This is an undocumented and backwards incompatible 
change. (Dmitriy's comment from PIG-2578)
   Agreeing on a design and fixing this correctly with backward compatibility 
might take some time. I was just thinking of getting PIG-2578 reverted until we 
get the fix done, so that at least we have the old behaviour and issues like 
PIG-2780 don't occur. It would be difficult to track down all user-written 
StoreFuncs and make sure none of them set anything on the JobConf. The release 
containing PIG-2578 has only been out for 1.5 months and no issues have been 
reported so far, but pig-0.10 adoption is only at 30-40%, and many users might 
not have downloaded the new release yet. What worries me is that there could be 
silent failures or wrong output, depending on how a given StoreFunc is 
implemented.
                
> pigServer.openIterator fails for jobs with no input splits
> ----------------------------------------------------------
>
>                 Key: PIG-2870
>                 URL: https://issues.apache.org/jira/browse/PIG-2870
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>         Attachments: PIG-2870.1.patch
>
>
> Jobs that have valid input data but 0 input splits (which can happen when 
> indexing implemented in the {{InputFormat}} returns 0 splits for an 
> aggressive filter) fail when {{pigServer.openIterator}} is called. This is 
> because {{mapred.output.dir}} isn't set, so the job succeeds without creating 
> the empty output directory. The {{ReadToEndLoader}} then fails due to the 
> null input directory.
> It seems PIG-2578 introduced this issue. 
