[ https://issues.apache.org/jira/browse/PIG-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433927#comment-13433927 ]
Rohini Palaniswamy commented on PIG-2870:
-----------------------------------------

bq. In case of PIG-2821, what happens if you need to store to multiple HBase clusters? The 2 instances will clobber each other, won't they?

Currently I don't think it is possible to store to multiple clusters, because hbase-site.xml is picked up from the classpath. I guess that would need an option to specify the conf location as part of HBaseStorage. But storing to multiple tables in one HBase cluster should work, as HBaseStorage keeps a local copy of the Configuration and uses it along with the UDFContext properties. Anyway, I am planning to rework PIG-2821 to store the HBase properties in UDFContext instead of JobConf.

While trying to figure out how HCatStorer was working with secure hcat even with PIG-2578, I also found that the credentials are getting added to the JobConf when PigOutputFormat.checkOutputSpecs calls setStoreLocation. Ideally that is not how it should work, but at least the credentials are getting passed to the job some way. So I can get away without reverting PIG-2578 for PIG-2821. But I am concerned about other StoreFunc implementations, given Dmitriy's statement about "many StoreFunc implementations that rely on being able to mess with the JobConf".

bq. I don't think it's as simple as reverting PIG-2578.

Agree, we need to fix it correctly. We did some work on being able to use multiple output formats without one stepping on another, so as to write to multiple hcat tables at once, by playing with the configuration and merging:
http://svn.apache.org/viewvc/incubator/hcatalog/branches/branch-0.4/src/java/org/apache/hcatalog/mapreduce/MultiOutputFormat.java?revision=1351510&view=markup
Something like that in Pig would help. A simpler thing in Pig's case would be to have a wrapper that serializes the non-JT settings to one UDFContext property, merges the JT-specific configs like DistributedCache settings, and sets them in the job.
And in the backend, copy the settings from UDFContext back into the job passed to setStoreLocation.

bq. I am aware of many StoreFunc implementations that rely on being able to mess with the JobConf. This is an undocumented and backwards incompatible change.
(Dmitriy's comment from PIG-2578)

Agreeing on a design and fixing this correctly with backward compatibility might take some time. I was just thinking of getting PIG-2578 reverted until the fix is done, so that at least we have the old behaviour and issues like PIG-2780 don't occur. It would be difficult to track down and go through all user-written StoreFuncs and ensure that none of them does a set on JobConf. The release with PIG-2578 has only been out for 1.5 months and no issues have been reported so far. But pig-0.10 adoption is only at 30-40%, and many might not have downloaded the new release. What worries me is that there could be silent failures or wrong outputs, depending upon the implementation of the StoreFunc.

> pigServer.openIterator fails for jobs with no input splits
> ----------------------------------------------------------
>
>                 Key: PIG-2870
>                 URL: https://issues.apache.org/jira/browse/PIG-2870
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>         Attachments: PIG-2870.1.patch
>
>
> Jobs that have valid input data, but 0 input splits (this is the case where
> indexing implemented in the {{InputFormat}} might return 0 splits for an
> aggressive filter) fail when {{pigServer.openIterator}} is called. This is
> because {{mapred.output.dir}} isn't set, so the job succeeds without creating
> the empty output directory. The {{ReadToEndLoader}} then fails due to the
> null input directory.
> It seems PIG-2578 introduced this issue.
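
To make the wrapper idea from the comment concrete (serialize the non-JT settings into one UDFContext property on the front end, merge the JT-specific settings into the job, and copy the serialized settings back into the job on the back end), here is a rough Java sketch. It is only an illustration under stated assumptions, not Pig code: a plain java.util.Properties stands in for the UDFContext properties, a Map stands in for the job configuration, and the class name, the property key prefix and the rule "keys starting with mapred.cache. are JT-specific" are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch; none of these names exist in Pig.
final class StoreConfSketch {
    // Assumed key prefix for the single serialized property of each store.
    static final String PACKED_KEY_PREFIX = "sketch.storefunc.conf.";
    // Assumed rule for "JT-specific" settings (DistributedCache-style keys).
    static final String JT_PREFIX = "mapred.cache.";

    // Front end: split the settings a store wants to make.
    // JT-specific keys are merged into the shared job (comma-joined);
    // everything else is serialized into one property for this store.
    static void frontEnd(String signature, Map<String, String> storeSettings,
                         Properties udfProps, Map<String, String> jobConf) {
        Properties packed = new Properties();
        for (Map.Entry<String, String> e : storeSettings.entrySet()) {
            if (e.getKey().startsWith(JT_PREFIX)) {
                jobConf.merge(e.getKey(), e.getValue(), (a, b) -> a + "," + b);
            } else {
                packed.setProperty(e.getKey(), e.getValue());
            }
        }
        try {
            StringWriter out = new StringWriter();
            packed.store(out, null); // escaping is handled by Properties
            udfProps.setProperty(PACKED_KEY_PREFIX + signature, out.toString());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Back end: copy this store's serialized settings back into the job
    // copy that would be handed to setStoreLocation.
    static void backEnd(String signature, Properties udfProps,
                        Map<String, String> jobConf) {
        String serialized = udfProps.getProperty(PACKED_KEY_PREFIX + signature);
        if (serialized == null) {
            return; // nothing was stored for this signature
        }
        Properties packed = new Properties();
        try {
            packed.load(new StringReader(serialized));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        for (String name : packed.stringPropertyNames()) {
            jobConf.put(name, packed.getProperty(name));
        }
    }

    public static void main(String[] args) {
        Properties udfProps = new Properties();
        Map<String, String> sharedJob = new HashMap<>();

        Map<String, String> storeA = new HashMap<>();
        storeA.put("hbase.zookeeper.quorum", "zk-a.example.com");
        storeA.put("mapred.cache.files", "/tmp/a.jar");
        Map<String, String> storeB = new HashMap<>();
        storeB.put("hbase.zookeeper.quorum", "zk-b.example.com");
        storeB.put("mapred.cache.files", "/tmp/b.jar");

        frontEnd("storeA", storeA, udfProps, sharedJob);
        frontEnd("storeB", storeB, udfProps, sharedJob);

        // Each store gets its own job copy on the back end.
        Map<String, String> jobForA = new HashMap<>(sharedJob);
        backEnd("storeA", udfProps, jobForA);
        System.out.println(jobForA.get("hbase.zookeeper.quorum"));
    }
}
```

Because each store's settings live under their own key, two stores that set the same name to different values no longer overwrite each other in the shared job, while the JT-specific settings are still merged into it. In Pig itself, the per-store key would presumably be derived from the store's UDFContext signature.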