[
https://issues.apache.org/jira/browse/PIG-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433927#comment-13433927
]
Rohini Palaniswamy commented on PIG-2870:
-----------------------------------------
bq. In case of PIG-2821, what happens if you need to store to multiple HBase
clusters? The 2 instances will clobber each other, won't they?
Currently I don't think it is possible to store to multiple clusters because of
the way hbase-site.xml is picked up from the classpath. We would need an option
to specify the conf location as part of HBaseStorage for that, I guess. But
storing to multiple tables in one HBase cluster should work, as HBaseStorage
keeps a local copy of the Configuration and uses it along with the UDFContext
properties. Anyway, I am planning to rework PIG-2821 to store the HBase
properties in UDFContext instead of JobConf.
I also found that the credentials are getting added to the JobConf when
PigOutputFormat.checkOutputSpecs calls setStoreLocation. I noticed this while
trying to figure out how HCatStorer was working with secure HCat even with
PIG-2578. Ideally that is not how it should work, but at least the credentials
are getting passed to the job somehow. So I can get away without reverting
PIG-2578 for PIG-2821. But I am concerned about other StoreFunc implementations,
given Dmitriy's statement about "many StoreFunc implementations that rely on
being able to mess with the JobConf".
bq. I don't think it's as simple as reverting PIG-2578.
Agreed, we need to fix it correctly. In HCatalog we did some work on using
multiple output formats at once, so that a job can write to multiple HCat
tables without the formats stepping on each other, by keeping their
configurations separate and merging them:
http://svn.apache.org/viewvc/incubator/hcatalog/branches/branch-0.4/src/java/org/apache/hcatalog/mapreduce/MultiOutputFormat.java?revision=1351510&view=markup
Something like that in Pig would help. A simpler approach in Pig's case would
be to have a wrapper that serializes non-JT settings to one UDFContext property,
merges JT-specific configs such as DistributedCache settings, and sets those in
the job. In the backend, it would copy the settings from UDFContext back into
the job passed to setStoreLocation.
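
A rough sketch of the wrapper idea above, using plain maps in place of JobConf
and UDFContext so that it stands alone. Everything named here is hypothetical
and not a Pig API: the class StoreConfSketch, the capture/restore methods, the
"|" separator, and the assumption that keys starting with "mapred.cache." are
the JT-specific ones that have to be merged into the shared job conf.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: plain maps stand in for JobConf and UDFContext properties.
class StoreConfSketch {
    // Assumption: keys with this prefix are JT-specific (DistributedCache settings).
    static final String SHARED_PREFIX = "mapred.cache.";
    static final String SEP = "|";

    // Front end: storeSettings holds only the keys one store function set.
    // JT-specific keys are merged into the shared job conf; everything else is
    // kept under the store's signature so stores cannot clobber each other.
    static void capture(String signature, Map<String, String> storeSettings,
                        Map<String, String> jobConf, Map<String, String> udfProps) {
        for (Map.Entry<String, String> e : storeSettings.entrySet()) {
            if (e.getKey().startsWith(SHARED_PREFIX)) {
                jobConf.merge(e.getKey(), e.getValue(), StoreConfSketch::union);
            } else {
                udfProps.put(signature + SEP + e.getKey(), e.getValue());
            }
        }
    }

    // Joins two comma-separated lists, keeping first-seen order and dropping duplicates.
    static String union(String a, String b) {
        Set<String> items = new LinkedHashSet<>(Arrays.asList(a.split(",")));
        items.addAll(Arrays.asList(b.split(",")));
        return String.join(",", items);
    }

    // Back end: copy this store's private settings into the conf that is about
    // to be handed to its setStoreLocation call.
    static void restore(String signature, Map<String, String> udfProps,
                        Map<String, String> storeJobConf) {
        String prefix = signature + SEP;
        for (Map.Entry<String, String> e : udfProps.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                storeJobConf.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
    }
}
```

With two stores that both set the same table key, capture leaves that key out
of the shared conf and restore gives each store its own value back, while the
cache entries from both end up merged in the job.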
bq. I am aware of many StoreFunc implementations that rely on being able to
mess with the JobConf. This is an undocumented and backwards incompatible
change. (Dmitriy's comment from PIG-2578)
Agreeing on a design and fixing this correctly with backward compatibility
might take some time. I was just thinking of getting PIG-2578 reverted until we
get the fix done, so that at least we have the old behaviour and issues like
PIG-2780 don't occur. It would be difficult to track down and go through all
user-written StoreFuncs and ensure none of them does a set on JobConf. A
release with PIG-2578 has been out for only 1.5 months, and no issues have been
reported so far. But pig-0.10 adoption is only at 30-40%, and many might not
have downloaded the new release. What worries me is that there could be silent
failures or wrong outputs depending upon the implementation of the StoreFunc.
> pigServer.openIterator fails for jobs with no input splits
> ----------------------------------------------------------
>
> Key: PIG-2870
> URL: https://issues.apache.org/jira/browse/PIG-2870
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11
> Reporter: Bill Graham
> Assignee: Bill Graham
> Attachments: PIG-2870.1.patch
>
>
> Jobs that have valid input data, but 0 input splits (this is the case where
> indexing implemented in the {{InputFormat}} might return 0 splits for an
> aggressive filter) fail when {{pigServer.openIterator}} is called. This is
> because {{mapred.output.dir}} isn't set, so the job succeeds without creating
> the empty output directory. The {{ReadToEndLoader}} then fails due to the
> null input directory.
> It seems PIG-2578 introduced this issue.