[ 
https://issues.apache.org/jira/browse/PIG-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1421:
-----------------------------

    Attachment: PIG-1421.patch

Fix includes:

1. Make setLocation() lightweight and make sure it does not access the name 
node. Note that setLocation() is a new API on LoadFunc introduced in 0.7. 
UDFContext is used in some cases.
2. Remove code for setting properties (INPUT_FE and INPUT_DELETED_CGS) in 
TableInputFormat because it's ineffective.
3. Move the logic in #2 to TableInputFormat.setInputPaths() and make sure that 
it's done only once (because setInputPaths() is called multiple times in the 
Pig code path).
4. Remove unnecessary list status calls in the Zebra IO layer.
5. Remove the code that makes name node calls for sorted tables in the Pig code 
path.
6. Make sure that clob check is only done on the front end.
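
The "only done once" guard in item 3 can be sketched as follows. This is a 
minimal illustration and not the code in the patch: the class name, the 
property keys, and lookUpTableInfo() are all hypothetical, a plain 
java.util.Properties stands in for the job configuration, and a counter stands 
in for real name node round trips.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch only: every name here is hypothetical and not taken from the patch.
 * A Properties object stands in for the Hadoop job configuration.
 */
public class OnceOnlySetup {
    // Hypothetical marker key recording that the expensive setup already ran.
    static final String SETUP_DONE_KEY = "example.input.setup.done";
    // Hypothetical key holding the value derived by the expensive setup.
    static final String DERIVED_KEY = "example.input.derived";

    // Counts simulated name node round trips so the guard's effect is visible.
    static final AtomicInteger nameNodeCalls = new AtomicInteger();

    // Stand-in for a file system lookup that would contact the name node.
    static String lookUpTableInfo(String paths) {
        nameNodeCalls.incrementAndGet();
        return "info-for:" + paths;
    }

    // May be called many times for one job; only the first call does the lookup.
    static void setInputPaths(Properties conf, String paths) {
        conf.setProperty("example.input.paths", paths);
        if (conf.getProperty(SETUP_DONE_KEY) != null) {
            return; // setup is already recorded in the configuration
        }
        conf.setProperty(DERIVED_KEY, lookUpTableInfo(paths));
        conf.setProperty(SETUP_DONE_KEY, "true");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        for (int i = 0; i < 5; i++) {
            setInputPaths(conf, "/data/table1");
        }
        System.out.println("simulated name node calls: " + nameNodeCalls.get());
    }
}
```

Running main() prints "simulated name node calls: 1" even though 
setInputPaths() is invoked five times. Storing the marker in the configuration 
itself, rather than in a static field, keeps the guard scoped to one job. The 
sketch assumes the paths do not change between calls.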

> [Zebra] Pig script with Zebra data storage brings down name node due to 
> excessive name node call.
> -------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1421
>                 URL: https://issues.apache.org/jira/browse/PIG-1421
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.7.0
>
>         Attachments: PIG-1421.patch
>
>
> Because Pig calls setLocation() of the LoadFunc API on both the frontend and 
> the backend, and Zebra's implementation accesses the name node, the name node 
> becomes unresponsive under the resulting number of name node calls.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
