[ https://issues.apache.org/jira/browse/PIG-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419516#comment-13419516 ]

Jie Li commented on PIG-2824:
-----------------------------

Here is the script I used:

{code}
LineItems = LOAD '$input/lineitem' USING PigStorage('|') AS (orderkey, partkey, 
suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, 
linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment);

Result = filter LineItems by 1==0; 

STORE Result INTO '$output/filter';
{code}

Note again that we specified -t PushUpFilter, which disables the PushUpFilter 
optimization rule, so the Foreach is processed before the filter and we can 
observe its overhead. With this patch the Foreach is no longer inserted, which 
gives the improvement shown in 2824.png: about 234 seconds vs. 147 seconds to 
load 10GB of data (roughly a 37% reduction).

> Pushing checking number of fields into LoadFunc
> -----------------------------------------------
>
>                 Key: PIG-2824
>                 URL: https://issues.apache.org/jira/browse/PIG-2824
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.9.0, 0.10.0
>            Reporter: Jie Li
>         Attachments: 2824.patch, 2824.png
>
>
> As described in PIG-1188, if users define a schema (with or without types), 
> we need to check the number of fields after loading data: if there are fewer 
> fields, we need to pad them with nulls, and if there are more fields, we need 
> to throw the extra ones away. 
> For a schema with types, Pig used to insert a Foreach after the loader for 
> type casting, which also checks the number of fields. For a schema without 
> types there was no such Foreach, so PIG-1188 inserted one just for checking 
> the number of fields. Unfortunately, a Foreach is too expensive for such a 
> check, and ideally we can push it into the loader.
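As a rough illustration of what pushing the field-count check into the loader could look like, here is a minimal Java sketch, assuming a plain '|'-delimited line reader and a declared schema width known to the loader. The class FieldCountLoaderSketch, its schemaWidth parameter, and parseLine are hypothetical names for this example; they are not Pig's LoadFunc API and not the code in 2824.patch. The record is padded with nulls or truncated while it is being split, so no separate Foreach pass is needed for this check:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Pig's LoadFunc API): splits a '|'-delimited line
 * and normalizes it to the declared schema width in the same pass.
 */
public class FieldCountLoaderSketch {

    // Number of fields declared in the AS (...) clause; assumed known here.
    private final int schemaWidth;

    public FieldCountLoaderSketch(int schemaWidth) {
        this.schemaWidth = schemaWidth;
    }

    /** Splits one raw line, padding with nulls or truncating to schemaWidth. */
    public List<Object> parseLine(String line) {
        // Limit -1 keeps trailing empty fields, e.g. "a|b|" yields three fields.
        String[] raw = line.split("\\|", -1);
        List<Object> fields = new ArrayList<>(schemaWidth);
        for (int i = 0; i < schemaWidth; i++) {
            // Copy fields that exist and pad missing ones with null.
            // Fields beyond schemaWidth are never copied, i.e. thrown away.
            fields.add(i < raw.length ? raw[i] : null);
        }
        return fields;
    }

    public static void main(String[] args) {
        FieldCountLoaderSketch loader = new FieldCountLoaderSketch(3);
        System.out.println(loader.parseLine("1|2"));       // fewer fields: padded
        System.out.println(loader.parseLine("1|2|3|4|5")); // more fields: truncated
    }
}
```

Running main prints [1, 2, null] followed by [1, 2, 3]. The point of the sketch is only that the normalization happens while each record is built, instead of in an extra operator after the load.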

