Mathias Herberts created PIG-2771:
-------------------------------------

             Summary: Extend LoadPushDown to push limit info to the LoadFunc
                 Key: PIG-2771
                 URL: https://issues.apache.org/jira/browse/PIG-2771
             Project: Pig
          Issue Type: Improvement
            Reporter: Mathias Herberts
            Priority: Minor


It is not uncommon to use LIMIT clauses just after a LOAD, especially during 
the development phase of new scripts.

The current behaviour is to do the LIMIT in the map phase just after the LOAD. 
The output of each Mapper is indeed limited to N records if a 'LIMIT x N' 
was used, but the LoadFunc has still read all the records in its splits.

A nice optimization would be to push to the LoadFunc the fact that only the 
first N records are needed. This way the LOAD would terminate as soon as each 
Mapper has produced N records, which can speed things up quite a bit when the 
input is large.
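
To illustrate the difference, here is a minimal, self-contained Java sketch. 
It does not use any Pig classes: SketchLoader, pushLimit and CountingSource are 
hypothetical names made up for this example and are not part of LoadFunc or 
LoadPushDown. The only thing borrowed from Pig is the convention that getNext() 
returns null at the end of the input. The sketch compares how many records are 
read from a split with and without the limit being known to the loader.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LimitPushDownSketch {

    /** Stand-in for one input split; counts how many records were actually read. */
    static final class CountingSource implements Iterator<Integer> {
        private final int total;
        private int read = 0;

        CountingSource(int total) { this.total = total; }

        @Override public boolean hasNext() { return read < total; }

        @Override public Integer next() {
            if (!hasNext()) throw new NoSuchElementException();
            return read++;
        }

        int recordsRead() { return read; }
    }

    /** Hypothetical loader; getNext() returns null at end of input. */
    static final class SketchLoader {
        private final CountingSource source;
        private long limit = -1;   // -1 means no limit was pushed
        private long produced = 0;

        SketchLoader(CountingSource source) { this.source = source; }

        /** Hypothetical hook: the planner tells the loader only n records are needed. */
        void pushLimit(long n) { this.limit = n; }

        Integer getNext() {
            if (limit >= 0 && produced >= limit) return null; // stop early
            if (!source.hasNext()) return null;               // split exhausted
            produced++;
            return source.next();
        }
    }

    /** LIMIT applied after the loader: the whole split is still read. */
    static int[] withoutPushDown(int splitSize, int n) {
        CountingSource source = new CountingSource(splitSize);
        SketchLoader loader = new SketchLoader(source);
        int kept = 0;
        while (loader.getNext() != null) {
            if (kept < n) kept++;  // records beyond n are read, then discarded
        }
        return new int[] { kept, source.recordsRead() };
    }

    /** Limit known to the loader: reading stops after n records. */
    static int[] withPushDown(int splitSize, int n) {
        CountingSource source = new CountingSource(splitSize);
        SketchLoader loader = new SketchLoader(source);
        loader.pushLimit(n);
        int kept = 0;
        while (loader.getNext() != null) kept++;
        return new int[] { kept, source.recordsRead() };
    }

    public static void main(String[] args) {
        int[] a = withoutPushDown(1000, 10);
        int[] b = withPushDown(1000, 10);
        System.out.println("without push-down: kept=" + a[0] + " read=" + a[1]);
        System.out.println("with push-down:    kept=" + b[0] + " read=" + b[1]);
    }
}
```

In this toy model both variants keep 10 records from a 1000-record split, but 
only the second one stops reading after 10. How the limit would actually reach 
a real LoadFunc is exactly what this issue leaves open.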


        
