Mathias Herberts created PIG-2771:
-------------------------------------
Summary: Extend LoadPushDown to push limit info to the LoadFunc
Key: PIG-2771
URL: https://issues.apache.org/jira/browse/PIG-2771
Project: Pig
Issue Type: Improvement
Reporter: Mathias Herberts
Priority: Minor
It is not uncommon to use LIMIT clauses just after a LOAD, especially during
the development phase of new scripts.
Currently the LIMIT is applied in the map phase just after the LOAD. The
output of each Mapper therefore has at most N records when a 'LIMIT x N' is
used, but the LoadFunc has still read all the records in its splits.
A nice optimization would be to push down to the LoadFunc the fact that only
the first N records are needed. The LOAD could then terminate as soon as each
Mapper has produced N records, which can speed things up considerably when
the input is large.
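
To illustrate the difference, here is a minimal standalone sketch in plain
Java. It does not use the Pig API: the LimitAwareLoader interface, its
pushLimit method and the CountingLoader class are all hypothetical names made
up for this example, and the real extension to LoadPushDown may look quite
different. The sketch only shows that a loader which knows the limit can stop
reading its split early, while the emitted records stay the same.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LimitPushDownSketch {

    /** Hypothetical extension point; NOT part of Pig's LoadPushDown today. */
    interface LimitAwareLoader {
        void pushLimit(long limit);
    }

    /** Toy loader over one in-memory "split" that counts the records it reads. */
    static class CountingLoader implements LimitAwareLoader, Iterator<String> {
        private final Iterator<String> split;
        private long limit = Long.MAX_VALUE; // no limit pushed by default
        long recordsRead = 0;

        CountingLoader(List<String> splitRecords) {
            this.split = splitRecords.iterator();
        }

        @Override
        public void pushLimit(long limit) {
            this.limit = limit;
        }

        @Override
        public boolean hasNext() {
            // Stop as soon as the pushed limit is reached, even if the split has more.
            return recordsRead < limit && split.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            recordsRead++;
            return split.next();
        }
    }

    /** Simulates one mapper: drains the loader and applies the LIMIT afterwards. */
    static List<String> runMapper(CountingLoader loader, int limit) {
        List<String> out = new ArrayList<>();
        while (loader.hasNext()) {
            String record = loader.next();
            if (out.size() < limit) {
                out.add(record);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> split = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            split.add("record-" + i);
        }

        // Current behaviour: the loader is unaware of the LIMIT.
        CountingLoader current = new CountingLoader(split);
        List<String> a = runMapper(current, 10);

        // Proposed behaviour: the limit is pushed to the loader.
        CountingLoader pushed = new CountingLoader(split);
        pushed.pushLimit(10);
        List<String> b = runMapper(pushed, 10);

        System.out.println("without push-down: read " + current.recordsRead
                + ", emitted " + a.size());
        System.out.println("with push-down: read " + pushed.recordsRead
                + ", emitted " + b.size());
    }
}
```

In this toy setup both runs emit the same 10 records, but the loader that was
given the limit reads 10 records from its split instead of 1000.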