GitHub user jacques-n commented on the pull request:
https://github.com/apache/drill/pull/193#issuecomment-152047328
Interesting. Can you explain where the time is coming from? It isn't clear
to me why this will have a big impact over what we had before. While you're
pushing the limit down to just above the scan nodes, we already had an
optimization which avoided parallelization. Since we're pipelined this really
shouldn't matter much. Is limit zero not working right in the limit operator?
It should terminate upon receiving schema, not wait until a batch of actual
records (I'm wondering if it is doing the latter). Is sending zero records
through causing operators to skip compilation? In what cases was this change
taking something from hundreds of seconds to a few seconds? I'm asking these
questions so I can better understand as I want to make sure there isn't a bug
somewhere else. Thanks!
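
To make the expected limit-zero behavior concrete, here is a minimal Python sketch. It is only a model under stated assumptions, not Drill's actual operator code: the names `scan` and `limit` and the `(kind, payload)` message tuples are hypothetical, and the `log` list exists only to show how much work the upstream operator did.

```python
def scan(log):
    """Hypothetical upstream operator: emits a schema message, then record batches.

    Each step is appended to `log` just before it is produced, so the log
    shows how far the consumer actually pulled.
    """
    log.append("schema")
    yield ("schema", ("a", "b"))
    log.append("batch1")
    yield ("batch", [(1, 2), (3, 4)])
    log.append("batch2")
    yield ("batch", [(5, 6)])


def limit(upstream, n):
    """Hypothetical limit operator: pass the schema through, then at most n rows."""
    remaining = n
    for kind, payload in upstream:
        if kind == "schema":
            yield (kind, payload)
            if remaining == 0:
                # Limit 0: the schema is all the consumer needs, so stop
                # here instead of waiting for a batch of actual records.
                return
        else:
            rows = payload[:remaining]
            remaining -= len(rows)
            if rows:
                yield ("batch", rows)
            if remaining == 0:
                return


log = []
result = list(limit(scan(log), 0))
```

In this model, `limit(..., 0)` returns right after forwarding the schema, so the upstream generator is never asked for a record batch. If the real operator instead waited for the first batch, the scan would have to read data before the query could finish, which is the kind of behavior the questions above are trying to rule out.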