[ https://issues.apache.org/jira/browse/PIG-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105186#comment-14105186 ]
Lorand Bendig commented on PIG-4135:
------------------------------------

[~cheolsoo], thanks for pointing out this issue. Filter is a good safeguard, but it also reduces the use cases of fetch. I'm wondering whether we can have an input size estimation instead, like pig.auto.local.input.maxbytes?

> Fetch optimization should be disabled if plan contains no limit
> ---------------------------------------------------------------
>
>                 Key: PIG-4135
>                 URL: https://issues.apache.org/jira/browse/PIG-4135
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-4135-1.patch
>
>
> After deploying fetch optimization in production, a couple of users ran into
> this situation. They had fairly large input data, but after filtering it by a
> regular expression, it became small, so they didn't add a limit to the query.
> The problem is that even though the output is small, processing the input
> must be done in the cluster, not in the client. However, fetch optimization
> blindly fetches the entire input into the client since the plan is a map-only
> job and finishes with dump.

--
This message was sent by Atlassian JIRA
(v6.2#6252)