[ https://issues.apache.org/jira/browse/PIG-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105535#comment-14105535 ]

Cheolsoo Park commented on PIG-4135:
------------------------------------

Won't an input-size threshold reduce the use cases more than requiring a limit 
would? For example, if the fetch optimizer is disabled when the input size 
exceeds 100MB, the example that I described in this jira won't use fetch 
optimization even with a limit; in fact, it will be disabled in most cases. A 
limit, on the other hand, is effective because it is pushed down into the load: 
no more records than the limit will be loaded.

I think the main use case of fetch optimization is to quickly peek at a few 
sample records from the input files. Requiring a limit doesn't reduce the use 
cases, does it?
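For illustration, the intended use case might look like the following minimal Pig Latin sketch (the file path and delimiter are hypothetical):

```
-- Peek at a few sample records without launching a cluster job.
A = LOAD 'input/data.txt' USING PigStorage('\t');
B = LIMIT A 10;  -- the limit is pushed into the load, so at most 10 records are read
DUMP B;          -- map-only plan ending in a dump, eligible for fetch optimization
```

With the limit present, fetch runs cheaply in the client regardless of how large the input files are.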

> Fetch optimization should be disabled if plan contains no limit
> ---------------------------------------------------------------
>
>                 Key: PIG-4135
>                 URL: https://issues.apache.org/jira/browse/PIG-4135
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-4135-1.patch
>
>
> After deploying fetch optimization in production, a couple of users ran into 
> this situation. They had fairly large input data, but after filtering it with a 
> regular expression, it became small, so they didn't add a limit to the query. 
> The problem is that even though the output is small, processing the input 
> must be done in the cluster, not in the client. However, fetch optimization 
> blindly fetches the entire input into the client since the plan is a map-only 
> job that finishes with a dump.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
