[ https://issues.apache.org/jira/browse/PIG-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-1926:
------------------------------------------------

    Attachment: PIG-1926.7.patch

bq. The expression in limit should be allowed to refer to columns only in 
scalar context, i.e. Pig should give a proper error message when such a statement 
is created. Right now it allows the statement "lim = limit l $0;", and that 
results in an NPE. You can create a validation Visitor to check this, which gets 
called from PigServer.Graph.compile(lp).

Thanks for pointing this out. I hadn't thought about this.
I suppose I should check that there is a scalar expression in the expression 
plan.
But this would disallow using UDFs, wouldn't it?
Is there another way to check the context of an expression?
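One possible reading of the reviewer's suggestion, sketched under simplifying assumptions: a validation pass walks the limit expression and rejects any bare column projection (such as `$0`), while still accepting constants, scalar references like `c.sum`, and UDF calls whose arguments are themselves scalar. The node types (`Const`, `Column`, `ScalarRef`, `Udf`, `BinOp`) are hypothetical stand-ins, not Pig's actual expression operators; in Pig the check would live in a visitor invoked from `PigServer.Graph.compile(lp)`, as suggested above.

```java
import java.util.List;

public class LimitExprValidator {
    // Hypothetical, simplified expression nodes; Pig's real operators differ.
    interface Expr {}
    record Const(long value) implements Expr {}
    record Column(int index) implements Expr {}                         // e.g. $0: a column of the input relation
    record ScalarRef(String relation, String field) implements Expr {}  // e.g. c.sum
    record Udf(String name, List<Expr> args) implements Expr {}
    record BinOp(char op, Expr left, Expr right) implements Expr {}

    /** Throws if a bare column reference appears anywhere in the limit expression. */
    static void validate(Expr e) {
        if (e instanceof Column c) {
            throw new IllegalArgumentException(
                "LIMIT expression may not refer to column $" + c.index() + " of its input; use a scalar");
        } else if (e instanceof Udf u) {
            // A UDF call is accepted as long as each of its arguments is valid.
            for (Expr arg : u.args()) validate(arg);
        } else if (e instanceof BinOp b) {
            validate(b.left());
            validate(b.right());
        }
        // Const and ScalarRef are always accepted.
    }

    static boolean isValid(Expr e) {
        try {
            validate(e);
            return true;
        } catch (IllegalArgumentException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        Expr ok = new BinOp('/', new ScalarRef("c", "sum"), new Const(100)); // limit d c.sum/100
        Expr bad = new Column(0);                                            // limit l $0
        Expr udf = new Udf("ROUND", List.of(ok));
        System.out.println(isValid(ok));  // true
        System.out.println(isValid(bad)); // false
        System.out.println(isValid(udf)); // true
    }
}
```

In this sketch the check is "no bare column reference anywhere" rather than "the whole expression must be a scalar node", so UDFs are not ruled out by construction.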

bq. If the expression in limit evaluates to bytearray, it can be implicitly 
cast to a long. This can be done in the typechecker.

Isn't this dangerous? We would arbitrarily cast the first 8 bytes of the array 
to long.
What if the byte array is larger/smaller than this?
What would be the use case for such a feature?
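To make the concern concrete, the sketch below contrasts two plausible ways of turning a bytearray into a `long`: reading the first 8 bytes as a raw big-endian value, which fails on shorter arrays and silently ignores any extra bytes, versus parsing the bytes as UTF-8 decimal text, which is how a text-oriented loader would typically store a number. Which interpretation a typechecker cast would actually use is not settled here; the class and method names are illustrative only.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BytearrayToLong {
    /** Reads the first 8 bytes as a big-endian long; throws BufferUnderflowException if fewer than 8. */
    static long binaryLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong(); // any bytes after the 8th are ignored
    }

    /** Interprets the bytes as UTF-8 text holding a decimal number, e.g. "250". */
    static long textualLong(byte[] bytes) {
        return Long.parseLong(new String(bytes, StandardCharsets.UTF_8).trim());
    }

    public static void main(String[] args) {
        byte[] text = "250".getBytes(StandardCharsets.UTF_8); // 3 bytes
        System.out.println(textualLong(text));                // 250
        try {
            binaryLong(text);
        } catch (BufferUnderflowException e) {
            System.out.println("binary read failed: only " + text.length + " bytes");
        }
    }
}
```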

bq. The test files are missing. Maybe you forgot to include test dir in diff?

Yes, I forgot to add the file to svn. I re-added it in PIG-1926.7.patch.

bq. The following visitors also should visit the expression in limit - 
DotLOPrinter, LogicalPlanPrinter, SchemaResetter, UidResetter.

Added in PIG-1926.7.patch.

bq. MRCompiler changes look good, some additional changes would be required 
(for sort with limit) there once optimizations in LimitOptimizer are enabled.

Sort with limit is used only in MRCompiler, right?
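For context, "sort with limit" fuses an ORDER followed by a LIMIT so that each task keeps only its best n records instead of fully sorting its input. The plain-Java sketch below shows that idea with a bounded max-heap; it is not Pig's POSort code. With a scalar limit, `n` becomes a runtime value rather than a plan-time constant.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SortWithLimitSketch {
    /** Returns the n smallest elements in ascending order, holding at most n items in memory. */
    static <T> List<T> topN(Iterable<T> input, int n, Comparator<? super T> cmp) {
        if (n <= 0) return new ArrayList<>();
        // Max-heap under cmp: the root is the largest current candidate, so it is evicted first.
        PriorityQueue<T> heap = new PriorityQueue<>(n, cmp.reversed());
        for (T item : input) {
            if (heap.size() < n) {
                heap.add(item);
            } else if (cmp.compare(item, heap.peek()) < 0) {
                heap.poll();   // drop the current largest
                heap.add(item);
            }
        }
        List<T> result = new ArrayList<>(heap);
        result.sort(cmp);      // heap iteration order is unspecified, so sort the survivors
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(9, 3, 7, 1, 8, 2);
        System.out.println(topN(data, 3, Comparator.naturalOrder())); // [1, 2, 3]
    }
}
```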

> Sample/Limit should take scalar
> -------------------------------
>
>                 Key: PIG-1926
>                 URL: https://issues.apache.org/jira/browse/PIG-1926
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Daniel Dai
>            Assignee: Gianmarco De Francisci Morales
>              Labels: gsoc2011
>         Attachments: PIG-1926.7.patch, PIG-1926.patch, PIG-1926.patch, 
> PIG-1926.patch, PIG-1926.patch, PIG-1926.patch, PIG-1926.patch
>
>
> Currently, Limit and Sample only take a constant. It would be better if we could use 
> a scalar in place of a constant. E.g.:
> {code}
> a = load 'a.txt';
> b = group a all;
> c = foreach b generate COUNT(a) as sum;
> d = order a by $0;
> e = limit d c.sum/100;
> {code}
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011
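Under the proposed semantics, the example above resolves `c.sum` to a single value at run time and uses `c.sum/100` as the row count. The Java sketch below models that arithmetic, assuming that `/` on a `long` (COUNT's return type) and an integer constant truncates like Java's integer division; under that assumption, a relation with fewer than 100 rows would get a limit of 0. The method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ScalarLimitSketch {
    /** Models c.sum/100 where c.sum = COUNT(a): integer division rounds down. */
    static long limitFromCount(long count) {
        return count / 100;
    }

    /** Keeps the first n records of an already-sorted relation (n is clamped to [0, size]). */
    static <T> List<T> limit(List<T> sorted, long n) {
        int k = (int) Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, k));
    }

    public static void main(String[] args) {
        List<Integer> d = new ArrayList<>();
        for (int i = 0; i < 250; i++) d.add(i); // 250 sorted records
        long n = limitFromCount(d.size());      // 250 / 100 = 2
        System.out.println(limit(d, n));        // [0, 1]
    }
}
```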

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
