[
https://issues.apache.org/jira/browse/PIG-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gianmarco De Francisci Morales updated PIG-1926:
------------------------------------------------
Attachment: PIG-1926.7.patch
bq. The expression in limit should be allowed to refer to columns only in
scalar context, i.e. Pig should give a proper error message when such a statement
is created. Right now it allows the statement "lim = limit l $0;", and that
results in an NPE. You can create a validation visitor to check this, that gets
called from PigServer.Graph.compile(lp).
Thanks for pointing this out. I hadn't thought about this.
I suppose I should check that there is a scalar expression in the expression
plan.
But this would disallow using UDFs, wouldn't it?
Is there another way to check the context of an expression?
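To make the distinction concrete, here is a minimal sketch of the two cases,
reusing the script from the issue description. The aliases "ok" and "bad" are
hypothetical, and the comments describe the behaviour being asked for, not
what the current patch does.
{code}
a = load 'a.txt';
b = group a all;
c = foreach b generate COUNT(a) as sum;
d = order a by $0;
-- scalar context: c.sum is read from the single-row relation c,
-- so the limit expression has one well-defined value
ok = limit d c.sum/100;
-- bare column reference: $0 has no single value at this point,
-- so this should be rejected with a clear error rather than an NPE
bad = limit d $0;
{code}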
bq. If the expression in limit evaluates to bytearray, it can be implicitly
cast to a long. This can be done in the typechecker.
Isn't this dangerous? We would arbitrarily cast the first 8 bytes of the array
to long.
What if the byte array is larger/smaller than this?
What would be the use case for such a feature?
bq. The test files are missing. Maybe you forgot to include test dir in diff?
Yes, I forgot to add the file to svn. I re-added it in PIG-1926.7.patch.
bq. The following visitors also should visit the expression in limit -
DotLOPrinter,LogicalPlanPrinter,SchemaResetter,UidResetter.
Added in PIG-1926.7.patch.
bq. MRCompiler changes look good, some additional changes would be required
(for sort with limit) there once optimizations in LimitOptimizer are enabled.
Sort with limit is used only in MRCompiler, right?
> Sample/Limit should take scalar
> -------------------------------
>
> Key: PIG-1926
> URL: https://issues.apache.org/jira/browse/PIG-1926
> Project: Pig
> Issue Type: Improvement
> Reporter: Daniel Dai
> Assignee: Gianmarco De Francisci Morales
> Labels: gsoc2011
> Attachments: PIG-1926.7.patch, PIG-1926.patch, PIG-1926.patch,
> PIG-1926.patch, PIG-1926.patch, PIG-1926.patch, PIG-1926.patch
>
>
> Currently, Limit and Sample only take a constant. It would be better if we could use
> a scalar in place of the constant. E.g.:
> {code}
> a = load 'a.txt';
> b = group a all;
> c = foreach b generate COUNT(a) as sum;
> d = order a by $0;
> e = limit d c.sum/100;
> {code}
> This is a candidate project for Google summer of code 2011. More information
> about the program can be found at http://wiki.apache.org/pig/GSoc2011