[
https://issues.apache.org/jira/browse/PIG-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gianmarco De Francisci Morales updated PIG-1926:
------------------------------------------------
Attachment: PIG-1926.8.patch
bq. The expression in limit should be allowed to refer to columns only in
scalar context, i.e. Pig should give a proper error message when such a statement
is created. Right now it allows the statement "lim = limit l $0;", and that
results in an NPE. You can create a validation visitor to check this, that gets
called from PigServer.Graph.compile(lp).
Addressed in PIG-1926.8.patch
Added LimitVariableValidator visitor to validate variable limit expression.
-- If a ProjectExpression is found (instead of a ScalarExpression) it
throws a FrontendException
-- Modified PigServer to call the visitor
-- Added a test for this case in TestLimitVariable
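To illustrate the check described above, here is a minimal sketch of the idea, not the code in the patch. The expression classes and InvalidLimitException are hypothetical stand-ins written for this example (they are not Pig's ProjectExpression, ScalarExpression, or FrontendException), and the exception is unchecked only to keep the sketch short.

```java
// Simplified stand-ins for an expression tree; NOT Pig's real classes.
abstract class Expr {
    final Expr[] children;
    Expr(Expr... children) { this.children = children; }
}

// A literal such as 100.
class ConstExpr extends Expr {
    final long value;
    ConstExpr(long value) { this.value = value; }
}

// A scalar reference such as c.sum (a single value taken from another relation).
class ScalarExpr extends Expr {
    final String ref;
    ScalarExpr(String ref) { this.ref = ref; }
}

// A plain column projection such as $0, which is not a single value.
class ProjectExpr extends Expr {
    final int column;
    ProjectExpr(int column) { this.column = column; }
}

// A binary operation such as c.sum / 100.
class BinaryExpr extends Expr {
    final char op;
    BinaryExpr(char op, Expr left, Expr right) {
        super(left, right);
        this.op = op;
    }
}

// Hypothetical unchecked stand-in for the FrontendException mentioned above.
class InvalidLimitException extends RuntimeException {
    InvalidLimitException(String msg) { super(msg); }
}

class LimitExprValidator {
    // Depth-first walk: reject the expression if any node is a bare projection.
    static void validate(Expr e) {
        if (e instanceof ProjectExpr) {
            throw new InvalidLimitException(
                "limit expression may reference columns only in scalar context: $"
                + ((ProjectExpr) e).column);
        }
        for (Expr child : e.children) {
            validate(child);
        }
    }
}
```

With these stand-ins, an expression shaped like c.sum/100 passes validation, while a bare projection such as $0 is rejected up front instead of failing later with an NPE.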
bq. If the expression in limit evaluates to bytearray, it can be implicitly
cast to a long. This can be done in the typechecker.
Addressed in PIG-1926.8.patch
-- Added implicit cast from bytearray to long in TypeCheckingRelVisitor
-- Added test for it in TestLimitVariable
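The implicit cast can be pictured as a type-check step that wraps the limit expression in a cast when its type is bytearray. The sketch below is only an illustration under assumptions: DataKind, TypedExpr, and LimitTypeChecker are hypothetical names, and the choice to accept integer and long unchanged while rejecting everything else is an assumption, not taken from TypeCheckingRelVisitor.

```java
// Hypothetical type tags; Pig's real type constants are not used here.
enum DataKind { BYTEARRAY, INTEGER, LONG, CHARARRAY }

// A limit expression reduced to its source text and its inferred type.
class TypedExpr {
    final String text;
    final DataKind kind;
    TypedExpr(String text, DataKind kind) {
        this.text = text;
        this.kind = kind;
    }
}

class LimitTypeChecker {
    // Returns an expression whose type is usable as a limit,
    // inserting a cast to long when the input is a bytearray.
    static TypedExpr coerceLimit(TypedExpr e) {
        switch (e.kind) {
            case INTEGER:
            case LONG:
                return e; // assumed to be acceptable as-is
            case BYTEARRAY:
                return new TypedExpr("(long)" + e.text, DataKind.LONG);
            default:
                throw new IllegalArgumentException(
                    "limit expression must be numeric, found " + e.kind);
        }
    }
}
```

The point of doing this in the type checker is that the user does not have to write the cast by hand when the scalar comes from an untyped load.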
bq. In POLimit.getNext(Tuple t), while computing the value of the limit
expression, the value of Result.returnStatus needs to be checked. If it is not
STATUS_OK, it should give an error. The assert will give an error only if the
assertions are explicitly enabled at runtime, so the assert should be replaced
by code that always throws an exception if the condition is not satisfied.
Addressed in PIG-1926.8.patch
-- Added runtime check of expression result in POLimit
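The difference between the assert and an explicit check can be sketched as follows. Java assertions are disabled unless the JVM is started with -ea, so only the explicit check is guaranteed to fire. EvalResult and LimitEvaluator are hypothetical stand-ins for this example (not Pig's Result or POLimit), and the status values are illustrative.

```java
// Hypothetical stand-in for the result of evaluating the limit expression.
class EvalResult {
    static final byte STATUS_OK = 0;   // illustrative values only
    static final byte STATUS_ERR = 2;

    final byte returnStatus;
    final Object value;

    EvalResult(byte returnStatus, Object value) {
        this.returnStatus = returnStatus;
        this.value = value;
    }
}

class LimitEvaluator {
    // Extracts the limit, failing loudly whenever evaluation did not succeed.
    static long limitFrom(EvalResult r) {
        // An "assert r.returnStatus == EvalResult.STATUS_OK;" here would be
        // skipped when assertions are disabled, which is the JVM default.
        if (r.returnStatus != EvalResult.STATUS_OK) {
            throw new IllegalStateException(
                "limit expression failed with status " + r.returnStatus);
        }
        if (!(r.value instanceof Number)) {
            throw new IllegalStateException("limit expression is not numeric");
        }
        return ((Number) r.value).longValue();
    }
}
```

An unchecked exception is used here to match the current state described in the TODO below; a more specific exception type could replace it once the exception handling is refined.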
TODO:
-- Exception refinement. Right now I throw mostly RuntimeExceptions.
-- Discuss which modifications to MRCompiler are still needed.
In my opinion, optimizations at this stage are not mandatory; we can postpone
them until later. As stated in the project, the important thing is that
optimizers continue working for the constant case and do not break for the
variable case.
If you agree, I will start working on SAMPLE.
> Sample/Limit should take scalar
> -------------------------------
>
> Key: PIG-1926
> URL: https://issues.apache.org/jira/browse/PIG-1926
> Project: Pig
> Issue Type: Improvement
> Reporter: Daniel Dai
> Assignee: Gianmarco De Francisci Morales
> Labels: gsoc2011
> Attachments: PIG-1926.7.patch, PIG-1926.8.patch, PIG-1926.patch,
> PIG-1926.patch, PIG-1926.patch, PIG-1926.patch, PIG-1926.patch, PIG-1926.patch
>
>
> Currently, Limit and Sample only take a constant. It would be better if we could use
> a scalar in place of the constant. E.g.:
> {code}
> a = load 'a.txt';
> b = group a all;
> c = foreach b generate COUNT(a) as sum;
> d = order a by $0;
> e = limit d c.sum/100;
> {code}
> This is a candidate project for Google summer of code 2011. More information
> about the program can be found at http://wiki.apache.org/pig/GSoc2011