[ 
https://issues.apache.org/jira/browse/PIG-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-1926:
------------------------------------------------

    Attachment: PIG-1926.8.patch

bq. The expression in limit should be allowed to refer to columns only in 
scalar context, i.e. Pig should give a proper error message when such a 
statement is created. Right now it allows the statement "lim = limit l $0;", 
and that results in an NPE. You can create a validation visitor to check this, 
which gets called from PigServer.Graph.compile(lp).

Addressed in PIG-1926.8.patch

Added a LimitVariableValidator visitor to validate the variable limit expression.
      -- If a ProjectExpression is found (instead of a ScalarExpression), it 
throws a FrontendException
      -- Modified PigServer to call the visitor
      -- Added a test for this case in TestLimitVariable
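
To make the check concrete, here is a minimal sketch of the idea that does not use any Pig classes. Every class name in it (Expr, Constant, Scalar, Project, Divide, LimitCheck) is a hypothetical stand-in rather than the code in the patch, and IllegalStateException stands in for FrontendException.

```java
// Hypothetical stand-ins for logical expression nodes; these are NOT Pig's classes.
abstract class Expr {
    final Expr[] children;
    Expr(Expr... children) { this.children = children; }
}
class Constant extends Expr { }   // e.g. the literal 100
class Scalar extends Expr { }     // e.g. c.sum, a relation used in scalar context
class Project extends Expr { }    // e.g. $0, a bare column reference
class Divide extends Expr {
    Divide(Expr lhs, Expr rhs) { super(lhs, rhs); }
}

class LimitCheck {
    // Walks the limit expression tree and rejects any bare column projection.
    // IllegalStateException is a stand-in for the exception thrown by the patch.
    static void validate(Expr e) {
        if (e instanceof Project) {
            throw new IllegalStateException(
                "Limit expression may refer to columns only in a scalar context");
        }
        for (Expr child : e.children) {
            validate(child);
        }
    }
}
```

In this sketch an expression shaped like "c.sum/100" passes, while one shaped like "$0/100" is rejected before any plan is compiled.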


bq. If the expression in limit evaluates to bytearray, it can be implicitly 
cast to a long. This can be done in the typechecker.

Addressed in PIG-1926.8.patch

   -- Added implicit cast from bytearray to long in TypeCheckingRelVisitor
   -- Added test for it in TestLimitVariable
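
As a rough illustration of what such a cast amounts to at runtime, the sketch below converts raw bytes to a long. It is not the code in the patch: the class name ByteArrayToLong is hypothetical, and it assumes the bytearray holds the UTF-8 text of a base-10 integer. It makes no claim about how Pig's own conversion treats other input, such as decimal text.

```java
class ByteArrayToLong {
    // Assumption: raw holds the UTF-8 text of a base-10 integer, e.g. "100".
    static long cast(byte[] raw) {
        String text = new String(raw, java.nio.charset.StandardCharsets.UTF_8).trim();
        // Long.parseLong throws NumberFormatException if the text is not an integer.
        return Long.parseLong(text);
    }
}
```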


bq. In POLimit.getNext(Tuple t), while computing the value of the limit 
expression, the value of Result.returnStatus needs to be checked. If it is not 
STATUS_OK, it should give an error. The assert will give an error only if 
assertions are explicitly enabled at runtime, so the assert should be replaced 
by code that always throws an exception if the condition is not satisfied.

Addressed in PIG-1926.8.patch

  -- Added runtime check of expression result in POLimit
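
The difference between the assert and an explicit check can be sketched as follows. EvalResult and LimitValue are hypothetical stand-ins, not the classes touched by the patch, and the status code values are assumptions made for the example.

```java
// Hypothetical stand-in for the result of evaluating the limit expression.
class EvalResult {
    static final byte STATUS_OK = 0;   // assumed value, for this sketch only
    static final byte STATUS_ERR = 2;  // assumed value, for this sketch only
    final byte returnStatus;
    final Object value;
    EvalResult(byte returnStatus, Object value) {
        this.returnStatus = returnStatus;
        this.value = value;
    }
}

class LimitValue {
    // Extracts the limit and fails whether or not the JVM was started with -ea.
    static long of(EvalResult r) {
        // "assert r.returnStatus == EvalResult.STATUS_OK;" would be skipped
        // unless assertions are enabled, so an explicit check is used instead.
        if (r.returnStatus != EvalResult.STATUS_OK) {
            throw new RuntimeException(
                "Unable to evaluate limit expression, status " + r.returnStatus);
        }
        if (!(r.value instanceof Number)) {
            throw new RuntimeException("Limit expression did not yield a number");
        }
        return ((Number) r.value).longValue();
    }
}
```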

TODO:
Exception refinement. Right now I throw mostly RuntimeExceptions.

Discuss which modifications to MRCompiler are still needed.

In my opinion, optimizations are not mandatory at this stage and can be 
postponed. As stated in the project, the important thing is that the 
optimizers continue working for the constant case and do not break for the 
variable case.

If you agree, I will start working on SAMPLE.


> Sample/Limit should take scalar
> -------------------------------
>
>                 Key: PIG-1926
>                 URL: https://issues.apache.org/jira/browse/PIG-1926
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Daniel Dai
>            Assignee: Gianmarco De Francisci Morales
>              Labels: gsoc2011
>         Attachments: PIG-1926.7.patch, PIG-1926.8.patch, PIG-1926.patch, 
> PIG-1926.patch, PIG-1926.patch, PIG-1926.patch, PIG-1926.patch, PIG-1926.patch
>
>
> Currently, Limit and Sample only take a constant. It would be better if we 
> could use a scalar in place of the constant. E.g.:
> {code}
> a = load 'a.txt';
> b = group a all;
> c = foreach b generate COUNT(a) as sum;
> d = order a by $0;
> e = limit d c.sum/100;
> {code}
> This is a candidate project for Google summer of code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
