[ https://issues.apache.org/jira/browse/PIG-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045044#comment-13045044 ]
Thejas M Nair commented on PIG-1926:
------------------------------------
{quote}
I suppose I should check that there is a scalar expression in the expression
plan.
But this would disallow using UDFs, wouldn't it?
Is there another way to check the context of an expression?
{quote}
Column references map to a ProjectExpression. If the visitor finds a
ProjectExpression in the expression plan, it can throw an error.
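A minimal sketch of that check, assuming a simplified expression tree. The classes below (Expr, Project, Constant, UserFunc, Divide) are hypothetical stand-ins written for illustration, not Pig's actual logical expression classes, and the real check would presumably live in a plan visitor:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class LimitPlanCheckSketch {
    /** Stand-in for a node in an expression plan (hypothetical, not Pig's class). */
    static abstract class Expr {
        final List<Expr> children;
        Expr(List<Expr> children) { this.children = children; }
    }

    /** Stand-in for a column reference such as $0. */
    static final class Project extends Expr {
        final int column;
        Project(int column) { super(Collections.<Expr>emptyList()); this.column = column; }
    }

    /** Stand-in for a literal such as 100. */
    static final class Constant extends Expr {
        final Object value;
        Constant(Object value) { super(Collections.<Expr>emptyList()); this.value = value; }
    }

    /** Stand-in for a UDF call; UDFs stay legal as long as no argument is a column. */
    static final class UserFunc extends Expr {
        final String name;
        UserFunc(String name, Expr... args) { super(Arrays.asList(args)); this.name = name; }
    }

    /** Stand-in for a binary arithmetic node. */
    static final class Divide extends Expr {
        Divide(Expr left, Expr right) { super(Arrays.asList(left, right)); }
    }

    /** Walks the tree and rejects any column reference. */
    static void checkNoColumns(Expr e) {
        if (e instanceof Project) {
            throw new IllegalArgumentException(
                "Column $" + ((Project) e).column + " is not allowed in a limit expression");
        }
        for (Expr child : e.children) {
            checkNoColumns(child);
        }
    }

    public static void main(String[] args) {
        // Roughly "c.sum/100", with the scalar modeled as a function over constants.
        checkNoColumns(new Divide(new UserFunc("scalar", new Constant(0)), new Constant(100)));
        System.out.println("scalar expression accepted");
        try {
            // Roughly "$0/100", which references a column of the limited relation.
            checkNoColumns(new Divide(new Project(0), new Constant(100)));
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

Rejecting only the column node, rather than requiring a scalar, is what keeps UDF calls usable in the expression.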
{quote}
Isn't this dangerous? We would arbitrarily cast the first 8 bytes of the array
to long.
What if the byte array is larger/smaller than this?
What would be the use case for such a feature?
{quote}
In Pig, types are optional and the default type is bytearray. A bytearray is
cast to other types based on the context in the Typechecker. For example, see
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#Arithmetic+Operators
If the bytearray cannot be cast to long, you will get an error in
Result.returnStatus.
For example:
{code}
a = load 'x' as (id, num);
fil = filter a by id == '123'; -- assuming this returns one row
b = load 'y';
lim = limit b fil.num ; -- fil.num should be cast to long
{code}
In POLimit.getNext(Tuple t), the value of Result.returnStatus needs to be
checked while computing the value of the limit expression. If it is not
STATUS_OK, an error should be raised. An assert raises an error only if
assertions are explicitly enabled at runtime, so the assert should be replaced
by code that always throws an exception when the condition is not satisfied.
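A minimal sketch of the difference, under stated assumptions: Result and the status constants below are simplified stand-ins defined here for illustration, not Pig's own classes, and IllegalStateException stands in for whatever exception type the operator would actually throw.

```java
final class LimitStatusSketch {
    // Stand-in status codes (assumed values, defined locally for this sketch).
    static final byte STATUS_OK = 0;
    static final byte STATUS_ERR = 2;

    /** Simplified stand-in for an operator result: a status plus a value. */
    static final class Result {
        final byte returnStatus;
        final Object result;
        Result(byte returnStatus, Object result) {
            this.returnStatus = returnStatus;
            this.result = result;
        }
    }

    /**
     * Always-on check. Unlike "assert r.returnStatus == STATUS_OK", which the
     * JVM skips unless it is started with -ea, this throws on every run.
     */
    static long limitChecked(Result r) {
        if (r.returnStatus != STATUS_OK || r.result == null) {
            throw new IllegalStateException(
                "Could not evaluate limit expression, status=" + r.returnStatus);
        }
        return ((Number) r.result).longValue();
    }

    public static void main(String[] args) {
        System.out.println(limitChecked(new Result(STATUS_OK, 10L)));
        try {
            limitChecked(new Result(STATUS_ERR, null));
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```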
bq. Sort with limit is used only in MRCompiler, right?
Are you asking whether the changes for using a limit expression in sort are
required only in MRCompiler? Yes, that's the only place where a change should
be needed, apart from LimitOptimizer.
> Sample/Limit should take scalar
> -------------------------------
>
> Key: PIG-1926
> URL: https://issues.apache.org/jira/browse/PIG-1926
> Project: Pig
> Issue Type: Improvement
> Reporter: Daniel Dai
> Assignee: Gianmarco De Francisci Morales
> Labels: gsoc2011
> Attachments: PIG-1926.7.patch, PIG-1926.patch, PIG-1926.patch,
> PIG-1926.patch, PIG-1926.patch, PIG-1926.patch, PIG-1926.patch
>
>
> Currently, Limit and Sample only take a constant. It would be better if we
> could use a scalar in place of the constant. E.g.:
> {code}
> a = load 'a.txt';
> b = group a all;
> c = foreach b generate COUNT(a) as sum;
> d = order a by $0;
> e = limit d c.sum/100;
> {code}
> This is a candidate project for Google summer of code 2011. More information
> about the program can be found at http://wiki.apache.org/pig/GSoc2011