[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-08-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900099#action_12900099
 ] 

Thejas M Nair commented on PIG-1434:


bq. 1. Casts are optional and the datatype of scalar depends on the schema of C 
(ie depending on the schema of C, we add the casts implicitly. So, typically, 
count is a long and max is a double). In case of undeclared(null) schema for C, 
default type of scalar is chararray.

The default type everywhere else in pig is bytearray. I think we should follow 
that convention here as well . Any reason not to do that ? (The change can be 
part of a separate jira.)


 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-08-16 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12899042#action_12899042
 ] 

Aniket Mokashi commented on PIG-1434:
-

Comments on the finalized syntax--

With the above changes Pig now supports -
{code}
Y = foreach X generate $1/(long) C.count, $2-(long) C.max;
{code}
1. Casts are *optional* and the datatype of scalar depends on the schema of C 
(ie depending on the schema of C, we add the casts implicitly. So, typically, 
count is a long and max is a double). In case of undeclared(null) schema for C, 
default type of scalar is *chararray*.

2. Projections are mandatory. For example
{code}
Y = foreach X generate C; // is an *error*
{code}
We need to use-
{code}
Y = foreach X generate C.$0; 
{code}

3. Check if C is a scalar or not is not performed until runtime, thus it will 
fail at the time of execution of UDF with ExecException(Scalar has more than 
one row in the output).

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894837#action_12894837
 ] 

Hadoop QA commented on PIG-1434:


-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12451096/ScalarImplFinale1.patch
  against trunk revision 980930.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

-1 release audit.  The applied patch generated 409 release audit warnings 
(more than the trunk's current 403 warnings).

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/368/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/368/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/368/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/368/console

This message is automatically generated.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-08-02 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894657#action_12894657
 ] 

Richard Ding commented on PIG-1434:
---

It looks really good. Additional corner cases need to be considered:

* missing field in scalar file
* empty scalar file
* empty input directory

Some minor issues:

*  In PigServer. FileLocalizer.getTemporaryPath should be changed to use 
non-deprecated method.
*  In ScalarFinder, method isScalarPresent is not used.
*  In MRCompiler, variable scalarPhyFinder should be local so that 
ScalarPhyFinder can be simplified.

Also add a test case for using scalar in multi-query would be good.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893939#action_12893939
 ] 

Hadoop QA commented on PIG-1434:


-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12450872/ScalarImplFinale.patch
  against trunk revision 980276.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

-1 javac.  The applied patch generated 146 javac compiler warnings (more 
than the trunk's current 145 warnings).

-1 findbugs.  The patch appears to introduce 5 new Findbugs warnings.

-1 release audit.  The applied patch generated 406 release audit warnings 
(more than the trunk's current 400 warnings).

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/366/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/366/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/366/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/366/console

This message is automatically generated.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888449#action_12888449
 ] 

Alan Gates commented on PIG-1434:
-

Alright, I finally understand.  I think the potential confusion for the user 
and the Pig parser is caused by the proposed way to handle multi-columned 
input.  Rather than

{code}
Y = foreach Z generate X::$1/(long) C.count, X::$2-(long) C.max;
{code}

if we instead do
{code}
Y = foreach Z generate X::$1/((tuple)C).count, X::$2 - ((tuple)C).max;
{code}

then I believe it is clear for both user and parser what is happening.

In each case C is being cast to a tuple and then fields read out of it.  C is 
not being cast to a long.  Then the feature remains basically as originally 
proposed.  The relation being cast must have one record and one field.  That 
one field can be a tuple to handle the case where the record has multiple 
fields.  But Pig will still reads it as a single column which is a tuple, and 
the user will need to cast it accordingly.

This should also avoid accidental usage.  In the example above:
{code}
Y = foreach Z generate X::$1/C.count, X::$2 - C.max;
{code}
should still be an error because the type checker should not be able to find C 
as a tuple anywhere in its symbol table.


 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-14 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888546#action_12888546
 ] 

Dmitriy V. Ryaboy commented on PIG-1434:


+1 for casting as tuple.  Though it may have to look like

{code}
Y = foreach Z generate X::$1/(long) ((tuple)C).count, X::$2 - (long) 
((tuple)C).max;
{code}

Definitely -1 on the bracket syntax.. it seems very non-intuitive. 

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888051#action_12888051
 ] 

Alan Gates commented on PIG-1434:
-

bq. Adding this support makes pig code complicated/hacky, because we conclude 
any not parsed alias (AliasFieldOrSpec) as scalar and try to resolve it as 
scalar at runtime.

I don't understand when we'll see this issue.  Could you give me examples of 
errors users might make that we'll miss.  I definitely don't like the brackets.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-13 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888073#action_12888073
 ] 

Aniket Mokashi commented on PIG-1434:
-

I mean the cases where users type in some alias which are currently not 
possible to include inside a foreach statement and accidentally getting them 
treated as scalar by pig and then failing scripts at runtime (and may not fail 
in one-liner sample cases).
For example-
Y = foreach Z generate C.$0; where C is not a scalar. Currently, this would 
throw an error upfront, for erroneous usage (logical (do not know restrictions 
on foreach statement) or typing mistake) of C, But, after we add support of 
scalars, pig may conclude C to be used as a scalar and generate the plans 
accordingly.
By introducing square bracketed syntax we can make sure that user intended to 
use C as a scalar and it wasn't introduced by mistake.A cast would also work 
for this, but as we have introduced scalar projections (C.count, C.max etc), we 
already have cases wherein user may mean to cast fields(count, max) rather than 
scalars themselves.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886772#action_12886772
 ] 

Daniel Dai commented on PIG-1434:
-

We may also add some sanity check, instead of just doing a limit.
{code}
C = foreach C generate CheckSingular(*);
Z = join X by 1, C by 1 using 'replicated';
Y = foreach Z generate X::$1/(long) C.count, X::$2-(long) C.max;
{code}

CheckSingular will check if C only have one record.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886825#action_12886825
 ] 

Daniel Dai commented on PIG-1434:
-

We also need to enforce C only have one part file to do the check (use limit to 
achieve it).
{code}
C = limit C 2;
C = foreach C generate CheckSingular(*);
Z = join X by 1, C by 1 using 'replicated';
Y = foreach Z generate X::$1/(long) C.count, X::$2-(long) C.max;
{code}

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-09 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886914#action_12886914
 ] 

Aniket Mokashi commented on PIG-1434:
-

Adding this support makes pig code complicated/hacky, because we conclude any 
not parsed alias (AliasFieldOrSpec) as scalar and try to resolve it as scalar 
at runtime.

To simplify, square bracketed syntax is a better idea, for example- 
{code}
Y = foreach Z generate X::$1/(long) [C].count, X::$2-(long) [C].max;
{code}
Otherwise, such queries (if typed by mistakes) can result into non-intuitive 
errors for users.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-02 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884759#action_12884759
 ] 

Richard Ding commented on PIG-1434:
---

I agree that we should use the right syntax. What I meant was that it can be 
implemented as a 'replicated' cross which seems to solve the problems of 
implicit dependency and using distributed cache.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-02 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884791#action_12884791
 ] 

Thejas M Nair commented on PIG-1434:


I think the replicated cross is a good alternative to this feature, though this 
feature is probably more friendly for a beginner pig user. But if this feature 
makes the pig code very complicated/hacky (the dependency order and stuff), I 
think it might not be a bad idea to encourage the use of replicated-join 
instead .

As a side note, we can actually get 'replicated cross' working using replicated 
join -
eg -
{code}
 j = join l1 by 1, l2 by 1 using 'replicated';
{code}

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-02 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884794#action_12884794
 ] 

Richard Ding commented on PIG-1434:
---


So all one needs to do is internally replace the line: 

{code}
Y = foreach X generate $1/(long) C.count, $2-(long) C.max;
{code}

with

{code}
Z = join X by 1, C by 1 using 'replicated';
Y = foreach Z generate X::$1/(long) C.count, X::$2-(long) C.max;
{code}



 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-01 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884369#action_12884369
 ] 

Dmitriy V. Ryaboy commented on PIG-1434:


A couple of thoughts that came out of the Pig conributor meeting:

1) rather than scalar, we should make this work for single-tuple relations. 
That way a user can do something like this: 

{code}
A = load 'data' as (x, y, z);
B = group A all;
C = foreach B generate COUNT(A) as count, MAX(A.y) as max;
.
X = 
Y = foreach X generate $1/(long) C.count, $2-(long) C.max;
{code}

2) Writing the intermediate relation to a file can cause hotspots. We should 
push this into the distributed cache. In cases when the dist. cache is turned 
off, we can at least increase the replication factor to some large-ish number 
(10, maybe, like the jobs?)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-01 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884498#action_12884498
 ] 

Aniket Mokashi commented on PIG-1434:
-

I agree to Thejas that we should have a way for user to specify that he means 
to use C as scalar. This will avoid errors in pig code. 
Thus, we have, 
{code}
Y = foreach X generate $1/(int) [C].count, $2- [C],max, ([C].$1+2);
{code}
Do we fail if we find C to have more than one row, or do we just ignore it? 
Should we try to detect, that C has one row, in frontend?

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-01 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884501#action_12884501
 ] 

Thejas M Nair commented on PIG-1434:


bq. Do we fail if we find C to have more than one row, or do we just ignore it? 
I think we should fail in that case, rather than have a 
surprising/unpredictable behavior.  

bq. Should we try to detect, that C has one row, in frontend?
This will not be possible in all cases. Eg if input has only one row, or a 
filter is filtering out all except one row.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-01 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884502#action_12884502
 ] 

Dmitriy V. Ryaboy commented on PIG-1434:


SQL fails at runtime when executing queries that require a single row to be 
returned. So, oracle won't complain if you do this, for example:

{code}

SELECT foo.a, (SELECT c 
   FROM bar 
   WHERE foo.a = bar.a) 
from foo

{code}

unless the inner select produces more than one row. I think we should adopt the 
same approach -- assume the query is innocent until proven guilty.

-D

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-01 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884503#action_12884503
 ] 

Aniket Mokashi commented on PIG-1434:
-

bq Should we try to detect, that C has one row, in frontend?
We can try to detect the pattern that makes something as scalar (by marking B 
(group by all, limit 1) as scalar and then C as scalar etc) and fail upfront 
otherwise...

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-01 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884508#action_12884508
 ] 

Thejas M Nair commented on PIG-1434:


bq. We can try to detect the pattern that makes something as scalar (by marking 
B (group by all, limit 1) as scalar and then C as scalar etc) and fail upfront 
otherwise...
I don't think we should limit this feature to the query patterns where we can 
detect that in frontend. In some cases it will depend on the data.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-06-25 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882711#action_12882711
 ] 

Aniket Mokashi commented on PIG-1434:
-

The proposal for scalars is as follows -
{code}
A = load '1.txt' as (a1, a2);
B = group A all;
C = foreach B generate COUNT(A);
Y = foreach A generate C;
store Y into 'Ystore';
{code}
Based on the schema of C, we detect that Y means to use C as a scalar and 
internally track it as scalar. Thus, operations like C * C are also allowed. 
The limitation is that C should have long convertible value (when stored into 
the file). Also (int) C would be allowed and will succeed if the cast operation 
succeeds.

As mentioned by Daniel earlier, there are two challenges in introducing 
scalars--
1. Addition of implicit store- We cannot do it too early (parsing), as we get 
redundant (implicit) store operation for rest of the commands in the script. If 
we do it too late, merge algorithm doesn't find the store and discards the 
branch that compiles and executes the store.
To solve this, whenever we process a store plan after the parsing stage, we 
detect the existence of scalars into the plan and add required branches that 
has those scalars into the current plan. We also attach LOStores for the 
scalars and merge the required plan.
2. Tracking of implicit dependency- Existence of scalar C needs to be converted 
into a implicit ReadScalar operation, but other than this it also needs to add 
dependency on the map-reduce job that generates this scalar value. We track 
this dependency by adding LOScalar, POScalar operators that carry the reference 
to the scalar they depend upon. When we compile the map reduce plan, we replace 
POScalar with POUserFunc to load the scalar value and mark the dependency 
between two map reduce jobs.

I am attaching the patch with above mentioned changes.

Few known issues-
To track the dependencies of scalars, we need access to map of operators from 
one type of plan to other, but this map is generated by visitors. The same 
visitors are responsible for converting LOScalar -POScalar - POUserFunc. So, 
if a visitor visits LOScalar before LO associated with scalar ( C in example) 
we do not find PO associated with C. 

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-06-25 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882725#action_12882725
 ] 

Aniket Mokashi commented on PIG-1434:
-

Submitting to hudson to check for test failures

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-06-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12881784#action_12881784
 ] 

Daniel Dai commented on PIG-1434:
-

We decide to change some implementation to solve the following problem:
1. To decide when to add store. Currently, we parse statement by statement, 
until we saw a store, we merge that branch into the integrated logical plan. If 
we add store too late, the merge algorithm cannot see the store and discard 
this branch. If we add store too early (during the parsing of Y, in the 
example), then later we do not store/dump Y, we get a redundant store for C
2. Implicit dependency between C - Y. C will create a side file and Y will use 
it. However, this is not the normal data flow and should not be represented as 
a connection in logical plan

Now we are exploring the following implementation:
1. Add LOScalar, POScalar to represent scalar expression
2. When parsing Y, we put LOScalar as a placeholder in the ForEach inner plan
3. When parsing store (Y), we know we need to merge the store branch. In the 
mean time, we check the branch (Y) if it contains a scalar, if so, find what 
the scalar refers to (C), add a store to that branch, and merge that branch to 
the integrated logical plan
4. Add a map reduce layer optimizer ScalarOptimizer. It check for map-reduce 
job contains POScalar, and map-reduce job POScalar contains the operator 
POScalar refers to, create a dependency between these two map-reduce jobs. 
ScalarOptimizer should run before MultiQueryOptimizer

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.