[jira] Updated: (PIG-824) SQL interface for Pig

2009-08-14 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-824:
--

Attachment: SQL_IN_PIG.html
PIG-824.1.patch
PIG-824.binfiles.tar.gz

PIG-824.binfiles.tar.gz - contains libs that it depends on
PIG-824.1.patch - patch
SQL_IN_PIG.html - (brief) document

JFlex.jar has not been included because it is covered by the GPL. It will have to be 
downloaded to the lib dir to build with the patch. In the future, Ivy will be set up 
to download it.

> SQL interface for Pig
> -
>
> Key: PIG-824
> URL: https://issues.apache.org/jira/browse/PIG-824
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
> Attachments: PIG-824.1.patch, PIG-824.binfiles.tar.gz, SQL_IN_PIG.html
>
>
> In the last 18 months PigLatin has gained significant popularity within the 
> open source community. Many users like its data flow model, its rich type 
> system and its ability to work with any data available on HDFS or outside. We 
> have also heard from many users that having Pig speak SQL would bring many 
> more users. Having a single system that exports multiple interfaces is a big 
> advantage as it guarantees consistent semantics, custom code reuse, and 
> reduces the amount of maintenance. This is especially relevant for projects 
> that use both interfaces for different parts of the system. 
> For instance, in a 
> data warehousing system, you would have an ETL component that brings data into 
> the warehouse and a component that analyzes the data and produces reports. 
> PigLatin is uniquely suited for ETL processing while SQL might be a better 
> fit for report generation.
> To start, it would make sense to implement a subset of the SQL92 standard and to 
> be as standard compliant as possible. This would include all the 
> standard constructs: select, from, where, group-by + having, order by, limit, 
> join (inner + outer). Several extensions such as support for Pig's UDFs and 
> possibly streaming, multiquery and support for Pig's complex types would be 
> helpful.
> This work is dependent on metadata support outlined in 
> https://issues.apache.org/jira/browse/PIG-823

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-925) Fix join in local mode

2009-08-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-925:
--

Assignee: Daniel Dai

> Fix join in local mode
> --
>
> Key: PIG-925
> URL: https://issues.apache.org/jira/browse/PIG-925
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
>
> Join is broken after the LOJoin patch (Optimizer_Phase5.patch of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]). Even the simplest 
> join script does not work in local mode:
> eg:
> a = load '1.txt';
> b = load '2.txt';
> c = join a by $0, b by $0;
> dump c;
> Caused by: java.lang.NullPointerException
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>     at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
>     at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
>     at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
>     at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)




[jira] Created: (PIG-925) Fix join in local mode

2009-08-14 Thread Daniel Dai (JIRA)
Fix join in local mode
--

 Key: PIG-925
 URL: https://issues.apache.org/jira/browse/PIG-925
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
 Fix For: 0.4.0


Join is broken after the LOJoin patch (Optimizer_Phase5.patch of 
[PIG-697|https://issues.apache.org/jira/browse/PIG-697]). Even the simplest join 
script does not work in local mode:

eg:
a = load '1.txt';
b = load '2.txt';
c = join a by $0, b by $0;
dump c;

Caused by: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
    at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
    at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
    at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)




[jira] Commented: (PIG-922) Logical optimizer: push up project

2009-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743498#action_12743498
 ] 

Hadoop QA commented on PIG-922:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416618/PIG-922-p1_1.patch
  against trunk revision 804310.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/165/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/165/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/165/console

This message is automatically generated.

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch
>
>
> This is a continuation of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: push up project, i.e., prune columns as 
> early as possible.




Hudson build is back to normal: Pig-Patch-minerva.apache.org #165

2009-08-14 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/165/




[jira] Commented: (PIG-914) Change the PIG hbase interface to use bytes along with strings

2009-08-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743493#action_12743493
 ] 

Daniel Dai commented on PIG-914:


Hi, Alex,
Are you able to assign the issue to yourself through Jira? The same goes for 
PIG-915 and PIG-916.

> Change the PIG hbase interface to use bytes along with strings
> --
>
> Key: PIG-914
> URL: https://issues.apache.org/jira/browse/PIG-914
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alex Newman
>Priority: Minor
>
> Currently start rows, table names, and column names are all strings. Since HBase 
> supports bytes, we might want to change the Pig interface to support bytes 
> along with strings.




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-08-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Status: Patch Available  (was: Open)

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch
>
>
> This is a continuation of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: push up project, i.e., prune columns as 
> early as possible.




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-08-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Attachment: PIG-922-p1_1.patch

Address comments by Hudson.

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch
>
>
> This is a continuation of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: push up project, i.e., prune columns as 
> early as possible.




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-08-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Status: Open  (was: Patch Available)

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch
>
>
> This is a continuation of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: push up project, i.e., prune columns as 
> early as possible.




[jira] Commented: (PIG-892) Make COUNT and AVG deal with nulls accordingly with SQL standar

2009-08-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743467#action_12743467
 ] 

Daniel Dai commented on PIG-892:


+1

> Make COUNT and AVG deal with nulls accordingly with SQL standar
> ---
>
> Key: PIG-892
> URL: https://issues.apache.org/jira/browse/PIG-892
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Fix For: 0.4.0
>
> Attachments: PIG-892.patch, PIG-892_v2.patch, PIG-892_v3.patch
>
>
> Both COUNT and AVG need to ignore nulls. Also add COUNT_STAR to match 
> COUNT(*) in SQL.
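
For readers unfamiliar with the SQL behaviour being requested, here is a minimal sketch in Python (not Pig code). The function names are illustrative only and `None` stands in for a null; the empty-input result of `avg` follows SQL, where AVG over no non-null values is NULL.

```python
# Illustrative only: plain Python functions, not Pig's builtin implementations.

def count(values):
    """COUNT(col): count only the non-null values."""
    return sum(1 for v in values if v is not None)


def count_star(values):
    """COUNT(*): count every row, nulls included."""
    return len(values)


def avg(values):
    """AVG(col): average over non-null values; None if there are none."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return sum(non_null) / len(non_null)


column = [10, None, 20, None]
print(count(column), count_star(column), avg(column))
```

With nulls ignored, the average is taken over two values rather than four, which is the difference this issue asks for.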




[jira] Commented: (PIG-922) Logical optimizer: push up project

2009-08-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743456#action_12743456
 ] 

Hadoop QA commented on PIG-922:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416587/PIG-922-p1_0.patch
  against trunk revision 804310.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

-1 release audit.  The applied patch generated 164 release audit warnings 
(more than the trunk's current 163 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/164/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/164/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/164/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/164/console

This message is automatically generated.

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-922-p1_0.patch
>
>
> This is a continuation of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: push up project, i.e., prune columns as 
> early as possible.




Build failed in Hudson: Pig-Patch-minerva.apache.org #164

2009-08-14 Thread Apache Hudson Server
See 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/164/changes

Changes:

[pradeepkth] PIG-845: PERFORMANCE: Merge Join (ashutoshc via pradeepkth) - 
deleting renamed file - MRStreamHandler.java

[pradeepkth] PIG-845: PERFORMANCE: Merge Join (ashutoshc via pradeepkth)

[daijy] PIG-913: Error in Pig script when grouping on chararray column

--
[...truncated 111633 lines...]
 [exec] [junit] 
 [exec] [junit] 09/08/14 21:35:32 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/08/14 21:35:32 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908142134_0002/job.jar. 
blk_-6402472781047644060_1012
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block 
blk_-6402472781047644060_1012 src: /127.0.0.1:44681 dest: /127.0.0.1:40670
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block 
blk_-6402472781047644060_1012 src: /127.0.0.1:39714 dest: /127.0.0.1:53345
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block 
blk_-6402472781047644060_1012 src: /127.0.0.1:41010 dest: /127.0.0.1:41033
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block 
blk_-6402472781047644060_1012 of size 1497453 from /127.0.0.1
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: PacketResponder 0 
for block blk_-6402472781047644060_1012 terminating
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block 
blk_-6402472781047644060_1012 of size 1497453 from /127.0.0.1
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:41033 is added to 
blk_-6402472781047644060_1012 size 1497453
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: PacketResponder 1 
for block blk_-6402472781047644060_1012 terminating
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:53345 is added to 
blk_-6402472781047644060_1012 size 1497453
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block 
blk_-6402472781047644060_1012 of size 1497453 from /127.0.0.1
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40670 is added to 
blk_-6402472781047644060_1012 size 1497453
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: PacketResponder 2 
for block blk_-6402472781047644060_1012 terminating
 [exec] [junit] 09/08/14 21:35:33 INFO fs.FSNamesystem: Increasing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200908142134_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/08/14 21:35:33 INFO fs.FSNamesystem: Reducing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200908142134_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908142134_0002/job.split. 
blk_5455499385688750307_1013
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block 
blk_5455499385688750307_1013 src: /127.0.0.1:39716 dest: /127.0.0.1:53345
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block 
blk_5455499385688750307_1013 src: /127.0.0.1:44685 dest: /127.0.0.1:40670
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Receiving block 
blk_5455499385688750307_1013 src: /127.0.0.1:39881 dest: /127.0.0.1:59910
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block 
blk_5455499385688750307_1013 of size 1837 from /127.0.0.1
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: PacketResponder 0 
for block blk_5455499385688750307_1013 terminating
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:59910 is added to 
blk_5455499385688750307_1013 size 1837
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block 
blk_5455499385688750307_1013 of size 1837 from /127.0.0.1
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40670 is added to 
blk_5455499385688750307_1013 size 1837
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: PacketResponder 1 
for block blk_5455499385688750307_1013 terminating
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.DataNode: Received block 
blk_5455499385688750307_1013 of size 1837 from /127.0.0.1
 [exec] [junit] 09/08/14 21:35:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:53345 is added to 
blk_54554993856887503

Food for thought on Pig design

2009-08-14 Thread Alan Gates

http://dreamsongs.com/WIB.html mainly section 2.1 on Worse is Better

I stumbled across this article today and found the section on Worse is  
Better very interesting, especially since he is directly comparing the  
design philosophies of C vs Lisp.  The article is almost 20 years old,  
so you may have seen it before.


Alan.


[jira] Updated: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-14 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-924:
--

Attachment: pig_924.patch

The attached patch includes dynamic shims that could be used with Pig if it 
didn't bundle its hadoop classes.
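
To illustrate the general idea of a dynamic shim (this is not the code in the patch), here is a rough Python sketch. All class and function names are hypothetical, and it assumes the version string starts with a major.minor prefix such as "0.18", "0.19" or "0.20".

```python
# Hypothetical sketch of runtime shim selection; not taken from pig_924.patch.

class Shim18:
    def describe(self):
        return "using Hadoop 0.18 calls"


class Shim19:
    def describe(self):
        return "using Hadoop 0.19 calls"


class Shim20:
    def describe(self):
        return "using Hadoop 0.20 calls"


# One shim class per supported Hadoop line, keyed by major.minor.
SHIMS = {"0.18": Shim18, "0.19": Shim19, "0.20": Shim20}


def load_shim(hadoop_version):
    """Pick the shim matching the Hadoop version found at runtime."""
    major_minor = ".".join(hadoop_version.split(".")[:2])
    if major_minor not in SHIMS:
        raise ValueError("unsupported Hadoop version: " + hadoop_version)
    return SHIMS[major_minor]()


print(load_shim("0.19.1").describe())
```

The selection has to happen at runtime against whatever Hadoop is on the classpath, which is why bundling Hadoop classes into pig.jar gets in the way.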

> Make Pig work with multiple versions of Hadoop
> --
>
> Key: PIG-924
> URL: https://issues.apache.org/jira/browse/PIG-924
> Project: Pig
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
> Attachments: pig_924.patch
>
>
> The current Pig build scripts package hadoop and other dependencies into the 
> pig.jar file.
> This means that if users upgrade Hadoop, they also need to upgrade Pig.
> Pig has relatively few dependencies on Hadoop interfaces that changed between 
> 18, 19, and 20.  It is possible to write a dynamic shim that allows Pig to 
> use the correct calls for any of the above versions of Hadoop. Unfortunately, 
> the build process prevents us from doing this at runtime, and 
> forces an unnecessary Pig rebuild even if dynamic shims are created.




[jira] Created: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-14 Thread Dmitriy V. Ryaboy (JIRA)
Make Pig work with multiple versions of Hadoop
--

 Key: PIG-924
 URL: https://issues.apache.org/jira/browse/PIG-924
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy


The current Pig build scripts package hadoop and other dependencies into the 
pig.jar file.
This means that if users upgrade Hadoop, they also need to upgrade Pig.

Pig has relatively few dependencies on Hadoop interfaces that changed between 
18, 19, and 20.  It is possible to write a dynamic shim that allows Pig to use 
the correct calls for any of the above versions of Hadoop. Unfortunately, the 
build process prevents us from doing this at runtime, and 
forces an unnecessary Pig rebuild even if dynamic shims are created.






[jira] Updated: (PIG-923) Allow setting logfile location in pig.properties

2009-08-14 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-923:
--

Attachment: pig_923.patch

One-line change to allow Main.java to default to the value specified in 
pig.logfile.
-l still overrides.
Not specifying pig.logfile in pig.properties results in the same behavior as 
before.

No unit tests; checked manually.
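
The precedence described above can be sketched as follows. This is a Python illustration, not the Main.java change itself; `resolve_logfile` and its parameters are hypothetical names, and the properties are modelled as a plain dict.

```python
# Illustrative only: models the precedence, not Pig's actual Main.java logic.

def resolve_logfile(cli_logfile, properties):
    """Return the log file location to use.

    cli_logfile: value passed with -l, or None if the flag was not given.
    properties:  dict standing in for the loaded pig.properties.
    Returns None when neither is set, meaning "behave as before".
    """
    if cli_logfile is not None:
        return cli_logfile  # -l still overrides
    return properties.get("pig.logfile")


print(resolve_logfile(None, {"pig.logfile": "/var/log/pig.log"}))
print(resolve_logfile("/tmp/run.log", {"pig.logfile": "/var/log/pig.log"}))
print(resolve_logfile(None, {}))
```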

> Allow setting logfile location in pig.properties
> 
>
> Key: PIG-923
> URL: https://issues.apache.org/jira/browse/PIG-923
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Dmitriy V. Ryaboy
> Fix For: 0.4.0
>
> Attachments: pig_923.patch
>
>
> Local log file location can be specified through the -l flag, but it cannot 
> be set in pig.properties.
> This JIRA proposes a change to Main.java that allows it to read the 
> "pig.logfile" property from the configuration.




[jira] Created: (PIG-923) Allow setting logfile location in pig.properties

2009-08-14 Thread Dmitriy V. Ryaboy (JIRA)
Allow setting logfile location in pig.properties


 Key: PIG-923
 URL: https://issues.apache.org/jira/browse/PIG-923
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Dmitriy V. Ryaboy
 Fix For: 0.4.0


Local log file location can be specified through the -l flag, but it cannot be 
set in pig.properties.

This JIRA proposes a change to Main.java that allows it to read the 
"pig.logfile" property from the configuration.




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-08-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Status: Patch Available  (was: Open)

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-922-p1_0.patch
>
>
> This is a continuation of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: push up project, i.e., prune columns as 
> early as possible.




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-08-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Attachment: PIG-922-p1_0.patch

Attach patch for phase 1.

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-922-p1_0.patch
>
>
> This is a continuation of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: push up project, i.e., prune columns as 
> early as possible.




[jira] Commented: (PIG-922) Logical optimizer: push up project

2009-08-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743348#action_12743348
 ] 

Daniel Dai commented on PIG-922:


Design for the push up projection rule:

Presumptions:
* Pruning columns in the loader saves time on record parsing

    a = load 'a' as (n1:chararray, n2:chararray, n3:chararray);
    b = foreach a generate n1, n2;
 => a = load 'a' as (n1:chararray, n2:chararray);

We do not need to parse n3 in our loader.

* Pruning columns across a map-reduce boundary (between map-reduce jobs or 
inter map-reduce jobs) saves bandwidth

    a = load 'a' as (n1:chararray, n2:chararray, n3:chararray);
    b = group a by n1;
    c = order b by n2;
    d = foreach c generate n2, n3;

 => a = load 'a' as (n1:chararray, n2:chararray, n3:chararray);
    b = group a by n1;
    b1 = foreach b generate n2, n3;
    c = order b1 by n2;
    d = foreach c generate n2, n3;

* Pruning columns within a map-reduce boundary does not seem to be helpful

    store a into 'a';
    b = filter a by n1 == '1';
    c = foreach b generate n2;
    dump c;

 => store a into 'a';
    a1 = foreach a generate n1, n2;
    b = filter a1 by n1 == '1';
    c = foreach b generate n2;
    dump c;

In this case, an extra foreach step is processed, but we gain no benefit.

Algorithm description:
1. Divide all logical operators into two categories: those that create a 
map-reduce boundary and those that do not.

    boundary = true: LOCoGroup, LOCross, LOJoin, LODistinct, LOSort
    boundary = false: LOFilter, LOForEach, LODefine, LOLoad, 
    LOStore, LOSplit, LOSplitOutput, LOStream, LOUnion
    LOJoin can be a boundary or not, depending on the type of join.

2. We collect required fields from the bottom; a reverse dependency 
order walker is required to do this.

3. We do not actually start from the leaf. We start from the last 
LOForEach. Only LOForEach prunes columns. If there is no LOForEach in 
the script, then we cannot prune anything.

4. From a required output, we need an algorithm to figure out the required 
input.

    <= require $0, $2, $3
    b = foreach a generate $0, $2+$3;
    <= require $0, $1

5. From the bottom LOForEach, we collect required fields all the way 
up. If we move over a boundary operator, we save the position because it is 
possible to put a projection there.

    ..
     => projection here
    x = CoGroup .
    ..
     => projection here
    y = order ..

Put the projection right before the boundary to make sure less data crosses 
the boundary.

However, we do not make this decision or do the actual pruning at this point; 
the actual pruning is done top down.

6. While traversing up, if we see an operator with more than one 
input, we trace required fields in all directions. We rely on the 
output schema of this operator to figure out which required fields 
belong to which input. If we see an operator with more than one 
output, we collect required fields until all outputs have been traced.

7. If we see LOStream or LOStore, we stop.

8. If we see LOLoad, we stop and set the required fields in LOLoad.

9. From LOLoad, we do a top-down traversal to decide whether we need to 
put a projection, and if yes, insert a ForEach.

10. We only add a projection if it is necessary. It is only necessary when 
the boundary operator requires fewer fields than the operator before it 
outputs.

    Filter .. (output fields: n1, n2, n3)
    <= we can prune n3 here
    x = CoGroup  (required fields: n1, n2)

11. It is possible that we create a foreach which could be combined into the 
previous foreach; however, we do not handle this in the PushUpProject rule.

    ForEach..
      <= we will add a ForEach anyway here
    x = CoGroup .

12. Every time we insert a LOForEach, we need to adjust the projection 
map all the way down.

13. To fit PushUpProject into the current optimizer framework, we hook 
the check rule to LOForEach. Every time, we start from a LOForEach and we 
never push up over another LOForEach. So we stop at LOForEach and save 
the required fields up to this point.
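
Steps 1, 4 and 10 above can be sketched roughly as follows. This is a Python illustration, not the patch: the function names and the representation of a foreach (a list of input-column sets, one per generated column) are assumptions made for the example. Following the Filter/CoGroup example in step 10, a projection is inserted only when the boundary operator needs fewer fields than its predecessor outputs.

```python
# Hypothetical sketch; names and data structures are not Pig's API.

# Step 1: operators treated as creating a map-reduce boundary.
# LOJoin is listed here although it depends on the type of join.
BOUNDARY_OPS = {"LOCoGroup", "LOCross", "LOJoin", "LODistinct", "LOSort"}


def required_inputs(generate_exprs, required_outputs):
    """Step 4: map required output columns of a foreach to input columns.

    generate_exprs[i] is the set of input column indexes read by output
    column i, so `generate $0, $2+$3` is written as [{0}, {2, 3}].
    """
    needed = set()
    for out_col in required_outputs:
        needed |= generate_exprs[out_col]
    return sorted(needed)


def needs_projection(op_name, required_fields, upstream_output_fields):
    """Step 10: project before a boundary operator only if it drops columns."""
    if op_name not in BOUNDARY_OPS:
        return False
    return len(set(required_fields)) < len(set(upstream_output_fields))


# b = foreach a generate $0, $2+$3;  with outputs $0 and $1 required
print(required_inputs([{0}, {2, 3}], [0, 1]))
# Filter outputs n1, n2, n3; CoGroup requires n1, n2
print(needs_projection("LOCoGroup", ["n1", "n2"], ["n1", "n2", "n3"]))
```

The first call reproduces the step 4 example (inputs $0, $2, $3 are required); the second reproduces the step 10 example, where n3 can be pruned before the CoGroup.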

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
>
> This is a continuation of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: push up project, i.e., prune columns as 
> early as possible.


[jira] Commented: (PIG-922) Logical optimizer: push up project

2009-08-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743347#action_12743347
 ] 

Daniel Dai commented on PIG-922:


There will be three patches for this issue:
phase 1: Infrastructure to find relevant input columns from output column
phase 2: Infrastructure to prune column for each relational operator
phase 3: push up project optimization rule

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743336#action_12743336
 ] 

Hudson commented on PIG-913:


Integrated in Pig-trunk #522 (See 
[http://hudson.zones.apache.org/hudson/job/Pig-trunk/522/])
: Error in Pig script when grouping on chararray column


> Error in Pig script when grouping on chararray column
> -
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913-2.patch, PIG-913.patch
>
>
> I have a very simple script which fails at parse time due to the schema I 
> specified in the loader.
> {code}
> data = LOAD '/user/viraj/studenttab10k' AS (s:chararray);
> dataSmall = limit data 100;
> bb = GROUP dataSmall by $0;
> dump bb;
> {code}
> =
> 2009-08-06 18:47:56,297 [main] INFO  org.apache.pig.Main - Logging error 
> messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 09/08/06 18:47:56 INFO pig.Main: Logging error messages to: 
> /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 2009-08-06 18:47:56,459 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop 
> file system at: hdfs://localhost:9000
> 2009-08-06 18:47:56,694 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to 
> map-reduce job tracker at: localhost:9001
> 2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1002: Unable to store alias bb
> 09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb
> Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> =
> =
> Pig Stack Trace
> ---
> ERROR 1002: Unable to store alias bb
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias bb
> at org.apache.pig.PigServer.openIterator(PigServer.java:481)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: 
> Unable to store alias bb
> at org.apache.pig.PigServer.store(PigServer.java:536)
> at org.apache.pig.PigServer.openIterator(PigServer.java:464)
> ... 6 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64)
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335)
> at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:46)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:67)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:187)
> at org.apache.pig.PigServer.compileLp(PigServer.java:854)
> at org.apache.pig.PigServer.compileLp(PigServer.java:791)
> at org.apache.pig.PigServer.store(PigServer.java:509)
> ... 7 more
> =




[jira] Updated: (PIG-845) PERFORMANCE: Merge Join

2009-08-14 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-845:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks Ashutosh for this significant contribution!

> PERFORMANCE: Merge Join
> ---
>
> Key: PIG-845
> URL: https://issues.apache.org/jira/browse/PIG-845
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Ashutosh Chauhan
> Attachments: merge-join.patch
>
>
> This join would work if the data for both tables is sorted on the join key.
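
For readers unfamiliar with the technique, a merge join over two inputs that 
are already sorted on the join key can be sketched as below. This is a 
generic in-memory illustration, not the implementation in the committed 
patch; merge_join and its key parameter are hypothetical names, and rows are 
plain tuples whose first element is the join key.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists of tuples that are both sorted on `key`."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # left key too small, advance left
        elif lk > rk:
            j += 1          # right key too small, advance right
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append(left[i] + r)
                i += 1
            j = j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
for row in merge_join(left, right):
    print(row)
```

Because both inputs are consumed in key order, each side is scanned once, 
which is why the join depends on both inputs being sorted on the join key.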




[jira] Updated: (PIG-845) PERFORMANCE: Merge Join

2009-08-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-845:
-

Attachment: (was: merge-join.patch)




[jira] Updated: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-913:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed.




[jira] Assigned: (PIG-922) Logical optimizer: push up project

2009-08-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-922:
--

Assignee: Daniel Dai




[jira] Created: (PIG-922) Logical optimizer: push up project

2009-08-14 Thread Daniel Dai (JIRA)
Logical optimizer: push up project
--

 Key: PIG-922
 URL: https://issues.apache.org/jira/browse/PIG-922
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
 Fix For: 0.4.0


This is a continuation of 
[PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add another 
rule to the logical optimizer: Push up project, i.e., prune columns as early as 
possible.




[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743264#action_12743264
 ] 

Daniel Dai commented on PIG-913:


This release audit warning is caused by a new golden file. We cannot add 
release audit notes to golden files.
