[jira] Commented: (PIG-1233) NullPointerException in AVG

2010-02-17 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835136#action_12835136
 ] 

Ankur commented on PIG-1233:


In the current code path we cannot have a situation where intermediateCount is 
NOT null but intermediateSum is null, so just checking the former is sufficient.
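The guard being discussed can be sketched as follows. This is a self-contained illustration, not Pig's actual org.apache.pig.builtin.AVG source; only the field names mirror those mentioned in the issue:

```java
// Minimal sketch of the null guard discussed above. It shows why unboxing a
// null Long throws, and why checking intermediateCount alone suffices when
// intermediateSum can never be null while intermediateCount is non-null.
public class AvgAccumulatorSketch {
    private Long intermediateCount;   // null until accumulate() is called
    private Double intermediateSum;   // null whenever intermediateCount is null

    public void accumulate(double value) {
        intermediateCount = (intermediateCount == null) ? 1L : intermediateCount + 1;
        intermediateSum = (intermediateSum == null) ? value : intermediateSum + value;
    }

    public Double getValue() {
        // Unboxing intermediateCount directly (e.g. "intermediateCount > 0")
        // would throw NullPointerException if accumulate() was never called.
        if (intermediateCount == null || intermediateCount == 0L) {
            return null;
        }
        return intermediateSum / intermediateCount;
    }
}
```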

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.7.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method getValue() in AVG throws a NullPointerException when 
> accumulate() is not called, leaving the variable 'intermediateCount' 
> initialized to null. Java then throws when it tries to 'unbox' the value 
> for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-trunk #685

2010-02-17 Thread Apache Hudson Server
See 

Changes:

[olga] PIG-1226: support for additional jar files (thejas via olgan)

[rding] PIG-1194: ERROR 2055: Received Error while processing the map plan

--
[...truncated 240684 lines...]
[junit] 10/02/18 02:25:04 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=setPermission   
src=/tmp/hadoop-hudson/mapred/system/job_20100218022432445_0002/job.xml 
dst=nullperm=hudson:supergroup:rw-r--r--
[junit] 10/02/18 02:25:04 INFO datanode.DataNode: Deleting block 
blk_-8717085773027673757_1005 file 
build/test/data/dfs/data/data4/current/blk_-8717085773027673757
[junit] 10/02/18 02:25:04 INFO datanode.DataNode: Deleting block 
blk_-4113414690970244428_1006 file 
build/test/data/dfs/data/data3/current/blk_-4113414690970244428
[junit] 10/02/18 02:25:04 INFO hdfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_20100218022432445_0002/job.xml. 
blk_68013359739296494_1015
[junit] 10/02/18 02:25:04 INFO datanode.DataNode: Receiving block 
blk_68013359739296494_1015 src: /127.0.0.1:43539 dest: /127.0.0.1:55726
[junit] 10/02/18 02:25:04 INFO datanode.DataNode: Receiving block 
blk_68013359739296494_1015 src: /127.0.0.1:50495 dest: /127.0.0.1:56095
[junit] 10/02/18 02:25:04 INFO datanode.DataNode: Receiving block 
blk_68013359739296494_1015 src: /127.0.0.1:52105 dest: /127.0.0.1:40309
[junit] 10/02/18 02:25:04 INFO DataNode.clienttrace: src: /127.0.0.1:52105, 
dest: /127.0.0.1:40309, bytes: 48178, op: HDFS_WRITE, cliID: 
DFSClient_-1684024129, srvID: DS-1948219082-127.0.1.1-40309-1266459871415, 
blockid: blk_68013359739296494_1015
[junit] 10/02/18 02:25:04 INFO datanode.DataNode: PacketResponder 0 for 
block blk_68013359739296494_1015 terminating
[junit] 10/02/18 02:25:04 INFO hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40309 is added to 
blk_68013359739296494_1015 size 48178
[junit] 10/02/18 02:25:04 INFO DataNode.clienttrace: src: /127.0.0.1:50495, 
dest: /127.0.0.1:56095, bytes: 48178, op: HDFS_WRITE, cliID: 
DFSClient_-1684024129, srvID: DS-289342816-127.0.1.1-56095-1266459870897, 
blockid: blk_68013359739296494_1015
[junit] 10/02/18 02:25:04 INFO hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:56095 is added to 
blk_68013359739296494_1015 size 48178
[junit] 10/02/18 02:25:04 INFO datanode.DataNode: PacketResponder 1 for 
block blk_68013359739296494_1015 terminating
[junit] 10/02/18 02:25:04 INFO DataNode.clienttrace: src: /127.0.0.1:43539, 
dest: /127.0.0.1:55726, bytes: 48178, op: HDFS_WRITE, cliID: 
DFSClient_-1684024129, srvID: DS-1346704804-127.0.1.1-55726-1266459871874, 
blockid: blk_68013359739296494_1015
[junit] 10/02/18 02:25:04 INFO hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:55726 is added to 
blk_68013359739296494_1015 size 48178
[junit] 10/02/18 02:25:04 INFO datanode.DataNode: PacketResponder 2 for 
block blk_68013359739296494_1015 terminating
[junit] 10/02/18 02:25:04 INFO hdfs.StateChange: DIR* 
NameSystem.completeFile: file 
/tmp/hadoop-hudson/mapred/system/job_20100218022432445_0002/job.xml is closed 
by DFSClient_-1684024129
[junit] 10/02/18 02:25:04 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=open
src=/tmp/hadoop-hudson/mapred/system/job_20100218022432445_0002/job.xml 
dst=nullperm=null
[junit] 10/02/18 02:25:04 INFO DataNode.clienttrace: src: /127.0.0.1:56095, 
dest: /127.0.0.1:50497, bytes: 48558, op: HDFS_READ, cliID: 
DFSClient_-1684024129, srvID: DS-289342816-127.0.1.1-56095-1266459870897, 
blockid: blk_68013359739296494_1015
[junit] 10/02/18 02:25:04 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=open
src=/tmp/hadoop-hudson/mapred/system/job_20100218022432445_0002/job.jar 
dst=nullperm=null
[junit] 10/02/18 02:25:04 INFO DataNode.clienttrace: src: /127.0.0.1:56095, 
dest: /127.0.0.1:50498, bytes: 2892707, op: HDFS_READ, cliID: 
DFSClient_-1684024129, srvID: DS-289342816-127.0.1.1-56095-1266459870897, 
blockid: blk_-7664397758800478618_1013
[junit] 10/02/18 02:25:04 INFO mapred.JobTracker: Initializing 
job_20100218022432445_0002
[junit] 10/02/18 02:25:04 INFO mapred.JobInProgress: Initializing 
job_20100218022432445_0002
[junit] 10/02/18 02:25:04 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=create  
src=/tmp/temp-1832912908/tmp-1615126932/_logs/history/localhost_1266459872468_job_20100218022432445_0002_hudson_Job2769198517474778469.jar
  dst=nullperm=hudson:supergroup:rw-r--r--
[junit] 10/02/18 02:25:04 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=create  
src=/tmp/temp-1832912908/tmp-1615126932/_logs/history/localhost_1266459872468_job_20100218022432445_0002_

[jira] Commented: (PIG-1238) Dump does not respect the schema

2010-02-17 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835086#action_12835086
 ] 

Daniel Dai commented on PIG-1238:
-

Doing an explain, the last limit job is:

MapReduce node 1-99
Map Plan
Local Rearrange[tuple]{double}(false) - 1-103
|   |
|   Project[double][1] - 1-102
|
|---Limit - 1-101
|

|---Load(file:/tmp/temp-513510662/tmp1311900615:org.apache.pig.builtin.BinStorage) - 1-100
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-109
|
|---Limit - 1-108
|
|---New For Each(true)[bag] - 1-107
|   |
|   Project[tuple][1] - 1-106
|
|---Package[tuple]{double} - 1-105
Global sort: false

The project in the map plan is wrong.

> Dump does not respect the schema
> 
>
> Key: PIG-1238
> URL: https://issues.apache.org/jira/browse/PIG-1238
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>
> For complex data types and certain sequences of operations, dump produces 
> results with a non-existent field in the relation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1206) [zebra] throws an exception if a descending "order by" by pig tries to create such a table

2010-02-17 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835051#action_12835051
 ] 

Yan Zhou commented on PIG-1206:
---

Patch committed to the load-store-redesign branch too.

> [zebra] throws an exception if a descending "order by" by pig tries to 
> create such a table
> -
>
> Key: PIG-1206
> URL: https://issues.apache.org/jira/browse/PIG-1206
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.6.0
>
> Attachments: PIG-1206.patch
>
>
> As Zebra does not support descending sorted tables, zebra will throw an 
> exception at the backend when the TFile sortness check fails. It has been 
> determined that a desirable behavior is to store the data as unsorted after 
> logging a warning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement

2010-02-17 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1169:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

patch committed.

> Top-N queries produce incorrect results when a store statement is added 
> between order by and limit statement
> 
>
> Key: PIG-1169
> URL: https://issues.apache.org/jira/browse/PIG-1169
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.7.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: PIG-1169.patch
>
>
> ??We tried to get top N results after a groupby and sort, and got different 
> results with or without storing the full sorted results. Here is a skeleton 
> of our pig script.??
> {code}
> raw_data = Load '' AS (f1, f2, ..., fn);
> grouped = group raw_data by (f1, f2);
> data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
> ordered = order data by value DESC parallel 10;
> topn = limit ordered 10;
> store ordered into 'outputdir/full';
> store topn into 'outputdir/topn';
> {code}
> ??With the statement 'store ordered ...', top N results are incorrect, but 
> without the statement, results are correct. Has anyone seen this before? I 
> know a similar bug has been fixed in the multi-query release. We are on Pig 
> 0.4 and Hadoop 0.20.1.??

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement

2010-02-17 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834966#action_12834966
 ] 

Daniel Dai commented on PIG-1169:
-

+1

> Top-N queries produce incorrect results when a store statement is added 
> between order by and limit statement
> 
>
> Key: PIG-1169
> URL: https://issues.apache.org/jira/browse/PIG-1169
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.7.0
>Reporter: Richard Ding
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: PIG-1169.patch
>
>
> ??We tried to get top N results after a groupby and sort, and got different 
> results with or without storing the full sorted results. Here is a skeleton 
> of our pig script.??
> {code}
> raw_data = Load '' AS (f1, f2, ..., fn);
> grouped = group raw_data by (f1, f2);
> data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
> ordered = order data by value DESC parallel 10;
> topn = limit ordered 10;
> store ordered into 'outputdir/full';
> store topn into 'outputdir/topn';
> {code}
> ??With the statement 'store ordered ...', top N results are incorrect, but 
> without the statement, results are correct. Has anyone seen this before? I 
> know a similar bug has been fixed in the multi-query release. We are on Pig 
> 0.4 and Hadoop 0.20.1.??

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1218) Use distributed cache to store samples

2010-02-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834960#action_12834960
 ] 

Ashutosh Chauhan commented on PIG-1218:
---

On trunk with the patch applied, in POFRJoin#setUpHashMap() we have:
{code}
POLoad ld = new POLoad(new OperatorKey("Repl File Loader", 1L),
replFile, false);
{code}
should it be?
{code}
 POLoad ld = new POLoad(new OperatorKey("Repl File Loader", 
NodeIdGenerator.getGenerator().getNextNodeId("Repl File Loader")),
replFile, false);
{code}

Also, the following can be moved out of the for loop to avoid multiple connect() calls on pc.
{code}
 PigContext pc = new PigContext(ExecType.MAPREDUCE, props);  
pc.connect();
{code}

In jobControlCompiler#setupDistributedCacheForFRJoin()
{code}
new FRJoinDistributedCacheVisitor(mro.reducePlan, pigContext, conf)
.visit();
{code}
Do we need this? Isn't FR join a map-side join? So if POFRJoin ends up in 
mro.reducePlan, that's a bug, no?
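The concern about the hard-coded 1L can be illustrated with a standalone per-scope counter. This is an analogue written for illustration, not Pig's actual NodeIdGenerator; reusing a fixed id would let two operators collide on the same OperatorKey:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Standalone analogue of a per-scope id generator. Each call within the same
// scope hands out a fresh id, unlike the hard-coded 1L in the original snippet,
// which would give every POLoad created that way an identical key.
public class ScopedIdGenerator {
    private final Map<String, AtomicLong> counters = new HashMap<>();

    public long getNextNodeId(String scope) {
        return counters.computeIfAbsent(scope, s -> new AtomicLong()).getAndIncrement();
    }
}
```

Two calls with the same scope yield distinct ids, so keys built from them cannot collide.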


> Use distributed cache to store samples
> --
>
> Key: PIG-1218
> URL: https://issues.apache.org/jira/browse/PIG-1218
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: PIG-1218.patch, PIG-1218_2.patch
>
>
> Currently, in the case of skew join and order by, we use a sample that is just 
> written to the DFS (not the distributed cache) and, as a result, it gets opened 
> and copied around more than necessary. This impacts query performance and also 
> places unnecessary load on the name node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1218) Use distributed cache to store samples

2010-02-17 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834957#action_12834957
 ] 

Pradeep Kamath commented on PIG-1218:
-

+1 Patch mostly looks good - couple of comments:
 * In a couple of places, instead of using Configuration and JobConf based on 
PigMapReduce.sJobConf, you should create a new Configuration(false) and new 
JobConf(false) so we create fresh data structures without any properties coming 
from the MapReduce-based data structures.
 * Since partitionFile is no longer used in POPartitionRearrange.java, we should 
remove it.

You can make these changes and go ahead and commit if it passes tests.

> Use distributed cache to store samples
> --
>
> Key: PIG-1218
> URL: https://issues.apache.org/jira/browse/PIG-1218
> Project: Pig
>  Issue Type: Improvement
>Reporter: Olga Natkovich
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: PIG-1218.patch, PIG-1218_2.patch
>
>
> Currently, in the case of skew join and order by, we use a sample that is just 
> written to the DFS (not the distributed cache) and, as a result, it gets opened 
> and copied around more than necessary. This impacts query performance and also 
> places unnecessary load on the name node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1241) Accumulator is turned on when a map is used with a non-accumulative UDF

2010-02-17 Thread Ying He (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834955#action_12834955
 ] 

Ying He commented on PIG-1241:
--

No, by default it is on:

boolean isAccum = 
"true".equalsIgnoreCase(pc.getProperties().getProperty("opt.accumulator","true"));

This means that if "opt.accumulator" is not present, the default value is "true".
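The default-on behavior follows from the two-argument form of getProperty and can be checked with plain java.util.Properties; the property name is the one quoted above, while the class and method names below are only for this demo:

```java
import java.util.Properties;

// Demonstrates the two-argument getProperty lookup quoted above: when the key
// is absent, the supplied fallback "true" is returned, so the accumulator
// optimizer is on unless the user explicitly sets opt.accumulator=false.
public class AccumulatorFlagDemo {
    public static boolean isAccumEnabled(Properties props) {
        return "true".equalsIgnoreCase(props.getProperty("opt.accumulator", "true"));
    }
}
```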

> Accumulator is turned on when a map is used with a non-accumulative UDF
> ---
>
> Key: PIG-1241
> URL: https://issues.apache.org/jira/browse/PIG-1241
> Project: Pig
>  Issue Type: Bug
>Reporter: Ying He
> Attachments: accum.patch
>
>
> Exception is thrown for a script like the following:
> register /homes/yinghe/owl/string.jar;
> a = load 'a.txt' as (id, url);
> b = group  a by (id, url);
> c = foreach b generate  COUNT(a), (CHARARRAY) 
> string.URLPARSE(group.url)#'url';
> dump c;
> In this query, URLPARSE() is not accumulative, and it returns a map. 
> The accumulator optimizer fails to check the UDF in this case and tries to run 
> the job in accumulative mode. A ClassCastException is thrown when trying to 
> cast the UDF to the Accumulator interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1241) Accumulator is turned on when a map is used with a non-accumulative UDF

2010-02-17 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834947#action_12834947
 ] 

Olga Natkovich commented on PIG-1241:
-

Ying, I thought that the accumulator is on by default, but it looks like you are 
setting it on only if the property is present. Is this true?

> Accumulator is turned on when a map is used with a non-accumulative UDF
> ---
>
> Key: PIG-1241
> URL: https://issues.apache.org/jira/browse/PIG-1241
> Project: Pig
>  Issue Type: Bug
>Reporter: Ying He
> Attachments: accum.patch
>
>
> Exception is thrown for a script like the following:
> register /homes/yinghe/owl/string.jar;
> a = load 'a.txt' as (id, url);
> b = group  a by (id, url);
> c = foreach b generate  COUNT(a), (CHARARRAY) 
> string.URLPARSE(group.url)#'url';
> dump c;
> In this query, URLPARSE() is not accumulative, and it returns a map. 
> The accumulator optimizer fails to check the UDF in this case and tries to run 
> the job in accumulative mode. A ClassCastException is thrown when trying to 
> cast the UDF to the Accumulator interface.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1226) Need to be able to register jars on the command line

2010-02-17 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1226:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

patch committed to trunk. Thanks, Thejas

> Need to be able to register jars on the command line
> 
>
> Key: PIG-1226
> URL: https://issues.apache.org/jira/browse/PIG-1226
> Project: Pig
>  Issue Type: Bug
>Reporter: Alan Gates
>Assignee: Thejas M Nair
> Fix For: 0.7.0
>
> Attachments: PIG-1126.patch
>
>
> Currently 'register' can only be done inside a Pig Latin script.  Users often 
> run their scripts in different environments, so jar locations or versions may 
> change.  But they don't want to edit their script to fit each environment.  
> Instead they could register on the command line, something like:
> pig -Dpig.additional.jars=my.jar:your.jar script.pig
> These would not override registers in the Pig Latin script itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1233) NullPointerException in AVG

2010-02-17 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834937#action_12834937
 ] 

Olga Natkovich commented on PIG-1233:
-

The patch looks reasonable. One question I have is what would happen if 
intermediateSum is null. Seems like we should test for that as well.

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.7.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method getValue() in AVG throws a NullPointerException when 
> accumulate() is not called, leaving the variable 'intermediateCount' 
> initialized to null. Java then throws when it tries to 'unbox' the value 
> for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1079) Modify merge join to use distributed cache to maintain the index

2010-02-17 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1079:


Fix Version/s: 0.7.0
 Assignee: Richard Ding

> Modify merge join to use distributed cache to maintain the index
> 
>
> Key: PIG-1079
> URL: https://issues.apache.org/jira/browse/PIG-1079
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Richard Ding
> Fix For: 0.7.0
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1194) ERROR 2055: Received Error while processing the map plan

2010-02-17 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1194:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

The second patch is committed.

> ERROR 2055: Received Error while processing the map plan
> 
>
> Key: PIG-1194
> URL: https://issues.apache.org/jira/browse/PIG-1194
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.5.0, 0.6.0
>Reporter: Viraj Bhat
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: inputdata.txt, PIG-1194.patch, PIG-1294_1.patch
>
>
> I have a simple Pig script which takes 3 columns, one of which is null. 
> {code}
> input = load 'inputdata.txt' using PigStorage() as (col1, col2, col3);
> a = GROUP input BY (((double) col3)/((double) col2) > .001 OR col1 < 11 ? 
> col1 : -1);
> b = FOREACH a GENERATE group as col1, SUM(input.col2) as col2, 
> SUM(input.col3) as  col3;
> store b into 'finalresult';
> {code}
> When I run this script I get the following error:
> ERROR 2055: Received Error while processing the map plan.
> org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received 
> Error while processing the map plan.
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:277)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 
> A more useful error message would help with debugging.
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1216) New load store design does not allow Pig to validate inputs and outputs up front

2010-02-17 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1216:


Resolution: Fixed
Fix Version/s: 0.7.0
Hadoop Flags: [Reviewed]
Status: Resolved  (was: Patch Available)

Patch committed to load-store-redesign branch - Thanks Ashutosh!

Note that only outputs will be validated up front (in line with Pig 0.6.0). 
Inputs will not be validated up front, since for a case like the following, 
validating inputs is not easy:
{code}
...
store into 'foo'...
load 'foo'...
...
{code}
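The up-front checks being discussed could look roughly like the sketch below. For brevity it uses the local filesystem via java.nio.file as a stand-in; the HDFS version would go through org.apache.hadoop.fs.FileSystem.exists instead, and the class and method names here are hypothetical:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical shape of shared up-front input/output checks, using the local
// filesystem as a stand-in for HDFS. A store location must not already exist;
// a load location must exist.
public class LocationChecks {
    public static void checkOutputSpec(Path out) {
        if (Files.exists(out)) {
            throw new IllegalStateException("Output location already exists: " + out);
        }
    }

    public static void checkInputSpec(Path in) {
        if (!Files.exists(in)) {
            throw new IllegalStateException("Input location does not exist: " + in);
        }
    }
}
```

Running such checks before submitting any job is what lets a multi-job script fail fast instead of discovering the problem at the last job.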

> New load store design does not allow Pig to validate inputs and outputs up 
> front
> 
>
> Key: PIG-1216
> URL: https://issues.apache.org/jira/browse/PIG-1216
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Alan Gates
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: pig-1216.patch, pig-1216_1.patch
>
>
> In Pig 0.6 and before, Pig attempts to verify existence of inputs and 
> non-existence of outputs during parsing to avoid run time failures when 
> inputs don't exist or outputs can't be overwritten.  The downside to this was 
> that Pig assumed all inputs and outputs were HDFS files, which made 
> implementation harder for non-HDFS based load and store functions.  In the 
> load store redesign (PIG-966) this was delegated to InputFormats and 
> OutputFormats to avoid this problem and to make use of the checks already 
> being done in those implementations.  Unfortunately, for Pig Latin scripts 
> that run more than one MR job, this does not work well.  MR does not do 
> input/output verification on all the jobs at once.  It does them one at a 
> time.  So if a Pig Latin script results in 10 MR jobs and the file to store 
> to at the end already exists, the first 9 jobs will be run before the 10th 
> job discovers that the whole thing was doomed from the beginning.  
> To avoid this a validate call needs to be added to the new LoadFunc and 
> StoreFunc interfaces.  Pig needs to pass this method enough information that 
> the load function implementer can delegate to InputFormat.getSplits() and the 
> store function implementer to OutputFormat.checkOutputSpecs() if s/he decides 
> to.  Since 90% of all load and store functions use HDFS and PigStorage will 
> also need to, the Pig team should implement a default file existence check on 
> HDFS and make it available as a static method to other Load/Store function 
> implementers.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1194) ERROR 2055: Received Error while processing the map plan

2010-02-17 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834911#action_12834911
 ] 

Ashutosh Chauhan commented on PIG-1194:
---

+1 for the commit.

> ERROR 2055: Received Error while processing the map plan
> 
>
> Key: PIG-1194
> URL: https://issues.apache.org/jira/browse/PIG-1194
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.5.0, 0.6.0
>Reporter: Viraj Bhat
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: inputdata.txt, PIG-1194.patch, PIG-1294_1.patch
>
>
> I have a simple Pig script which takes 3 columns, one of which is null. 
> {code}
> input = load 'inputdata.txt' using PigStorage() as (col1, col2, col3);
> a = GROUP input BY (((double) col3)/((double) col2) > .001 OR col1 < 11 ? 
> col1 : -1);
> b = FOREACH a GENERATE group as col1, SUM(input.col2) as col2, 
> SUM(input.col3) as  col3;
> store b into 'finalresult';
> {code}
> When I run this script I get the following error:
> ERROR 2055: Received Error while processing the map plan.
> org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received 
> Error while processing the map plan.
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:277)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 
> A more useful error message would help with debugging.
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1233) NullPointerException in AVG

2010-02-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834702#action_12834702
 ] 

Hadoop QA commented on PIG-1233:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435854/jira-1233.patch
  against trunk revision 909921.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/208/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/208/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/208/console

This message is automatically generated.

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.7.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method getValue() in AVG throws a NullPointerException when 
> accumulate() is not called, leaving the variable 'intermediateCount' 
> initialized to null. Java then throws when it tries to 'unbox' the value 
> for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.