Tushar Pradhan created PIG-2752:
-----------------------------------
Summary: Infinite (?) parser loop with complex FOREACH expression
Key: PIG-2752
URL: https://issues.apache.org/jira/browse/PIG-2752
Project: Pig
Issue Type: Bug
Components: parser
Affects Versions: 0.10.0
Environment: Linux
Reporter: Tushar Pradhan
The following Pig script appears to hang in the parser under Pig 0.10.0. The
same script runs fine under Pig 0.8.1.
----
X = LOAD 'X' USING PigStorage(',') AS (
    term: chararray, dcount: long,
    dcount_0: long, dcount_1: long, dcount_2: long, dcount_4: long,
    dcount_5: long, dcount_6: long, dcount_7: long, dcount_8: long,
    dcount_9: long, dcount_10: long, dcount_11: long, dcount_12: long,
    dcount_13: long, dcount_U: long, dcount_L: long, dcount_C: long,
    dcount_M: long, dcount_P: long, dcount_T: long, dcount_S: long,
    dcount_R: long, dcount_Z: long, dcount_K: long);
Y =
  FOREACH X
  GENERATE
    term,
    (
      (dcount_U > 0 OR dcount_C > 0 OR dcount_M > 0) AND (dcount_1 > 1 OR dcount_1 == 1 AND dcount == 1) ? 1 : (
      (dcount_U > 0 OR dcount_C > 0 OR dcount_M > 0) AND (dcount_2 > 1 OR dcount_2 == 1 AND dcount == 1) ? 2 : (
      (dcount_U > 0 OR dcount_C > 0 OR dcount_M > 0) AND (dcount_7 > 1 OR dcount_7 == 1 AND dcount == 1) ? 7 : (
      (dcount_U > 0 OR dcount_C > 0 OR dcount_M > 0) AND (dcount_9 > 1 OR dcount_9 == 1 AND dcount == 1) ? 9 : (
      (dcount_U > 0 OR dcount_C > 0 OR dcount_M > 0) AND (dcount_11 > 1 OR dcount_11 == 1 AND dcount == 1) ? 11 : (
      dcount_5 > 1 OR dcount_5 == 1 AND dcount == 1 ? 5 : (
      dcount_6 > 1 OR dcount_6 == 1 AND dcount == 1 ? 6 : (
      dcount_8 > 1 OR dcount_8 == 1 AND dcount == 1 ? 8 : (
      dcount_10 > 1 OR dcount_10 == 1 AND dcount == 1 ? 10 : (
      dcount_12 > 1 OR dcount_12 == 1 AND dcount == 1 ? 12 : (
      (dcount_U > 0 OR dcount_C > 0 OR dcount_M > 0) AND (dcount_13 > 0 OR dcount_13 == 1 AND dcount == 1) ? 13 : (
      dcount_4 > 0 ? 4 : 0)))))))))))
    ) AS besttype;
STORE Y INTO 'Y';
----
2012-06-12 08:04:46,435 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0-SNAPSHOT (rexported) compiled May 08 2012, 08:26:29
2012-06-12 08:04:46,435 [main] INFO org.apache.pig.Main - Logging error messages to: /tmp/pig_1339513486431.log
2012-06-12 08:04:46,950 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
The hang occurs in both local and Hadoop modes.
If I simplify the 'besttype' expression in the FOREACH a bit, the script works
fine.
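For anyone reducing the expression, the intent of 'besttype' can be read as a priority list: return the first type code whose test passes, else 0. The Python below is only an illustrative sketch of that reading, not part of the failing script. It assumes AND binds tighter than OR, that everything before each '?' is the condition, and that no field is null (Pig's null handling is ignored). The names FIELDS, parse, besttype and strong are made up for this sketch.

```python
# Illustrative only: a plain-Python reading of the nested bincond chain.
# Assumptions: AND binds tighter than OR; no field is null.

# Column names in the order given by the LOAD schema.
FIELDS = ("term dcount dcount_0 dcount_1 dcount_2 dcount_4 dcount_5 dcount_6 "
          "dcount_7 dcount_8 dcount_9 dcount_10 dcount_11 dcount_12 dcount_13 "
          "dcount_U dcount_L dcount_C dcount_M dcount_P dcount_T dcount_S "
          "dcount_R dcount_Z dcount_K").split()


def parse(line):
    """Split one comma-separated record into a dict; all but 'term' become ints."""
    row = dict(zip(FIELDS, line.split(",")))
    for name in FIELDS[1:]:
        row[name] = int(row[name])
    return row


def besttype(row):
    """Return the first matching type code, in the order the script tests them."""
    dcount = row["dcount"]
    has_ucm = row["dcount_U"] > 0 or row["dcount_C"] > 0 or row["dcount_M"] > 0

    def strong(n):
        # Mirrors: dcount_n > 1 OR dcount_n == 1 AND dcount == 1
        c = row["dcount_%d" % n]
        return c > 1 or (c == 1 and dcount == 1)

    for n in (1, 2, 7, 9, 11):      # branches guarded by the U/C/M test
        if has_ucm and strong(n):
            return n
    for n in (5, 6, 8, 10, 12):     # unguarded branches
        if strong(n):
            return n
    # The 13 branch compares with > 0 rather than > 1, as in the script.
    c13 = row["dcount_13"]
    if has_ucm and (c13 > 0 or (c13 == 1 and dcount == 1)):
        return 13
    return 4 if row["dcount_4"] > 0 else 0
```

Under these assumptions, the sample record given in this report evaluates to 0, since dcount_U, dcount_C, dcount_M, dcount_4 and dcount_5 through dcount_12 are all zero.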
The input 'X' directory isn't necessary, since processing gets stuck in the
parser, but if needed it can contain a sample 'part-r-00000' file with the line:
#1,49,1,0,0,0,0,0,0,0,0,0,0,0,48,0,0,0,0,49,1,2,0,0,43
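If the input is wanted anyway, a small helper along these lines could create it. This is a minimal sketch: the function name write_sample and the base_dir parameter are hypothetical, and the only data used is the sample record quoted above.

```python
from pathlib import Path

# The sample record quoted in the report, unchanged.
SAMPLE_LINE = "#1,49,1,0,0,0,0,0,0,0,0,0,0,0,48,0,0,0,0,49,1,2,0,0,43"
EXPECTED_FIELDS = 25  # term plus the 24 long columns in the LOAD schema


def write_sample(base_dir="."):
    """Create <base_dir>/X/part-r-00000 containing the single sample record."""
    target = Path(base_dir) / "X" / "part-r-00000"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(SAMPLE_LINE + "\n")
    return target
```

Calling write_sample() from the directory where the script is run gives LOAD 'X' something to read in local mode. The record has one value per column of the schema, which is what EXPECTED_FIELDS records.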