[ https://issues.apache.org/jira/browse/PIG-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Woody Wang updated PIG-2592:
----------------------------
    Description:
Suppose we have a data file (test.txt) whose content is:
1,2,3
2,3,4
3,4,5
4,5,6
I want to select the records whose first field is '3'. The Pig script is:
t = LOAD 'test.txt' USING PigStorage(',');
t1 = FOREACH t GENERATE $0 AS i0:chararray, $1 AS i1:chararray, $2 AS i2:chararray;
f1 = FILTER t1 BY i0 == '3';
DUMP f1;
The job runs without errors, but the output is empty. EXPLAIN f1 shows:
{noformat}
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-27
Map Plan
f1: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-26
|
|---f1: Filter[bag] - scope-22
    |   |
    |   Equal To[boolean] - scope-25
    |   |
    |   |---Project[chararray][0] - scope-23
    |   |
    |   |---Constant(3) - scope-24
    |
    |---t1: New For Each(false,false,false)[bag] - scope-21
        |   |
        |   Project[bytearray][0] - scope-15
        |   |
        |   Project[bytearray][1] - scope-17
        |   |
        |   Project[bytearray][2] - scope-19
        |
        |---t: Load(file:///Users/woody/test.txt:PigStorage(',')) - scope-14--------
Global sort: false
----------------
{noformat}
However, if I replace the first two lines with:
t1 = LOAD 'test.txt' USING PigStorage(',') AS (i0:chararray, i1:chararray, i2:chararray);
(i.e. declare the schema in the LOAD statement), the job runs and the result is correct. In this case, EXPLAIN f1 shows:
{noformat}
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-33
Map Plan
f1: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-32
|
|---f1: Filter[bag] - scope-28
    |   |
    |   Equal To[boolean] - scope-31
    |   |
    |   |---Project[chararray][0] - scope-29
    |   |
    |   |---Constant(3) - scope-30
    |
    |---t1: New For Each(false,false,false)[bag] - scope-27
        |   |
        |   Cast[chararray] - scope-19
        |   |
        |   |---Project[bytearray][0] - scope-18
        |   |
        |   Cast[chararray] - scope-22
        |   |
        |   |---Project[bytearray][1] - scope-21
        |   |
        |   Cast[chararray] - scope-25
        |   |
        |   |---Project[bytearray][2] - scope-24
        |
        |---t1: Load(file:///Users/woody/test.txt:PigStorage(',')) - scope-17--------
Global sort: false
{noformat}

  was:
The same text, without the {noformat} markers around the two EXPLAIN plans.

> Using FILTER after FOREACH in Pig-Latin failed
> ----------------------------------------------
>
>                 Key: PIG-2592
>                 URL: https://issues.apache.org/jira/browse/PIG-2592
>             Project: Pig
>          Issue Type: Bug
>          Components: impl, parser
>    Affects Versions: 0.9.2
>         Environment: MacOS X 10.6.7
> java version "1.6.0_29"
> Java(TM) SE Runtime Environment (build 1.6.0_29-b11-402-11M3527)
> Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02-402, mixed mode)
>            Reporter: Woody Wang
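
The two plans in the description differ in one respect: the LOAD ... AS version has a Cast[chararray] operator above each Project[bytearray] in the For Each, while the FOREACH ... AS version has none, so the filter compares an uncast bytearray field against the chararray constant '3'. A possible workaround, sketched here as an assumption and not verified against Pig 0.9.2, is to write the casts explicitly in the FOREACH instead of relying on the type in the AS clause:

```pig
-- Workaround sketch (assumption, untested on 0.9.2): cast each field
-- explicitly so the plan should contain Cast[chararray] operators.
t  = LOAD 'test.txt' USING PigStorage(',');
t1 = FOREACH t GENERATE (chararray)$0 AS i0,
                        (chararray)$1 AS i1,
                        (chararray)$2 AS i2;
f1 = FILTER t1 BY i0 == '3';
DUMP f1;
```

Running EXPLAIN f1 on this variant is a quick way to check whether the Cast[chararray] operators now appear under the For Each.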