[jira] [Updated] (PIG-3288) Kill jobs if the number of output files is over a configurable limit
[ https://issues.apache.org/jira/browse/PIG-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-3288: --- Status: Open (was: Patch Available) [~aniket486], your suggestion makes a lot of sense, and I like it. Let me think about this more. Canceling the patch for now. > Kill jobs if the number of output files is over a configurable limit > > > Key: PIG-3288 > URL: https://issues.apache.org/jira/browse/PIG-3288 > Project: Pig > Issue Type: Wish >Reporter: Cheolsoo Park >Assignee: Cheolsoo Park > Fix For: 0.12 > > Attachments: PIG-3288-2.patch, PIG-3288-3.patch, PIG-3288-4.patch, > PIG-3288-5.patch, PIG-3288.patch > > > I ran into a situation where a Pig job tried to create too many files on HDFS > and overloaded the NameNode (NN). To prevent such events, it would be nice if we could set an > upper limit on the number of files that a Pig job can create. > In fact, Hive has a property called "hive.exec.max.created.files". The idea > is that each mapper/reducer increments a counter every time it creates a > file. Then, MRLauncher periodically checks whether the number of created > files so far has exceeded the upper limit. If so, we kill the running jobs and > exit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
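The mechanism proposed in PIG-3288 (each task increments a counter for every file it creates; the launcher periodically compares the running total with a limit and kills the jobs once it is exceeded) can be sketched in plain Java. This is a minimal, hypothetical illustration, not Pig or Hive code: the class and method names are invented, and a single in-process `AtomicLong` stands in for what would really be a Hadoop job counter aggregated across mappers and reducers.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of the PIG-3288 idea (not Pig or Hive code): every task
 * bumps a shared counter when it creates an output file, and the launcher
 * periodically checks the total against a configured limit.
 */
final class CreatedFilesLimit {
    // Stands in for a job-wide counter aggregated across mappers/reducers.
    private final AtomicLong createdFiles = new AtomicLong();
    // Hypothetical limit, analogous in spirit to hive.exec.max.created.files.
    private final long maxCreatedFiles;

    CreatedFilesLimit(long maxCreatedFiles) {
        this.maxCreatedFiles = maxCreatedFiles;
    }

    /** Called by a (simulated) mapper or reducer each time it opens a new output file. */
    void onFileCreated() {
        createdFiles.incrementAndGet();
    }

    /** Called by the (simulated) launcher's polling loop; true means "kill the running jobs". */
    boolean shouldKill() {
        return createdFiles.get() > maxCreatedFiles;
    }

    long created() {
        return createdFiles.get();
    }

    public static void main(String[] args) {
        CreatedFilesLimit limit = new CreatedFilesLimit(3);
        for (int task = 0; task < 5; task++) {
            limit.onFileCreated();      // one file per simulated task
            if (limit.shouldKill()) {   // the check the launcher would run periodically
                System.out.println("Limit exceeded after " + limit.created() + " files; killing jobs");
                break;
            }
        }
    }
}
```

Here the check runs after every simulated file creation for simplicity; in the design described in the issue it would instead run on the launcher's polling interval, so a job could overshoot the limit somewhat before being killed.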
Re: Review Request 12290: CASE and IN fail when expression includes dereferencing operator
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/12290/ ---
(Updated July 17, 2013, 5:49 a.m.)
Review request for pig.
Changes
---
Updated AliasMasker.g.
Bugs: PIG-3374 https://issues.apache.org/jira/browse/PIG-3374
Repository: pig-git
Description
---
See PIG-3374 for details.
Diffs (updated)
-
src/org/apache/pig/parser/AliasMasker.g 98d94f7
src/org/apache/pig/parser/AstPrinter.g d87
src/org/apache/pig/parser/AstValidator.g d0ed0e8
src/org/apache/pig/parser/LogicalPlanGenerator.g cc1f47e
src/org/apache/pig/parser/QueryParser.g d4d9700
test/org/apache/pig/test/TestCase.java 5d8f7f3
test/org/apache/pig/test/TestIn.java c3a55de
Diff: https://reviews.apache.org/r/12290/diff/
Testing
---
Added new test cases to TestIn and TestCase.
ant clean test -Dtestcase=TestIn
ant clean test -Dtestcase=TestCase
Thanks,
Cheolsoo Park
[jira] [Updated] (PIG-3374) CASE and IN fail when expression includes dereferencing operator
[ https://issues.apache.org/jira/browse/PIG-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-3374: --- Attachment: PIG-3374-4.patch I forgot to update AliasMasker.g in the previous patch, so I included the update in a new patch. I also discovered that PIG-3342 didn't update AliasMasker.g, so I included that change in the new patch as well. > CASE and IN fail when expression includes dereferencing operator > > > Key: PIG-3374 > URL: https://issues.apache.org/jira/browse/PIG-3374 > Project: Pig > Issue Type: Bug > Components: parser >Reporter: Cheolsoo Park >Assignee: Cheolsoo Park > Fix For: 0.12 > > Attachments: PIG-3374-2.patch, PIG-3374-3.patch, PIG-3374-4.patch, > PIG-3374.patch > > > This is another bug that I discovered after deploying CASE/IN expressions > internally. > The current implementation of CASE/IN expressions assumes that the first operand > is a single expression. This is not true if, for example, it contains a > dereferencing operator. 
The following example demonstrates the problem: > {code} > A = LOAD 'foo' AS (k1:chararray, k2:chararray, v:int); > B = GROUP A BY (k1, k2); > C = FILTER B BY group.k1 IN ('a', 'b'); > DUMP C; > {code} > This fails with the following error: > {code} > Caused by: java.lang.IndexOutOfBoundsException: Index: 5, Size: 5 > at java.util.ArrayList.RangeCheck(ArrayList.java:547) > at java.util.ArrayList.get(ArrayList.java:322) > at > org.apache.pig.parser.LogicalPlanGenerator.in_eval(LogicalPlanGenerator.java:8624) > at > org.apache.pig.parser.LogicalPlanGenerator.cond(LogicalPlanGenerator.java:8405) > at > org.apache.pig.parser.LogicalPlanGenerator.filter_clause(LogicalPlanGenerator.java:7564) > at > org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1403) > at > org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:821) > at > org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:539) > at > org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:414) > at > org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:181) > {code} > Here is the relevant code that causes the failure: > {code:title=QueryParser.g} > if(tree.getType() == IN) { > Tree lhs = tree.getChild(0); // lhs is not a single node! > for(int i = 2; i < tree.getChildCount(); i = i + 2) { > tree.insertChild(i, deepCopy(lhs)); > } > } > {code}
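The quoted `QueryParser.g` loop rewrites `lhs IN (v1, v2, ...)` by inserting a copy of the left-hand operand before every value after the first, so the children become `(lhs, v1), (lhs, v2), ...` pairs, presumably so that each pair can later become an equality comparison. The Java sketch reproduces only that pairing step on a tiny hand-built tree; `Node` is a hypothetical stand-in for ANTLR's `Tree`, and this is not Pig's parser code or the attached patch. It illustrates why the loop depends on child 0 being the whole left-hand operand as one subtree: a dereference such as `group.k1` is copied correctly when it is modelled as a single node with children.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal stand-in for an ANTLR tree node (hypothetical, not org.antlr.runtime.tree.Tree). */
final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        for (Node k : kids) {
            children.add(k);
        }
    }

    /** Copies this node together with its entire subtree. */
    Node deepCopy() {
        Node copy = new Node(label);
        for (Node c : children) {
            copy.children.add(c.deepCopy());
        }
        return copy;
    }

    @Override
    public String toString() {
        if (children.isEmpty()) {
            return label;
        }
        StringBuilder sb = new StringBuilder("(").append(label);
        for (Node c : children) {
            sb.append(' ').append(c);
        }
        return sb.append(')').toString();
    }
}

final class InRewrite {

    /**
     * Mirrors the quoted loop: IN(lhs, v1, v2, ..., vn) becomes
     * IN(lhs, v1, lhs, v2, ..., lhs, vn). Only correct if child 0 is the
     * whole left-hand operand as ONE subtree.
     */
    static void pairOperands(Node in) {
        Node lhs = in.children.get(0);
        for (int i = 2; i < in.children.size(); i += 2) {
            in.children.add(i, lhs.deepCopy());
        }
    }

    public static void main(String[] args) {
        // group.k1 IN ('a', 'b'), with the dereference modelled as a single "." subtree
        Node lhs = new Node(".", new Node("group"), new Node("k1"));
        Node in = new Node("IN", lhs, new Node("'a'"), new Node("'b'"));
        pairOperands(in);
        System.out.println(in); // (IN (. group k1) 'a' (. group k1) 'b')
    }
}
```

Because each inserted element is a deep copy, later tree rewrites on one pair cannot accidentally modify another pair's left-hand operand.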
[jira] [Updated] (PIG-3373) XMLLoader returns non-matching nodes when a tag name spans through the block boundary
[ https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated PIG-3373: --- Status: Open (was: Patch Available) Canceling the patch while waiting for a response. > XMLLoader returns non-matching nodes when a tag name spans through the block > boundary > - > > Key: PIG-3373 > URL: https://issues.apache.org/jira/browse/PIG-3373 > Project: Pig > Issue Type: Bug > Components: piggybank >Reporter: Ahmed Eldawy >Assignee: Ahmed Eldawy > Labels: patch > Attachments: PIG3373.patch > > > When a node's start tag spans two blocks, the tag is returned even if it is not > of the requested type. > Example: For the following input file > > BLOCK BOUNDARY > entually id="dfasd"> > XMLLoader with tag type 'event' should return only the first one, but it > actually returns both of them.
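One plausible cause of the reported symptom is matching only the prefix `<event`, which also accepts a tag whose name merely begins with `event`. A stricter check requires the tag name to be followed by a delimiter (whitespace, `>` or `/`) before accepting the match. The helper is a hypothetical sketch of that check, not the piggybank `XMLLoader` code or the attached patch; `<eventually id="x">` is an illustrative input only.

```java
/**
 * Hypothetical helper (not piggybank's XMLLoader code): returns true only if
 * the text at pos starts a tag whose name is exactly tagName.
 */
final class StartTagMatcher {

    static boolean isStartTag(String text, int pos, String tagName) {
        String open = "<" + tagName;
        if (!text.startsWith(open, pos)) {
            return false;                       // not even the prefix matches
        }
        int end = pos + open.length();
        if (end >= text.length()) {
            // The name could continue beyond this buffer (e.g. past a block
            // boundary); a real reader would fetch more input before deciding.
            return false;
        }
        char next = text.charAt(end);
        // Prefix matching alone would accept "<eventually"; require a delimiter.
        return Character.isWhitespace(next) || next == '>' || next == '/';
    }

    public static void main(String[] args) {
        System.out.println(isStartTag("<event id=\"1\">", 0, "event"));        // true
        System.out.println(isStartTag("<eventually id=\"x\">", 0, "event"));   // false
    }
}
```

The key point for the block-boundary case is the second check: if the boundary falls right after the characters of the requested name, the reader cannot decide yet and has to look at the next character, which lives in the following block.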
[jira] [Updated] (PIG-3380) patch to fix existing failures due to test related issues
[ https://issues.apache.org/jira/browse/PIG-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Annie Lin updated PIG-3380: --- Attachment: BUG-6364843.patch Patch to fix existing e2e failures due to test-related problems. > patch to fix existing failures due to test related issues > - > > Key: PIG-3380 > URL: https://issues.apache.org/jira/browse/PIG-3380 > Project: Pig > Issue Type: Bug > Components: e2e harness >Affects Versions: 0.11.2 >Reporter: Annie Lin >Assignee: Rohini Palaniswamy >Priority: Minor > Fix For: 0.11.2 > > Attachments: BUG-6364843.patch > > > Attached is the patch, created from > http://svn.apache.org/repos/asf/pig/branches/branch-0.11/ > Two conf files are modified: > nightly.conf > turing_jython.conf
[jira] [Created] (PIG-3380) patch to fix existing failures due to test related issues
Annie Lin created PIG-3380: -- Summary: patch to fix existing failures due to test related issues Key: PIG-3380 URL: https://issues.apache.org/jira/browse/PIG-3380 Project: Pig Issue Type: Bug Components: e2e harness Affects Versions: 0.11.2 Reporter: Annie Lin Assignee: Rohini Palaniswamy Priority: Minor Fix For: 0.11.2 Attached is the patch, created from http://svn.apache.org/repos/asf/pig/branches/branch-0.11/ Two conf files are modified: nightly.conf, turing_jython.conf
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (18 issues)
Subscriber: pigdaily
Key Summary
PIG-3374 CASE and IN fail when expression includes dereferencing operator https://issues.apache.org/jira/browse/PIG-3374
PIG-3373 XMLLoader returns non-matching nodes when a tag name spans through the block boundary https://issues.apache.org/jira/browse/PIG-3373
PIG-3359 Register Statements and Param Substitution in Macros https://issues.apache.org/jira/browse/PIG-3359
PIG-3346 New property that controls the number of combined splits https://issues.apache.org/jira/browse/PIG-3346
PIG- Fix remaining Windows core unit test failures https://issues.apache.org/jira/browse/PIG-
PIG-3295 Casting from bytearray failing after Union (even when each field is from a single Loader) https://issues.apache.org/jira/browse/PIG-3295
PIG-3292 Logical plan invalid state: duplicate uid in schema during self-join to get cross product https://issues.apache.org/jira/browse/PIG-3292
PIG-3288 Kill jobs if the number of output files is over a configurable limit https://issues.apache.org/jira/browse/PIG-3288
PIG-3257 Add unique identifier UDF https://issues.apache.org/jira/browse/PIG-3257
PIG-3247 Piggybank functions to mimic OVER clause in SQL https://issues.apache.org/jira/browse/PIG-3247
PIG-3210 Pig fails to start when it cannot write log to log files https://issues.apache.org/jira/browse/PIG-3210
PIG-3199 Expose LogicalPlan via PigServer API https://issues.apache.org/jira/browse/PIG-3199
PIG-3166 Update eclipse .classpath according to ivy library.properties https://issues.apache.org/jira/browse/PIG-3166
PIG-3123 Simplify Logical Plans By Removing Unneccessary Identity Projections https://issues.apache.org/jira/browse/PIG-3123
PIG-3088 Add a builtin udf which removes prefixes https://issues.apache.org/jira/browse/PIG-3088
PIG-3021 Split results missing records when there is null values in the column comparison https://issues.apache.org/jira/browse/PIG-3021
PIG-2248 Pig parser does not detect when a macro name masks a UDF name https://issues.apache.org/jira/browse/PIG-2248
PIG-1914 Support load/store JSON data in Pig https://issues.apache.org/jira/browse/PIG-1914
You may edit this subscription at: https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Commented] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail
[ https://issues.apache.org/jira/browse/PIG-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710085#comment-13710085 ] Xuefu Zhang commented on PIG-3379: -- It seems related to PIG-1271 and PIG-2530, but both were marked as fixed. > Alias reuse in nested foreach causes PIG script to fail > --- > > Key: PIG-3379 > URL: https://issues.apache.org/jira/browse/PIG-3379 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.11.1 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > > The following script fails: > {code:title=temp.pig} > Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, > eventName:chararray); > Events = FOREACH Events GENERATE eventTime, deviceId, eventName; > EventsPerMinute = GROUP Events BY (eventTime / 6); > EventsPerMinute = FOREACH EventsPerMinute { > DistinctDevices = DISTINCT Events.deviceId; > nbDevices = SIZE(DistinctDevices); > DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat'; > nbDevicesWatching = SIZE(DistinctDevices); > GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching > as nbDevicesWatching; > } > EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < > 10; > A = FOREACH EventsPerMinute GENERATE timeStamp; > describe A; > {code} > With the error: > {code} > 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1025: > Invalid field > projection. Projected field [timeStamp] does not exist in schema: > deviceId:chararray. > {code} > Using distinct alias name for the 2nd "DistinctDevices" fixes the problem. As > an observation, removing the last filter statement also fixes the problem.
[jira] [Updated] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail
[ https://issues.apache.org/jira/browse/PIG-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-3379: - Description: The following script fails: {code:title=temp.pig} Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray); Events = FOREACH Events GENERATE eventTime, deviceId, eventName; EventsPerMinute = GROUP Events BY (eventTime / 6); EventsPerMinute = FOREACH EventsPerMinute { DistinctDevices = DISTINCT Events.deviceId; nbDevices = SIZE(DistinctDevices); DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat'; nbDevicesWatching = SIZE(DistinctDevices); GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching; } EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < 10; A = FOREACH EventsPerMinute GENERATE timeStamp; describe A; {code} With the error: {code} 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: Invalid field projection. Projected field [timeStamp] does not exist in schema: deviceId:chararray. {code} Using distinct alias name for the 2nd "DistinctDevices" fixes the problem. As an observation, removing the last filter statement also fixes the problem. 
was: The following script fails: {code:title=temp.pig} Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray); Events = FOREACH Events GENERATE eventTime, deviceId, eventName; EventsPerMinute = GROUP Events BY (eventTime / 6); EventsPerMinute = FOREACH EventsPerMinute { DistinctDevices = DISTINCT Events.deviceId; nbDevices = SIZE(DistinctDevices); DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat'; nbDevicesWatching = SIZE(DistinctDevices); GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching; } EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < 10; A = FOREACH EventsPerMinute GENERATE timeStamp; describe A; {code} With the error: 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: Invalid field projection. Projected field [timeStamp] does not exist in schema: deviceId:chararray. Using distinct alias name for the 2nd "DistinctDevices" fixes the problem. As an observation, removing the last filter statement also fixes the problem. 
> Alias reuse in nested foreach causes PIG script to fail > --- > > Key: PIG-3379 > URL: https://issues.apache.org/jira/browse/PIG-3379 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.11.1 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > > The following script fails: > {code:title=temp.pig} > Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, > eventName:chararray); > Events = FOREACH Events GENERATE eventTime, deviceId, eventName; > EventsPerMinute = GROUP Events BY (eventTime / 6); > EventsPerMinute = FOREACH EventsPerMinute { > DistinctDevices = DISTINCT Events.deviceId; > nbDevices = SIZE(DistinctDevices); > DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat'; > nbDevicesWatching = SIZE(DistinctDevices); > GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching > as nbDevicesWatching; > } > EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < > 10; > A = FOREACH EventsPerMinute GENERATE timeStamp; > describe A; > {code} > With the error: > {code} > 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1025: > Invalid field > projection. Projected field [timeStamp] does not exist in schema: > deviceId:chararray. > {code} > Using distinct alias name for the 2nd "DistinctDevices" fixes the problem. As > an observation, removing the last filter statement also fixes the problem.
[jira] [Updated] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail
[ https://issues.apache.org/jira/browse/PIG-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-3379: - Description: The following script fails: {code:title=temp.pig} Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray); Events = FOREACH Events GENERATE eventTime, deviceId, eventName; EventsPerMinute = GROUP Events BY (eventTime / 6); EventsPerMinute = FOREACH EventsPerMinute { DistinctDevices = DISTINCT Events.deviceId; nbDevices = SIZE(DistinctDevices); DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat'; nbDevicesWatching = SIZE(DistinctDevices); GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching; } EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < 10; A = FOREACH EventsPerMinute GENERATE timeStamp; describe A; {code} With the error: 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: Invalid field projection. Projected field [timeStamp] does not exist in schema: deviceId:chararray. Using distinct alias name for the 2nd "DistinctDevices" fixes the problem. As an observation, removing the last filter statement also fixes the problem. 
was: The following script fails: {{ Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray); Events = FOREACH Events GENERATE eventTime, deviceId, eventName; EventsPerMinute = GROUP Events BY (eventTime / 6); EventsPerMinute = FOREACH EventsPerMinute { DistinctDevices = DISTINCT Events.deviceId; nbDevices = SIZE(DistinctDevices); DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat'; nbDevicesWatching = SIZE(DistinctDevices); GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching; } EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < 10; A = FOREACH EventsPerMinute GENERATE timeStamp; describe A; }} With the error: 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: Invalid field projection. Projected field [timeStamp] does not exist in schema: deviceId:chararray. Using distinct alias name for the 2nd "DistinctDevices" fixes the problem. As an observation, removing the last filter statement also fixes the problem. 
> Alias reuse in nested foreach causes PIG script to fail > --- > > Key: PIG-3379 > URL: https://issues.apache.org/jira/browse/PIG-3379 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.11.1 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > > The following script fails: > {code:title=temp.pig} > Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, > eventName:chararray); > Events = FOREACH Events GENERATE eventTime, deviceId, eventName; > EventsPerMinute = GROUP Events BY (eventTime / 6); > EventsPerMinute = FOREACH EventsPerMinute { > DistinctDevices = DISTINCT Events.deviceId; > nbDevices = SIZE(DistinctDevices); > DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat'; > nbDevicesWatching = SIZE(DistinctDevices); > GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching > as nbDevicesWatching; > } > EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < > 10; > A = FOREACH EventsPerMinute GENERATE timeStamp; > describe A; > {code} > With the error: > 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1025: > Invalid field > projection. Projected field [timeStamp] does not exist in schema: > deviceId:chararray. > Using distinct alias name for the 2nd "DistinctDevices" fixes the problem. As > an observation, removing the last filter statement also fixes the problem.
[jira] [Created] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail
Xuefu Zhang created PIG-3379: Summary: Alias reuse in nested foreach causes PIG script to fail Key: PIG-3379 URL: https://issues.apache.org/jira/browse/PIG-3379 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.11.1 Reporter: Xuefu Zhang Assignee: Xuefu Zhang The following script fails: Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray); Events = FOREACH Events GENERATE eventTime, deviceId, eventName; EventsPerMinute = GROUP Events BY (eventTime / 6); EventsPerMinute = FOREACH EventsPerMinute { DistinctDevices = DISTINCT Events.deviceId; nbDevices = SIZE(DistinctDevices); DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat'; nbDevicesWatching = SIZE(DistinctDevices); GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching; } EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < 10; A = FOREACH EventsPerMinute GENERATE timeStamp; describe A; With the error: 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: Invalid field projection. Projected field [timeStamp] does not exist in schema: deviceId:chararray. Using distinct alias name for the 2nd "DistinctDevices" fixes the problem. As an observation, removing the last filter statement also fixes the problem.
[jira] [Updated] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail
[ https://issues.apache.org/jira/browse/PIG-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-3379: - Description: The following script fails: {{ Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray); Events = FOREACH Events GENERATE eventTime, deviceId, eventName; EventsPerMinute = GROUP Events BY (eventTime / 6); EventsPerMinute = FOREACH EventsPerMinute { DistinctDevices = DISTINCT Events.deviceId; nbDevices = SIZE(DistinctDevices); DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat'; nbDevicesWatching = SIZE(DistinctDevices); GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching; } EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < 10; A = FOREACH EventsPerMinute GENERATE timeStamp; describe A; }} With the error: 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: Invalid field projection. Projected field [timeStamp] does not exist in schema: deviceId:chararray. Using distinct alias name for the 2nd "DistinctDevices" fixes the problem. As an observation, removing the last filter statement also fixes the problem. 
was: The following script fails: Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray); Events = FOREACH Events GENERATE eventTime, deviceId, eventName; EventsPerMinute = GROUP Events BY (eventTime / 6); EventsPerMinute = FOREACH EventsPerMinute { DistinctDevices = DISTINCT Events.deviceId; nbDevices = SIZE(DistinctDevices); DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat'; nbDevicesWatching = SIZE(DistinctDevices); GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as nbDevicesWatching; } EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < 10; A = FOREACH EventsPerMinute GENERATE timeStamp; describe A; With the error: 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: Invalid field projection. Projected field [timeStamp] does not exist in schema: deviceId:chararray. Using distinct alias name for the 2nd "DistinctDevices" fixes the problem. As an observation, removing the last filter statement also fixes the problem. 
> Alias reuse in nested foreach causes PIG script to fail > --- > > Key: PIG-3379 > URL: https://issues.apache.org/jira/browse/PIG-3379 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.11.1 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > > The following script fails: > {{ > Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, > eventName:chararray); > Events = FOREACH Events GENERATE eventTime, deviceId, eventName; > EventsPerMinute = GROUP Events BY (eventTime / 6); > EventsPerMinute = FOREACH EventsPerMinute { > DistinctDevices = DISTINCT Events.deviceId; > nbDevices = SIZE(DistinctDevices); > DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat'; > nbDevicesWatching = SIZE(DistinctDevices); > GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching > as nbDevicesWatching; > } > EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0 AND timeStamp < > 10; > A = FOREACH EventsPerMinute GENERATE timeStamp; > describe A; > }} > With the error: > 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1025: > Invalid field > projection. Projected field [timeStamp] does not exist in schema: > deviceId:chararray. > Using distinct alias name for the 2nd "DistinctDevices" fixes the problem. As > an observation, removing the last filter statement also fixes the problem.
[jira] [Resolved] (PIG-3372) test
[ https://issues.apache.org/jira/browse/PIG-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved PIG-3372. - Resolution: Invalid > test > > > Key: PIG-3372 > URL: https://issues.apache.org/jira/browse/PIG-3372 > Project: Pig > Issue Type: Test > Components: impl >Reporter: Manuel >Priority: Trivial > > test
Re: Is there any Conditional Statement in Apache Pig like If/Else
http://pig.apache.org/docs/r0.11.1/cont.html On Tue, Jul 16, 2013 at 3:34 PM, Bhavesh K Shah < bhavesh.s...@bitwiseglobal.com> wrote: > Hello All, > > I am a newbie to Apache Pig and I am exploring it for one of my use cases. > > I am writing a Pig script and want to execute a set of > statements only if a certain condition is satisfied. > > I have set a variable to some value. I want to implement something like the following: > > if flag==0 then > A = LOAD 'file' using PigStorage() as (f1:int, ); > B = ...; > C = ; > else > again some Pig Latin statements > > Can I do this in a Pig script? If yes, how can I do this? > > I also came across the conditional operator in Pig, (a == b ? c1 : c2). > But how can I put a block of Pig statements inside this operator? > > Thanks. > Bhavesh Shah > > **Disclaimer** > This e-mail message and any attachments may contain confidential > information and is for the sole use of the intended recipient(s) only. Any > views or opinions presented or implied are solely those of the author and > do not necessarily represent the views of BitWise. If you are not the > intended recipient(s), you are hereby notified that disclosure, printing, > copying, forwarding, distribution, or the taking of any action whatsoever > in reliance on the contents of this electronic information is strictly > prohibited. If you have received this e-mail message in error, please > immediately notify the sender and delete the electronic message and any > attachments. BitWise does not accept liability for any virus introduced by > this e-mail or any attachments.
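Pig Latin has no statement-level if/else; the `(cond ? a : b)` bincond operator only chooses a value per record inside an expression, so it cannot select a block of statements. The linked documentation page covers, among other things, embedding Pig in a host language, where ordinary control flow decides what gets submitted. The Java sketch only builds the statement list for each branch; the flag, file names, and statements are hypothetical placeholders (the schema `(f1:int)` stands in for the elided one in the question), and the comment marks where `PigServer.registerQuery(String)` could be called if Pig were on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: choose which Pig Latin statements to run in Java,
 * because Pig Latin itself has no statement-level if/else.
 */
final class ConditionalPigScript {

    /** Returns the statements for the branch selected by flag (placeholders). */
    static List<String> statementsFor(int flag) {
        List<String> stmts = new ArrayList<>();
        if (flag == 0) {
            stmts.add("A = LOAD 'file' USING PigStorage() AS (f1:int);");
            stmts.add("B = FILTER A BY f1 > 0;");
        } else {
            stmts.add("A = LOAD 'other_file' USING PigStorage() AS (f1:int);");
            stmts.add("B = FOREACH A GENERATE f1 * 2;");
        }
        return stmts;
    }

    public static void main(String[] args) {
        int flag = args.length > 0 ? Integer.parseInt(args[0]) : 0;
        for (String stmt : statementsFor(flag)) {
            // With Pig on the classpath, each statement could be passed to
            // PigServer.registerQuery(String) instead of being printed.
            System.out.println(stmt);
        }
    }
}
```

Keeping the branching outside Pig Latin means each branch can be an arbitrarily long block of statements, which is exactly what the bincond operator cannot express.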
[jira] [Created] (PIG-3378) Pig streaming with multiquery is buggy in local mode.
Thomas Porez created PIG-3378: - Summary: Pig streaming with multiquery is buggy in local mode. Key: PIG-3378 URL: https://issues.apache.org/jira/browse/PIG-3378 Project: Pig Issue Type: Bug Affects Versions: 0.11.1 Reporter: Thomas Porez Priority: Minor Today I noticed strange behavior of Pig in local mode (streaming + multiquery). Here is a minimal script to reproduce the problem. Suppose an input file with multiple lines, for example: # myInput 1 2 3 1 2 3 The Pig script is: # bug.pig MYINPUT = LOAD 'myinput'; A = GROUP MYINPUT BY $0; B = FOREACH A GENERATE FLATTEN(MYINPUT); C = STREAM B THROUGH `cat`; D = GROUP MYINPUT BY $0; E = FOREACH D GENERATE FLATTEN(MYINPUT); F = STREAM E THROUGH `cat`; STORE C into 'output1'; STORE F into 'output2'; I run the script using the following command: pig -x local bug.pig output1 and output2 should each contain an exact copy of my input file, but they do not: each contains only one line (the first line of the file), as shown by: cat output1/part* cat output2/part* For information: * The corresponding Pig script works properly in Hadoop mode. * If I comment out one of the two STORE operations, it works as expected (which is why I think the problem is related to multiquery execution). * If I put an EXEC statement between the two STORE operations, it works too. * I can confirm that the script reads all lines from stdin correctly: for example, replacing the `cat` executable with `wc -l` reports the number of rows in the input file. So the problem seems to be in how stdout is parsed.
Is there any Conditional Statement in Apache Pig like If/Else
Hello All, I am a newbie to Apache Pig and I am exploring it for one of my use cases. I am writing a Pig script and want to execute a set of statements only if a certain condition is satisfied. I have set a variable to some value. I want to implement something like the following: if flag==0 then A = LOAD 'file' using PigStorage() as (f1:int, ); B = ...; C = ; else again some Pig Latin statements Can I do this in a Pig script? If yes, how can I do this? I also came across the conditional operator in Pig, (a == b ? c1 : c2). But how can I put a block of Pig statements inside this operator? Thanks. Bhavesh Shah