[jira] [Updated] (PIG-3288) Kill jobs if the number of output files is over a configurable limit

2013-07-16 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3288:
---

Status: Open  (was: Patch Available)

[~aniket486], your suggestion makes a lot of sense, and I like it. Let me think 
about this more. Canceling the patch for now.

> Kill jobs if the number of output files is over a configurable limit
> 
>
> Key: PIG-3288
> URL: https://issues.apache.org/jira/browse/PIG-3288
> Project: Pig
>  Issue Type: Wish
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.12
>
> Attachments: PIG-3288-2.patch, PIG-3288-3.patch, PIG-3288-4.patch, 
> PIG-3288-5.patch, PIG-3288.patch
>
>
> I ran into a situation where a Pig job tried to create too many files on HDFS 
> and overloaded the NameNode (NN). To prevent such events, it would be nice if 
> we could set an upper limit on the number of files that a Pig job can create.
> In fact, Hive has a property called "hive.exec.max.created.files". The idea 
> is that each mapper/reducer increments a counter every time it creates 
> files. Then, MRLauncher periodically checks whether the number of created 
> files so far has exceeded the upper limit. If so, we kill running jobs and 
> exit.
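
The counter-and-check idea in the description can be sketched roughly as follows. This is only an illustration in plain Python, not the patch attached to this issue: `FileCounter`, `exceeds_limit`, `run_job`, and the "negative limit means unlimited" convention are all assumptions made for the example.

```python
import threading


class FileCounter:
    """Thread-safe counter standing in for a job-wide counter (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def increment(self, n=1):
        # Called by a task each time it creates n output files.
        with self._lock:
            self._count += n

    def value(self):
        with self._lock:
            return self._count


def exceeds_limit(counter, max_created_files):
    """Return True when the launcher should kill the running jobs."""
    # Assumption for this sketch: a negative limit disables the check.
    return max_created_files >= 0 and counter.value() > max_created_files


def run_job(tasks, counter, max_created_files):
    """Simulate the launcher's periodic check after each task reports."""
    for files_created in tasks:
        counter.increment(files_created)
        if exceeds_limit(counter, max_created_files):
            return "KILLED"
    return "SUCCEEDED"
```

In a real job the check would run on a timer in the launcher rather than after each task, but the decision rule is the same comparison against the configured limit.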

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 12290: CASE and IN fail when expression includes dereferencing operator

2013-07-16 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12290/
---

(Updated July 17, 2013, 5:49 a.m.)


Review request for pig.


Changes
---

Updated AliasMasker.g.


Bugs: PIG-3374
https://issues.apache.org/jira/browse/PIG-3374


Repository: pig-git


Description
---

See PIG-3374 for details.


Diffs (updated)
-

  src/org/apache/pig/parser/AliasMasker.g 98d94f7 
  src/org/apache/pig/parser/AstPrinter.g d87 
  src/org/apache/pig/parser/AstValidator.g d0ed0e8 
  src/org/apache/pig/parser/LogicalPlanGenerator.g cc1f47e 
  src/org/apache/pig/parser/QueryParser.g d4d9700 
  test/org/apache/pig/test/TestCase.java 5d8f7f3 
  test/org/apache/pig/test/TestIn.java c3a55de 

Diff: https://reviews.apache.org/r/12290/diff/


Testing
---

Added new test cases to TestIn and TestCase.

ant clean test -Dtestcase=TestIn
ant clean test -Dtestcase=TestCase


Thanks,

Cheolsoo Park



[jira] [Updated] (PIG-3374) CASE and IN fail when expression includes dereferencing operator

2013-07-16 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3374:
---

Attachment: PIG-3374-4.patch

I forgot to update AliasMasker.g in the previous patch, so I included the 
update in a new patch.

I also discovered that PIG-3342 didn't update AliasMasker.g and included the 
change in the new patch.

> CASE and IN fail when expression includes dereferencing operator
> 
>
> Key: PIG-3374
> URL: https://issues.apache.org/jira/browse/PIG-3374
> Project: Pig
>  Issue Type: Bug
>  Components: parser
>Reporter: Cheolsoo Park
>Assignee: Cheolsoo Park
> Fix For: 0.12
>
> Attachments: PIG-3374-2.patch, PIG-3374-3.patch, PIG-3374-4.patch, 
> PIG-3374.patch
>
>
> This is another bug that I discovered after deploying CASE/IN expressions 
> internally.
> The current implementation of CASE/IN expression assumes that the 1st operand 
> is a single expression. But this is not true, for example, if it contains a 
> dereferencing operator. The following example demonstrates the problem:
> {code}
> A = LOAD 'foo' AS (k1:chararray, k2:chararray, v:int);
> B = GROUP A BY (k1, k2);
> C = FILTER B BY group.k1 IN ('a', 'b');
> DUMP C;
> {code}
> This fails with the following error:
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 5, Size: 5
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.pig.parser.LogicalPlanGenerator.in_eval(LogicalPlanGenerator.java:8624)
>     at org.apache.pig.parser.LogicalPlanGenerator.cond(LogicalPlanGenerator.java:8405)
>     at org.apache.pig.parser.LogicalPlanGenerator.filter_clause(LogicalPlanGenerator.java:7564)
>     at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1403)
>     at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:821)
>     at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:539)
>     at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:414)
>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:181)
> {code}
> Here is the relevant code that causes trouble:
> {code:title=QueryParser.g}
> if(tree.getType() == IN) {
>   Tree lhs = tree.getChild(0); // lhs is not a single node!
>   for(int i = 2; i < tree.getChildCount(); i = i + 2) {
>     tree.insertChild(i, deepCopy(lhs));
>   }
> }
> {code}
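
To see why the loop above misbehaves when the left-hand side is more than one node, here is a small Python model of it that works on a flat list of children. The flat-list layout for `group.k1` is an assumption made for illustration; the actual ANTLR tree shape may differ.

```python
import copy


def expand_in(children):
    """Mimic the grammar action's loop on a flat child list.

    children[0] is taken to be the whole left-hand side, and a copy of it
    is inserted before every right-hand operand after the first.
    """
    children = list(children)
    lhs = children[0]
    i = 2
    while i < len(children):
        children.insert(i, copy.deepcopy(lhs))
        i += 2
    return children


def pairs(children):
    """Group the expanded children into (lhs, rhs) comparison pairs."""
    return [tuple(children[i:i + 2]) for i in range(0, len(children), 2)]


# Single-node lhs: every pair is (lhs, operand), as intended.
ok = pairs(expand_in(["k1", "a", "b"]))

# Hypothetical two-node lhs: the second half of the lhs is mistaken for the
# first right-hand operand, so the pairs are misaligned.
bad = pairs(expand_in(["group", "k1", "a", "b"]))
```

With the single-node input the pairs are `("k1", "a")` and `("k1", "b")`. With the two-node input the first pair becomes `("group", "k1")`, which shows how an off-by-one in child positions can arise once `getChild(0)` no longer covers the whole left-hand side.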



[jira] [Updated] (PIG-3373) XMLLoader returns non-matching nodes when a tag name spans through the block boundary

2013-07-16 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3373:
---

Status: Open  (was: Patch Available)

Canceling the patch while waiting for a response.

> XMLLoader returns non-matching nodes when a tag name spans through the block 
> boundary
> -
>
> Key: PIG-3373
> URL: https://issues.apache.org/jira/browse/PIG-3373
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Reporter: Ahmed Eldawy
>Assignee: Ahmed Eldawy
>  Labels: patch
> Attachments: PIG3373.patch
>
>
> When a node's start tag spans two blocks, the tag is returned even if it is 
> not of the requested type.
> Example: For the following input file
> 
>   BLOCK BOUNDARY
> entually id="dfasd">
> XMLLoader with tag type 'event' should return only the first one, but it 
> actually returns both of them.
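
The kind of check that avoids this false positive can be sketched as below. This is not the XMLLoader code; it is a plain-Python illustration that assumes the split falls right after `<ev`, and the function names `naive_match` and `strict_match` are made up for the example.

```python
def naive_match(block_tail, tag):
    """Accept the tag if the text before the boundary is a prefix of '<tag'."""
    needle = "<" + tag
    start = block_tail.rfind("<")
    return start != -1 and needle.startswith(block_tail[start:])


def strict_match(block_tail, next_block_head, tag):
    """Join both sides of the boundary and require a delimiter after the name."""
    needle = "<" + tag
    start = block_tail.rfind("<")
    if start == -1:
        return False
    candidate = (block_tail + next_block_head)[start:]
    if not candidate.startswith(needle):
        return False
    # The character after the tag name must end the name, otherwise
    # '<eventually' would be accepted when looking for '<event'.
    following = candidate[len(needle):len(needle) + 1]
    return following in (" ", "\t", "\n", "\r", ">", "/")
```

Deciding from the partial name alone accepts `<ev`, while reading past the boundary rejects `<eventually id="dfasd">` and still accepts a real `<event ...>` tag.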



[jira] [Updated] (PIG-3380) patch to fix existing failures due to test related issues

2013-07-16 Thread Annie Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Annie Lin updated PIG-3380:
---

Attachment: BUG-6364843.patch

Patch to fix existing e2e failures caused by test-related problems.

> patch to fix existing failures due to test related issues
> -
>
> Key: PIG-3380
> URL: https://issues.apache.org/jira/browse/PIG-3380
> Project: Pig
>  Issue Type: Bug
>  Components: e2e harness
>Affects Versions: 0.11.2
>Reporter: Annie Lin
>Assignee: Rohini Palaniswamy
>Priority: Minor
> Fix For: 0.11.2
>
> Attachments: BUG-6364843.patch
>
>
> attached is the patch created from
> http://svn.apache.org/repos/asf/pig/branches/branch-0.11/
> two conf files are modified:
>   nightly.conf
>   turing_jython.conf



[jira] [Created] (PIG-3380) patch to fix existing failures due to test related issues

2013-07-16 Thread Annie Lin (JIRA)
Annie Lin created PIG-3380:
--

 Summary: patch to fix existing failures due to test related issues
 Key: PIG-3380
 URL: https://issues.apache.org/jira/browse/PIG-3380
 Project: Pig
  Issue Type: Bug
  Components: e2e harness
Affects Versions: 0.11.2
Reporter: Annie Lin
Assignee: Rohini Palaniswamy
Priority: Minor
 Fix For: 0.11.2


attached is the patch created from
http://svn.apache.org/repos/asf/pig/branches/branch-0.11/

two conf files are modified:
  nightly.conf
  turing_jython.conf





[jira] Subscription: PIG patch available

2013-07-16 Thread jira
Issue Subscription
Filter: PIG patch available (18 issues)

Subscriber: pigdaily

Key  Summary
PIG-3374  CASE and IN fail when expression includes dereferencing operator
https://issues.apache.org/jira/browse/PIG-3374
PIG-3373  XMLLoader returns non-matching nodes when a tag name spans through the block boundary
https://issues.apache.org/jira/browse/PIG-3373
PIG-3359  Register Statements and Param Substitution in Macros
https://issues.apache.org/jira/browse/PIG-3359
PIG-3346  New property that controls the number of combined splits
https://issues.apache.org/jira/browse/PIG-3346
PIG-  Fix remaining Windows core unit test failures
https://issues.apache.org/jira/browse/PIG-
PIG-3295  Casting from bytearray failing after Union (even when each field is from a single Loader)
https://issues.apache.org/jira/browse/PIG-3295
PIG-3292  Logical plan invalid state: duplicate uid in schema during self-join to get cross product
https://issues.apache.org/jira/browse/PIG-3292
PIG-3288  Kill jobs if the number of output files is over a configurable limit
https://issues.apache.org/jira/browse/PIG-3288
PIG-3257  Add unique identifier UDF
https://issues.apache.org/jira/browse/PIG-3257
PIG-3247  Piggybank functions to mimic OVER clause in SQL
https://issues.apache.org/jira/browse/PIG-3247
PIG-3210  Pig fails to start when it cannot write log to log files
https://issues.apache.org/jira/browse/PIG-3210
PIG-3199  Expose LogicalPlan via PigServer API
https://issues.apache.org/jira/browse/PIG-3199
PIG-3166  Update eclipse .classpath according to ivy library.properties
https://issues.apache.org/jira/browse/PIG-3166
PIG-3123  Simplify Logical Plans By Removing Unneccessary Identity Projections
https://issues.apache.org/jira/browse/PIG-3123
PIG-3088  Add a builtin udf which removes prefixes
https://issues.apache.org/jira/browse/PIG-3088
PIG-3021  Split results missing records when there is null values in the column comparison
https://issues.apache.org/jira/browse/PIG-3021
PIG-2248  Pig parser does not detect when a macro name masks a UDF name
https://issues.apache.org/jira/browse/PIG-2248
PIG-1914  Support load/store JSON data in Pig
https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail

2013-07-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710085#comment-13710085
 ] 

Xuefu Zhang commented on PIG-3379:
--

It seems related to PIG-1271 and PIG-2530, but both were marked as fixed.

> Alias reuse in nested foreach causes PIG script to fail
> ---
>
> Key: PIG-3379
> URL: https://issues.apache.org/jira/browse/PIG-3379
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11.1
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> The following script fails:
> {code:title=temp.pig}
> Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, 
> eventName:chararray);
> Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
> EventsPerMinute = GROUP Events BY (eventTime / 6);
> EventsPerMinute = FOREACH EventsPerMinute {
>   DistinctDevices = DISTINCT Events.deviceId;
>   nbDevices = SIZE(DistinctDevices);
>   DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
>   nbDevicesWatching = SIZE(DistinctDevices);
>   GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching 
> as nbDevicesWatching;
> }
> EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 
> 10;
> A = FOREACH EventsPerMinute GENERATE timeStamp;
> describe A;
> {code}
> With the error:
> {code}
> 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1025: 
>  Invalid field 
> projection. Projected field [timeStamp] does not exist in schema: 
> deviceId:chararray.
> {code}
> Using a distinct alias name for the second "DistinctDevices" fixes the problem. 
> Removing the last filter statement also fixes the problem.



[jira] [Updated] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail

2013-07-16 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-3379:
-

Description: 
The following script fails:
{code:title=temp.pig}
Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
EventsPerMinute = GROUP Events BY (eventTime / 6);
EventsPerMinute = FOREACH EventsPerMinute {
  DistinctDevices = DISTINCT Events.deviceId;
  nbDevices = SIZE(DistinctDevices);

  DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
  nbDevicesWatching = SIZE(DistinctDevices);

  GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as 
nbDevicesWatching;
}
EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 
10;
A = FOREACH EventsPerMinute GENERATE timeStamp;
describe A;
{code}
With the error:
{code}
2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1025: 
 Invalid field 
projection. Projected field [timeStamp] does not exist in schema: 
deviceId:chararray.
{code}
Using a distinct alias name for the second "DistinctDevices" fixes the problem. 
Removing the last filter statement also fixes the problem.


  was:
The following script fails:
{code:title=temp.pig}
Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
EventsPerMinute = GROUP Events BY (eventTime / 6);
EventsPerMinute = FOREACH EventsPerMinute {
  DistinctDevices = DISTINCT Events.deviceId;
  nbDevices = SIZE(DistinctDevices);

  DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
  nbDevicesWatching = SIZE(DistinctDevices);

  GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as 
nbDevicesWatching;
}
EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 
10;
A = FOREACH EventsPerMinute GENERATE timeStamp;
describe A;
{code}
With the error:

2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1025: 
 Invalid field 
projection. Projected field [timeStamp] does not exist in schema: 
deviceId:chararray.

Using a distinct alias name for the second "DistinctDevices" fixes the problem. 
Removing the last filter statement also fixes the problem.



> Alias reuse in nested foreach causes PIG script to fail
> ---
>
> Key: PIG-3379
> URL: https://issues.apache.org/jira/browse/PIG-3379
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11.1
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> The following script fails:
> {code:title=temp.pig}
> Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, 
> eventName:chararray);
> Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
> EventsPerMinute = GROUP Events BY (eventTime / 6);
> EventsPerMinute = FOREACH EventsPerMinute {
>   DistinctDevices = DISTINCT Events.deviceId;
>   nbDevices = SIZE(DistinctDevices);
>   DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
>   nbDevicesWatching = SIZE(DistinctDevices);
>   GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching 
> as nbDevicesWatching;
> }
> EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 
> 10;
> A = FOREACH EventsPerMinute GENERATE timeStamp;
> describe A;
> {code}
> With the error:
> {code}
> 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1025: 
>  Invalid field 
> projection. Projected field [timeStamp] does not exist in schema: 
> deviceId:chararray.
> {code}
> Using a distinct alias name for the second "DistinctDevices" fixes the problem. 
> Removing the last filter statement also fixes the problem.



[jira] [Updated] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail

2013-07-16 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-3379:
-

Description: 
The following script fails:
{code:title=temp.pig}
Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
EventsPerMinute = GROUP Events BY (eventTime / 6);
EventsPerMinute = FOREACH EventsPerMinute {
  DistinctDevices = DISTINCT Events.deviceId;
  nbDevices = SIZE(DistinctDevices);

  DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
  nbDevicesWatching = SIZE(DistinctDevices);

  GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as 
nbDevicesWatching;
}
EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 
10;
A = FOREACH EventsPerMinute GENERATE timeStamp;
describe A;
{code}
With the error:

2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1025: 
 Invalid field 
projection. Projected field [timeStamp] does not exist in schema: 
deviceId:chararray.

Using a distinct alias name for the second "DistinctDevices" fixes the problem. 
Removing the last filter statement also fixes the problem.


  was:
The following script fails:
{{
Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
EventsPerMinute = GROUP Events BY (eventTime / 6);
EventsPerMinute = FOREACH EventsPerMinute {
  DistinctDevices = DISTINCT Events.deviceId;
  nbDevices = SIZE(DistinctDevices);

  DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
  nbDevicesWatching = SIZE(DistinctDevices);

  GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as 
nbDevicesWatching;
}
EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 
10;
A = FOREACH EventsPerMinute GENERATE timeStamp;
describe A;
}}
With the error:

2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1025: 
 Invalid field 
projection. Projected field [timeStamp] does not exist in schema: 
deviceId:chararray.

Using a distinct alias name for the second "DistinctDevices" fixes the problem. 
Removing the last filter statement also fixes the problem.



> Alias reuse in nested foreach causes PIG script to fail
> ---
>
> Key: PIG-3379
> URL: https://issues.apache.org/jira/browse/PIG-3379
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11.1
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> The following script fails:
> {code:title=temp.pig}
> Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, 
> eventName:chararray);
> Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
> EventsPerMinute = GROUP Events BY (eventTime / 6);
> EventsPerMinute = FOREACH EventsPerMinute {
>   DistinctDevices = DISTINCT Events.deviceId;
>   nbDevices = SIZE(DistinctDevices);
>   DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
>   nbDevicesWatching = SIZE(DistinctDevices);
>   GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching 
> as nbDevicesWatching;
> }
> EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 
> 10;
> A = FOREACH EventsPerMinute GENERATE timeStamp;
> describe A;
> {code}
> With the error:
> 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1025: 
>  Invalid field 
> projection. Projected field [timeStamp] does not exist in schema: 
> deviceId:chararray.
> Using a distinct alias name for the second "DistinctDevices" fixes the problem. 
> Removing the last filter statement also fixes the problem.



[jira] [Created] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail

2013-07-16 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created PIG-3379:


 Summary: Alias reuse in nested foreach causes PIG script to fail
 Key: PIG-3379
 URL: https://issues.apache.org/jira/browse/PIG-3379
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.11.1
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


The following script fails:

Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
EventsPerMinute = GROUP Events BY (eventTime / 6);
EventsPerMinute = FOREACH EventsPerMinute {
  DistinctDevices = DISTINCT Events.deviceId;
  nbDevices = SIZE(DistinctDevices);

  DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
  nbDevicesWatching = SIZE(DistinctDevices);

  GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as 
nbDevicesWatching;
}
EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 
10;
A = FOREACH EventsPerMinute GENERATE timeStamp;
describe A;

With the error:

2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1025: 
 Invalid field 
projection. Projected field [timeStamp] does not exist in schema: 
deviceId:chararray.

Using a distinct alias name for the second "DistinctDevices" fixes the problem. 
Removing the last filter statement also fixes the problem.




[jira] [Updated] (PIG-3379) Alias reuse in nested foreach causes PIG script to fail

2013-07-16 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-3379:
-

Description: 
The following script fails:
{{
Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
EventsPerMinute = GROUP Events BY (eventTime / 6);
EventsPerMinute = FOREACH EventsPerMinute {
  DistinctDevices = DISTINCT Events.deviceId;
  nbDevices = SIZE(DistinctDevices);

  DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
  nbDevicesWatching = SIZE(DistinctDevices);

  GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as 
nbDevicesWatching;
}
EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 
10;
A = FOREACH EventsPerMinute GENERATE timeStamp;
describe A;
}}
With the error:

2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1025: 
 Invalid field 
projection. Projected field [timeStamp] does not exist in schema: 
deviceId:chararray.

Using a distinct alias name for the second "DistinctDevices" fixes the problem. 
Removing the last filter statement also fixes the problem.


  was:
The following script fails:

Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, eventName:chararray);
Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
EventsPerMinute = GROUP Events BY (eventTime / 6);
EventsPerMinute = FOREACH EventsPerMinute {
  DistinctDevices = DISTINCT Events.deviceId;
  nbDevices = SIZE(DistinctDevices);

  DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
  nbDevicesWatching = SIZE(DistinctDevices);

  GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching as 
nbDevicesWatching;
}
EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 
10;
A = FOREACH EventsPerMinute GENERATE timeStamp;
describe A;

With the error:

2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1025: 
 Invalid field 
projection. Projected field [timeStamp] does not exist in schema: 
deviceId:chararray.

Using a distinct alias name for the second "DistinctDevices" fixes the problem. 
Removing the last filter statement also fixes the problem.



> Alias reuse in nested foreach causes PIG script to fail
> ---
>
> Key: PIG-3379
> URL: https://issues.apache.org/jira/browse/PIG-3379
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11.1
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> The following script fails:
> {{
> Events = LOAD 'x' AS (eventTime:long, deviceId:chararray, 
> eventName:chararray);
> Events = FOREACH Events GENERATE eventTime, deviceId, eventName;
> EventsPerMinute = GROUP Events BY (eventTime / 6);
> EventsPerMinute = FOREACH EventsPerMinute {
>   DistinctDevices = DISTINCT Events.deviceId;
>   nbDevices = SIZE(DistinctDevices);
>   DistinctDevices = FILTER Events BY eventName == 'xuaHeartBeat';
>   nbDevicesWatching = SIZE(DistinctDevices);
>   GENERATE $0*6 as timeStamp, nbDevices as nbDevices, nbDevicesWatching 
> as nbDevicesWatching;
> }
> EventsPerMinute = FILTER EventsPerMinute BY timeStamp >= 0  AND timeStamp < 
> 10;
> A = FOREACH EventsPerMinute GENERATE timeStamp;
> describe A;
> }}
> With the error:
> 2013-07-16 11:31:20,450 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1025: 
>  Invalid field 
> projection. Projected field [timeStamp] does not exist in schema: 
> deviceId:chararray.
> Using a distinct alias name for the second "DistinctDevices" fixes the problem. 
> Removing the last filter statement also fixes the problem.



[jira] [Resolved] (PIG-3372) test

2013-07-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-3372.
-

Resolution: Invalid

> test
> 
>
> Key: PIG-3372
> URL: https://issues.apache.org/jira/browse/PIG-3372
> Project: Pig
>  Issue Type: Test
>  Components: impl
>Reporter: Manuel
>Priority: Trivial
>
> test



Re: Is there any Conditional Statement in Apache Pig like If/Else

2013-07-16 Thread yu xiang
http://pig.apache.org/docs/r0.11.1/cont.html
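
The linked page describes embedding Pig Latin in a host language such as Python (Jython), where an ordinary if/else can choose which statements to run. A minimal sketch of that idea follows; the FILTER statements, the `$input`/`$output` parameter names, and the `build_script` helper are made up for illustration and are not taken from the question.

```python
def build_script(flag):
    """Return the Pig Latin text to run for the given flag value."""
    if flag == 0:
        body = [
            "A = LOAD '$input' USING PigStorage() AS (f1:int);",
            "B = FILTER A BY f1 > 0;",   # hypothetical branch-specific statement
        ]
    else:
        body = [
            "A = LOAD '$input' USING PigStorage() AS (f1:int);",
            "B = FILTER A BY f1 <= 0;",  # hypothetical alternative statement
        ]
    body.append("STORE B INTO '$output';")
    return "\n".join(body)


script = build_script(0)

# When the file is run by Pig itself (Jython runtime, e.g. `pig myscript.py`),
# the chosen text could then be executed with the embedding API:
#   from org.apache.pig.scripting import Pig
#   result = Pig.compile(script).bind({'input': 'file', 'output': 'out'}).runSingle()
#   if not result.isSuccessful():
#       raise RuntimeError('Pig job failed')
```

The branch is decided in Python before any Pig Latin is compiled, which is why whole groups of statements can differ between the two cases. The bincond operator `(a == b ? c1 : c2)` only selects between expression values inside a statement.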


On Tue, Jul 16, 2013 at 3:34 PM, Bhavesh K Shah <
bhavesh.s...@bitwiseglobal.com> wrote:

> Hello All,
>
> I am a newbie to Apache Pig, and I am exploring it for one of my use cases.
>
> I am writing a Pig script and want to execute a set of statements only if
> a condition is satisfied.
>
> I have set a variable to some value. I want to implement something like
> the following:
>
> if flag==0 then
>   A = LOAD 'file' using PigStorage() as (f1:int, );
>   B = ...;
>   C = ;
> else
>   again some Pig Latin statements
>
> Can I do this in a Pig script? If yes, how?
>
> I also came across the conditional operator in Pig, (a == b ? c1 : c2).
> But how can I put a block of Pig statements inside that operator?
>
> Thanks.
>
> Bhavesh Shah
>
> **Disclaimer**
> This e-mail message and any attachments may contain confidential
> information and is for the sole use of the intended recipient(s) only. Any
> views or opinions presented or implied are solely those of the author and
> do not necessarily represent the views of BitWise. If you are not the
> intended recipient(s), you are hereby notified that disclosure, printing,
> copying, forwarding, distribution, or the taking of any action whatsoever
> in reliance on the contents of this electronic information is strictly
> prohibited. If you have received this e-mail message in error, please
> immediately notify the sender and delete the electronic message and any
> attachments.BitWise does not accept liability for any virus introduced by
> this e-mail or any attachments.
> 
>


[jira] [Created] (PIG-3378) Pig streaming with multiquery is buggy in local mode.

2013-07-16 Thread Thomas Porez (JIRA)
Thomas Porez created PIG-3378:
-

 Summary: Pig streaming with multiquery is buggy in local mode.
 Key: PIG-3378
 URL: https://issues.apache.org/jira/browse/PIG-3378
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Thomas Porez
Priority: Minor


Today I noticed strange behavior in Pig in local mode (streaming + 
multiquery).
Below is a minimal script that reproduces the problem.

Suppose an input file with multiple lines, for example:
# myInput
1
2
3
1
2
3

The Pig script is:
# bug.pig
MYINPUT = LOAD 'myinput';

A = GROUP MYINPUT BY $0;
B = FOREACH A GENERATE FLATTEN(MYINPUT);
C = STREAM B THROUGH `cat`;

D = GROUP MYINPUT BY $0;
E = FOREACH D GENERATE FLATTEN(MYINPUT);
F = STREAM E THROUGH `cat`;

STORE C into 'output1';
STORE F into 'output2';

I run the script using the following command:
pig -x local bug.pig

output1 and output2 should each contain a complete copy of the input file, but
they do not. Each contains only one line (the first line of the file):
cat output1/part*
cat output2/part*

For information:
* The same Pig script works properly in Hadoop mode.
* If I comment out one of the two STORE operations, it works as expected (which 
is why I suspect multiquery execution).
* If I put an EXEC statement between the two STORE operations, it also works.
* The streamed command does read every line of stdin. For example, replacing 
the executable `cat` with `wc -l` yields the number of rows in the input file.
So the problem seems to be in the parsing of stdout.



Is there any Conditional Statement in Apache Pig like If/Else

2013-07-16 Thread Bhavesh K Shah
Hello All,

I am a newbie to Apache Pig, and I am exploring it for one of my use cases.

I am writing a Pig script and want to execute a set of statements only if a 
condition is satisfied.

I have set a variable to some value. I want to implement something like the 
following:

if flag==0 then
  A = LOAD 'file' using PigStorage() as (f1:int, );
  B = ...;
  C = ;
else
  again some Pig Latin statements

Can I do this in a Pig script? If yes, how?

I also came across the conditional operator in Pig, (a == b ? c1 : c2). But 
how can I put a block of Pig statements inside that operator?

Thanks.

Bhavesh Shah
