[jira] [Commented] (PIG-2751) Allow macros in FOREACH

2012-07-11 Thread Joshua Hartman (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412502#comment-13412502
 ] 

Joshua Hartman commented on PIG-2751:
-

I have this exact use case: filtering out nulls in a FOREACH. It's a pain to 
write a ternary operator every time; it would be much nicer to have a 
getOrElse-style macro that can run inside the GENERATE.
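
To make the getOrElse idea concrete, here is a minimal sketch of the semantics in plain Java. The class and method names are hypothetical and not part of Pig; in practice this logic would have to sit inside a UDF until macros are allowed in FOREACH.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Pig. In a real script this logic would
// live in the exec() method of a UDF.
final class GetOrElse {
    private GetOrElse() {}

    // Returns value unless it is null, in which case fallback is returned.
    static <T> T getOrElse(T value, T fallback) {
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        List<Integer> val1 = Arrays.asList(3, null, 0);
        // Replace every null with -1, leaving other values untouched.
        List<Integer> cleaned = val1.stream()
                .map(v -> getOrElse(v, -1))
                .collect(Collectors.toList());
        System.out.println(cleaned); // [3, -1, 0]
    }
}
```

This replaces the repeated `(x is null ? default : x)` ternary with a single named operation.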

> Allow macros in FOREACH
> ---
>
> Key: PIG-2751
> URL: https://issues.apache.org/jira/browse/PIG-2751
> Project: Pig
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 0.10.0
> Environment: Kubuntu 12.04 64Bit
>Reporter: Johannes Schwenk
>
> I would like to be able to use macros within the GENERATE of a FOREACH.
> Example:
> {code}
> define test_macro(param1, param2) returns ret_val {
>   $ret_val = ($param1 == 0 ? $param2 : $param1);
> };
> a = LOAD 'data' AS (id, val1, val2);
> b = FOREACH a GENERATE id, test_macro(val1, val2);
> DUMP b;
> {code}
> This would be most useful for having only a single point to edit (the macro) 
> if a definition for a special computation changes. Let's say you have raw log 
> data and several scripts loading it. All scripts need to filter out specific 
> unused columns. Most (but not all) of the scripts are dealing with a field 
> that needs to be handled in a special way. So I cannot just use two different 
> LOAD functions (one with the special computation and one without) because 
> that would make a second FOREACH ... GENERATE necessary to filter out the 
> unused columns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (PIG-2791) Can not enter grunt shell when using viewFS filesystem with CSMT and Federation

2012-07-11 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2791:


Attachment: PIG-2791-2.patch

> Can not enter grunt shell when using viewFS filesystem with CSMT and 
> Federation
> ---
>
> Key: PIG-2791
> URL: https://issues.apache.org/jira/browse/PIG-2791
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.10.0
> Environment: Pig QE
>Reporter: patrick white
>Priority: Blocker
> Attachments: PIG-2791-0.patch, PIG-2791-1.patch, PIG-2791-2.patch
>
>
> The Yahoo Pig QE team ran into a blocking issue when trying to test 
> Client-Side Mount Tables (CSMT) on a Federated cluster with two NNs. This 
> blocks Pig testing on Federation. 
> Federation relies strongly on the use of CSMT with viewFS. QE found that in 
> this configuration it is not possible to enter the grunt shell because Pig makes 
> a call to getDefaultReplication() on the fs, which is ambiguous over viewFS 
> and causes core to throw a 
> org.apache.hadoop.fs.viewfs.NotInMountpointException: "getDefaultReplication 
> on empty path is invalid".
> This in turn causes Pig to exit with an internal error as follows:
> 2012-07-06 22:20:25,657 [main] INFO  org.apache.pig.Main - Apache Pig version 
> 0.10.1.0.1206081058 (r1348169) compiled Jun 08 2012, 17:58:42
> 2012-07-06 22:20:26,074 [main] WARN  org.apache.hadoop.conf.Configuration - 
> mapred.used.genericoptionsparser is deprecated. Instead, use 
> mapreduce.client.genericoptionsparser.used
> 2012-07-06 22:20:26,076 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: viewfs:///
> 2012-07-06 22:20:26,080 [main] WARN  org.apache.hadoop.conf.Configuration - 
> fs.default.name is deprecated. Instead, use fs.defaultFS
> 2012-07-06 22:20:26,522 [main] ERROR org.apache.pig.Main - ERROR 2999: 
> Unexpected internal error. getDefaultReplication on empty path is invalid
> 2012-07-06 22:20:26,522 [main] WARN  org.apache.pig.Main - There is no log 
> file to write to.
> 2012-07-06 22:20:26,522 [main] ERROR org.apache.pig.Main - 
> org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication 
> on empty path is invalid
>     at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:482)
>     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:77)
>     at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:205)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:118)
>     at org.apache.pig.impl.PigContext.connect(PigContext.java:208)
>     at org.apache.pig.PigServer.<init>(PigServer.java:246)
>     at org.apache.pig.PigServer.<init>(PigServer.java:231)
>     at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:47)
>     at org.apache.pig.Main.run(Main.java:487)
>     at org.apache.pig.Main.main(Main.java:111)





[jira] [Updated] (PIG-2791) Can not enter grunt shell when using viewFS filesystem with CSMT and Federation

2012-07-11 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2791:


Attachment: PIG-2791-1.patch

Added getDefaultBlockSize(Path) to HadoopShims.

I had to use 2.0.0-alpha 
(http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/) because the 
0.23.1 artifact from Maven does not have the fs.getDefaultBlockSize(Path) API. 



> Can not enter grunt shell when using viewFS filesystem with CSMT and 
> Federation
> ---
>
> Key: PIG-2791
> URL: https://issues.apache.org/jira/browse/PIG-2791
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.10.0
> Environment: Pig QE
>Reporter: patrick white
>Priority: Blocker
> Attachments: PIG-2791-0.patch, PIG-2791-1.patch
>





[jira] [Commented] (PIG-2632) Create a SchemaTuple which generates efficient Tuples via code gen

2012-07-11 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412475#comment-13412475
 ] 

Rohini Palaniswamy commented on PIG-2632:
-

TestSchemaTuple.java is broken for Hadoop 23/2.0.

{code}
reader.initialize(is, new TaskAttemptContext(conf, taskId)); 
{code}

should be
{code}
reader.initialize(is, HadoopShims.createTaskAttemptContext(conf, taskId)); 
{code}

> Create a SchemaTuple which generates efficient Tuples via code gen
> --
>
> Key: PIG-2632
> URL: https://issues.apache.org/jira/browse/PIG-2632
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2632-0.patch, PIG-2632-1.patch, PIG-2632-10.patch, 
> PIG-2632-10.patch, PIG-2632-3.patch, PIG-2632-4.patch, PIG-2632-5.patch, 
> PIG-2632-6.patch, PIG-2632-7.patch, PIG-2632-8.patch, PIG-2632-9.patch, 
> PIG-2632-9.patch, schematuple benchmarking.pdf, schematuple benchmarking.pptx
>
>
> This work builds on Dmitriy's PrimitiveTuple work. The idea is that, knowing 
> the Schema on the frontend, we can code generate Tuples which can be used for 
> fun and profit. In rudimentary tests, the memory efficiency is 2-4x better, 
> and it's ~15% smaller serialized (heavily heavily depends on the data, 
> though). Need to do get/set tests, but assuming that it's on par (or even 
> faster) than Tuple, the memory gain is huge.
> Need to clean up the code and add tests.
> Right now, it generates a SchemaTuple for every inputSchema and outputSchema 
> given to UDF's. The next step is to make a SchemaBag, where I think the 
> serialization savings will be really huge.





[jira] [Commented] (PIG-2632) Create a SchemaTuple which generates efficient Tuples via code gen

2012-07-11 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412470#comment-13412470
 ] 

Jonathan Coveney commented on PIG-2632:
---

Daniel,

1) I agree that there are a lot of great places to use this. Next on my plate 
is using it with LoadFuncs and Foreaches, and then ideally Bag support (which I 
do not think would be difficult at all, just need some time). I hadn't thought 
about lazy tuples -- need to take a look at his code.
2) I submitted a patch fixing the MergeJoin errors, and have time to look at 
the rest. Do you know if any of the others were fixed by that fix? I hate the 
flakiness of the full test suite, hard to know what is and isn't a false 
positive!

> Create a SchemaTuple which generates efficient Tuples via code gen
> --
>
> Key: PIG-2632
> URL: https://issues.apache.org/jira/browse/PIG-2632
> Project: Pig
>  Issue Type: Improvement
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.11
>
> Attachments: PIG-2632-0.patch, PIG-2632-1.patch, PIG-2632-10.patch, 
> PIG-2632-10.patch, PIG-2632-3.patch, PIG-2632-4.patch, PIG-2632-5.patch, 
> PIG-2632-6.patch, PIG-2632-7.patch, PIG-2632-8.patch, PIG-2632-9.patch, 
> PIG-2632-9.patch, schematuple benchmarking.pdf, schematuple benchmarking.pptx
>





Hash aggregation experience

2012-07-11 Thread Jie Li
Hi all,

Has anyone tried the hash aggregation feature in Pig 0.10 and seen any
performance improvement? I have recently been benchmarking HashAgg against the
combiner to see whether we should use HashAgg more aggressively, given that it
has lower overhead than the combiner and is more flexible: it can
auto-disable itself, while the combiner can't.
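
For readers unfamiliar with the auto-disable behaviour, the following is a rough sketch of the idea, not Pig's actual operator. The class name, the `checkAfter` and `minReduction` parameters, and the pass-through policy are all assumptions made for illustration: partial sums are kept in a hash map, and if the first batch of records does not shrink enough, aggregation is switched off and records are passed through unchanged.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: names and policy are hypothetical and do not mirror
// Pig's internal hash aggregation code.
final class HashAggSketch {
    private final Map<String, Long> partial = new HashMap<>();
    private final List<Map.Entry<String, Long>> emitted = new ArrayList<>();
    private final int checkAfter;       // records to observe before deciding
    private final double minReduction;  // required records per distinct key
    private long seen = 0;
    private boolean enabled = true;

    HashAggSketch(int checkAfter, double minReduction) {
        this.checkAfter = checkAfter;
        this.minReduction = minReduction;
    }

    // Adds one (key, count) record, aggregating it while still enabled.
    void add(String key, long count) {
        if (!enabled) {
            // Aggregation was switched off: pass the record through as-is.
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(key, count));
            return;
        }
        partial.merge(key, count, Long::sum);
        seen++;
        // After the sample window, disable if the data is not shrinking enough.
        if (seen == checkAfter && (double) seen / partial.size() < minReduction) {
            enabled = false;
            flush();
        }
    }

    // Moves all partial sums to the output and empties the hash map.
    private void flush() {
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        partial.clear();
    }

    boolean isEnabled() {
        return enabled;
    }

    // Flushes whatever is still buffered and returns everything emitted so far.
    List<Map.Entry<String, Long>> finish() {
        flush();
        return emitted;
    }
}
```

A combiner has no equivalent of the `enabled` switch: once configured, it runs for every map output regardless of how well the keys collapse.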

Some of my benchmark results can be found in
https://cwiki.apache.org/confluence/display/PIG/Pig+Performance+Optimization#PigPerformanceOptimization-HashAggvs.Combiner.
Any comment is appreciated!

Jie


[jira] [Commented] (PIG-2812) Spill InternalCachedBag into only 1 file

2012-07-11 Thread Haitao Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412459#comment-13412459
 ] 

Haitao Yao commented on PIG-2812:
-

Oh God, this is really a big change. It affects the Iterator of the data bags.
If we use one spill file, then every time we call next() we have to skip all 
the bytes already read, which would be a big performance penalty. I think we 
could consider changing the iterator interface by adding a free() method, so we 
can hold an InputStream and make sure free() is called when the iterator 
finishes its job.
Without the free() method, we cannot guard against somebody calling break inside 
an iterator loop, which would leak the InputStream.
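
A minimal sketch of what such an iterator could look like, assuming Java 7's AutoCloseable plays the role of the proposed free(). SpillIterator and SpillDemo are hypothetical names, and the records are plain strings rather than Pig tuples:

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch, not a Pig class: an iterator that keeps one open
// stream over a single spill file and releases it through close().
final class SpillIterator implements Iterator<String>, AutoCloseable {
    private final DataInputStream in;
    private String next;
    private boolean closed;

    SpillIterator(File spillFile) throws IOException {
        in = new DataInputStream(new BufferedInputStream(new FileInputStream(spillFile)));
        advance();
    }

    // Reads the next record, or closes the stream when the file is exhausted.
    private void advance() throws IOException {
        try {
            next = in.readUTF();
        } catch (EOFException eof) {
            close();
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        try {
            advance();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current;
    }

    // Plays the role of the proposed free(): safe to call more than once.
    @Override
    public void close() throws IOException {
        next = null;
        if (!closed) {
            closed = true;
            in.close();
        }
    }

    boolean isClosed() {
        return closed;
    }
}

final class SpillDemo {
    // Writes the records to a temporary spill file and returns it.
    static File writeSpill(String... records) throws IOException {
        File f = File.createTempFile("spill", ".bin");
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            for (String r : records) {
                out.writeUTF(r);
            }
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        File f = writeSpill("a", "b", "c");
        try (SpillIterator it = new SpillIterator(f)) {
            while (it.hasNext()) {
                if (it.next().equals("b")) {
                    break; // early exit: try-with-resources still closes the stream
                }
            }
        }
        f.delete(); // explicit delete instead of File.deleteOnExit()
    }
}
```

With try-with-resources the stream is released even when the caller breaks out of the loop, which is the leak described above.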


> Spill InternalCachedBag into only 1 file
> 
>
> Key: PIG-2812
> URL: https://issues.apache.org/jira/browse/PIG-2812
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Reporter: Haitao Yao
> Fix For: 0.11
>
> Attachments: aa.jpg
>
>
> I encountered a reducer OOM caused by java.io.DeleteOnExitHook. I 
> found out that the InternalCachedBag creates separate tmp files, and the tmp 
> files are deleted on exit. So the file delete hook caused the OOM. 
> Why not just hold the tmp file handle and spill to only one tmp file?
> Too many tmp files may block the tasktracker start process if the tmp files 
> are not cleaned up in time and the tasktracker restarts at that moment.





[jira] [Commented] (PIG-1314) Add DateTime Support to Pig

2012-07-11 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412452#comment-13412452
 ] 

Thejas M Nair commented on PIG-1314:


Zhijie,
I have added comments on your latest patch at 
https://reviews.apache.org/r/5414/.
Yes, let's focus on test cases now, so that we can get an initial version 
committed. 

> Add DateTime Support to Pig
> ---
>
> Key: PIG-1314
> URL: https://issues.apache.org/jira/browse/PIG-1314
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.7.0
>Reporter: Russell Jurney
>Assignee: Zhijie Shen
>  Labels: gsoc2012
> Attachments: PIG-1314-1.patch, PIG-1314-2.patch, PIG-1314-3.patch, 
> joda_vs_builtin.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Hadoop/Pig are primarily used to parse log data, and most logs have a 
> timestamp component.  Therefore Pig should support dates as a primitive.
> Can someone familiar with adding types to pig comment on how hard this is?  
> We're looking at doing this, rather than use UDFs.  Is this a patch that 
> would be accepted?
> This is a candidate project for Google summer of code 2012. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2012





Re: Review Request: Review for PIG-1314 - add datetime type in pig

2012-07-11 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5414/#review9044
---



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/DateTimeWritable.java


Please add @Override annotation for functions that are being overridden 



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java


The Long.valueOf is unnecessary. If the value can't be stored in an 
integer, it should result in a warning. See conversion from chararray to int. 
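
As a rough illustration of the behaviour suggested here (a sketch only, not the POCast code under review), a narrowing helper could return null and log a warning when the value is out of range. That null-plus-warning convention is my assumption about how the chararray-to-int conversion behaves; NarrowingCast is a hypothetical name.

```java
import java.util.logging.Logger;

// Hypothetical helper, not the POCast code under review.
final class NarrowingCast {
    private static final Logger LOG = Logger.getLogger(NarrowingCast.class.getName());

    private NarrowingCast() {}

    // Returns the value as an Integer, or null (after logging a warning)
    // when it does not fit in 32 bits.
    static Integer longToInt(long value) {
        if (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            LOG.warning("Value " + value + " does not fit in an int; returning null");
            return null;
        }
        return (int) value;
    }
}
```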



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java


Using {} for the if statement would make it more readable.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java


I think it makes sense to support conversion to all numeric types. That 
will be consistent with other conversions.




http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java


I think it is OK to allow conversions from double and float as well, since we 
allow the conversions to long or int.




http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/AddDuration.java


outputSchema() implementation is unnecessary because the output type is 
simple and does not depend on input schema. 
Most or all datetime udfs won't need this.




http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/CurrentTime.java


We need to propagate the timezone from the client to backend - where the 
udfs get executed. 



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/DaysBetween.java


indentation cleanup can be done here



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/DaysBetween.java


this comment seems unrelated to this udf



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/DiffDate.java


javadoc is not correct for this udf



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/DiffDate.java


I am wondering if we should use Duration.toStandardDays(). But I think 
this approach might be more performant, as it does not create an additional 
Duration object.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/GetDay.java


The example will be more readable with just one date column, i.e. ISOin = 
LOAD 'test.tsv' USING PigStorage('\t') AS (dt:datetime);





http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/SubtractDuration.java


It says AddDuration here



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/ToDate.java


need to document the different input arguments.



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/ToDate.java


We should avoid using exceptions as part of the normal code path. It is better 
to separate this into two UDF classes, and let Pig pick the right one based on 
input types (the purpose of getArgToFuncMapping).



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/ToDate.java


Does this use of argToFuncMapping work? 
I think you would need to add a different schema for each possible 
combination of input tuple types. 



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/ToDate.java


we need to use the default timezone if it is not specified. (this has to be 
shipped to backend).



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/ToUnixTime.java


unix time is seconds, not milliseconds
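
A small sketch of the distinction, using java.time instead of the Joda-Time types the patch works with (that substitution and the class name are my assumptions):

```java
import java.time.Instant;

// Hypothetical helper showing seconds-based Unix time.
final class UnixTimeSketch {
    private UnixTimeSketch() {}

    // Unix time is whole seconds since 1970-01-01T00:00:00Z.
    static long toUnixTime(Instant t) {
        return t.getEpochSecond();
    }

    // Converts epoch milliseconds to Unix seconds, rounding toward
    // negative infinity so pre-1970 instants are handled consistently.
    static long toUnixTimeFromMillis(long millis) {
        return Math.floorDiv(millis, 1000L);
    }

    public static void main(String[] args) {
        // 1341964800123 ms is 2012-07-11T00:00:00.123Z
        System.out.println(toUnixTimeFromMillis(1341964800123L)); // 1341964800
    }
}
```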



http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/Utf8StorageConverter.java


we should refactor this timezone extraction co

[jira] [Commented] (PIG-2812) Spill InternalCachedBag into only 1 file

2012-07-11 Thread Haitao Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412421#comment-13412421
 ] 

Haitao Yao commented on PIG-2812:
-

Shall we change the parent class, org.apache.pig.data.DefaultAbstractBag, or 
just modify InternalCachedBag? 
I don't know why DefaultAbstractBag writes a tmp file for every tuple.
In our own code base, I modified only InternalCachedBag, because I don't 
know how the other subclasses of DefaultAbstractBag would behave if I changed 
the whole spill logic.
I would like to contribute this.
Thanks.


> Spill InternalCachedBag into only 1 file
> 
>
> Key: PIG-2812
> URL: https://issues.apache.org/jira/browse/PIG-2812
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Reporter: Haitao Yao
> Fix For: 0.11
>
> Attachments: aa.jpg
>





[jira] [Updated] (PIG-2780) MapReduceLauncher should break early when one of the jobs throws an exception

2012-07-11 Thread Jie Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Li updated PIG-2780:


Attachment: PIG-2780.1.patch

Updated the patch to log an early warning for the failure when stop_on_failure is 
not enabled.

> MapReduceLauncher should break early when one of the jobs throws an exception
> -
>
> Key: PIG-2780
> URL: https://issues.apache.org/jira/browse/PIG-2780
> Project: Pig
>  Issue Type: Bug
>Reporter: Feng Peng
>Assignee: Jie Li
> Fix For: 0.11
>
> Attachments: PIG-2780.0.patch, PIG-2780.1.patch
>
>
> Right now MapReduceLauncher caches the job exception in jobControlException 
> and only processes it when all the jobs are done:
> {noformat}
>   jcThread.setUncaughtExceptionHandler(jctExceptionHandler);
>   ...
>   jcThread.start();
>   // Now wait, till we are finished.
>   while(!jc.allFinished()){
>   ...
>   }
>   //check for the jobControlException first
>   //if the job controller fails before launching the jobs then there are
>   //no jobs to check for failure
>   if (jobControlException != null) {
> ...
>   }
> {noformat}
> There are two problems with this approach:
> 1. There is only one jobControlException variable. If two jobs are throwing 
> exceptions, the first one will be lost.
> 2. If there are multiple jobs, the exceptions will not be reported until the 
> other jobs are finished, which is a waste of system resources.
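
One possible way to address both problems, sketched in plain Java rather than against the real MapReduceLauncher and JobControl classes (LauncherSketch, runAll and the polling interval are hypothetical): keep every exception in a queue instead of a single field, and check the queue inside the wait loop so the launcher can stop early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch; plain threads stand in for the job control thread.
final class LauncherSketch {
    // Every failure is kept, instead of one field that later failures overwrite.
    private final Queue<Throwable> failures = new ConcurrentLinkedQueue<>();

    // Runs all jobs; with stopOnFailure set, returns as soon as one has failed.
    List<Throwable> runAll(List<Runnable> jobs, boolean stopOnFailure)
            throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (Runnable job : jobs) {
            Thread t = new Thread(job);
            t.setDaemon(true);
            t.setUncaughtExceptionHandler((thread, error) -> failures.add(error));
            threads.add(t);
            t.start();
        }
        while (anyAlive(threads)) {
            // Check for failures while waiting, not only after all jobs finish.
            if (stopOnFailure && !failures.isEmpty()) {
                for (Thread t : threads) {
                    t.interrupt();
                }
                break;
            }
            Thread.sleep(10);
        }
        return new ArrayList<>(failures);
    }

    private static boolean anyAlive(List<Thread> threads) {
        for (Thread t : threads) {
            if (t.isAlive()) {
                return true;
            }
        }
        return false;
    }
}
```

The uncaught-exception handler runs on the failing thread before it terminates, so by the time no thread is alive every failure has already been recorded.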





[jira] [Commented] (PIG-2791) Can not enter grunt shell when using viewFS filesystem with CSMT and Federation

2012-07-11 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411993#comment-13411993
 ] 

Daryn Sharp commented on PIG-2791:
--

The problem is that there is no single replication factor or block size with 
viewfs; it depends on each mounted fs. {{getDefaultBlockSize()}} and 
{{getDefaultReplication()}} were deprecated in favor of methods that accept a 
{{Path}}. This allows a fs like viewfs to resolve the mount point for a path 
and return the correct values.

The easy solution is to not call these methods at all and let the 
shorter-signature methods (create, etc.) implicitly pick up the replication and 
block size. I understand that's not an option for most of Pig's use cases.

HBase encountered the same problem and fixed it with a little reflection magic 
in HBASE-6067.
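
The reflection approach can be sketched roughly as follows. This is not the HBASE-6067 patch: OldFs and NewFs are stand-ins for an older and a newer FileSystem, the path is a plain String rather than a Hadoop Path, and the block sizes are made up. The point is only the pattern of preferring the path-aware overload when it exists and falling back otherwise.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Stand-in for a file system that only has the no-argument method.
class OldFs {
    public long getDefaultBlockSize() {
        return 64L << 20;
    }
}

// Stand-in for a file system that also offers the path-aware overload.
class NewFs extends OldFs {
    public long getDefaultBlockSize(String path) {
        return path.startsWith("/big") ? 256L << 20 : 128L << 20;
    }
}

// Hypothetical shim: uses the path-aware method when the runtime class has it.
final class BlockSizeShim {
    private BlockSizeShim() {}

    static long defaultBlockSize(OldFs fs, String path) {
        try {
            Method m = fs.getClass().getMethod("getDefaultBlockSize", String.class);
            return (Long) m.invoke(fs, path);
        } catch (NoSuchMethodException e) {
            // Older API: only the path-less method is available.
            return fs.getDefaultBlockSize();
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the lookup happens at run time, the same compiled code can run against either API version, which is what a shim layer needs.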

> Can not enter grunt shell when using viewFS filesystem with CSMT and 
> Federation
> ---
>
> Key: PIG-2791
> URL: https://issues.apache.org/jira/browse/PIG-2791
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.10.0
> Environment: Pig QE
>Reporter: patrick white
>Priority: Blocker
> Attachments: PIG-2791-0.patch
>





[jira] [Commented] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2012-07-11 Thread Dan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411968#comment-13411968
 ] 

Dan Li commented on PIG-2769:
-

Thanks for the clarification, and looking forward to the fix.

Dan

> a simple logic causes very long compiling time on pig 0.10.0
> 
>
> Key: PIG-2769
> URL: https://issues.apache.org/jira/browse/PIG-2769
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.10.0
> Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
>Reporter: Dan Li
> Fix For: 0.11
>
> Attachments: case1.tar
>
>
> We found that the following simple logic causes a very long compile time on 
> Pig 0.10.0, while with Pig 0.8.1 everything is fine.
> A = load 'A.txt' using PigStorage()  AS (m: int);
> B = FOREACH A {
>     days_str = (chararray)
>         (m == 1 ? 31: 
>         (m == 2 ? 28: 
>         (m == 3 ? 31: 
>         (m == 4 ? 30: 
>         (m == 5 ? 31: 
>         (m == 6 ? 30: 
>         (m == 7 ? 31: 
>         (m == 8 ? 31: 
>         (m == 9 ? 30: 
>         (m == 10 ? 31: 
>         (m == 11 ? 30:31)))))))))));
>     GENERATE
>        days_str as days_str;
> }   
> store B into 'B';
> and here's a simple input file example: A.txt
> 1
> 2
> 3
> The pig version we used in the test
> Apache Pig version 0.10.0-SNAPSHOT (rexported)





[jira] [Commented] (PIG-2791) Can not enter grunt shell when using viewFS filesystem with CSMT and Federation

2012-07-11 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411851#comment-13411851
 ] 

Rohini Palaniswamy commented on PIG-2791:
-

Daniel,
  This is the actual cause. 

Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultBlockSize on empty path is invalid
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultBlockSize(ViewFileSystem.java:477)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:276)


> Can not enter grunt shell when using viewFS filesystem with CSMT and 
> Federation
> ---
>
> Key: PIG-2791
> URL: https://issues.apache.org/jira/browse/PIG-2791
> Project: Pig
>  Issue Type: Bug
>  Components: grunt
>Affects Versions: 0.10.0
> Environment: Pig QE
>Reporter: patrick white
>Priority: Blocker
> Attachments: PIG-2791-0.patch
>
>
> The Yahoo Pig QE team ran into a blocking issue when trying to test 
> Client-Side Mount Tables (CSMT) on a Federated cluster with two NameNodes 
> (NNs). This blocks Pig testing on Federation. 
> Federation relies strongly on the use of CSMT with viewFS. QE found that in 
> this configuration it is not possible to enter the grunt shell, because Pig 
> makes a call to getDefaultReplication() on the fs, which is ambiguous over 
> viewFS and causes core to throw a 
> org.apache.hadoop.fs.viewfs.NotInMountpointException: "getDefaultReplication 
> on empty path is invalid".
> This in turn causes Pig to exit with an internal error as follows:
> 2012-07-06 22:20:25,657 [main] INFO  org.apache.pig.Main - Apache Pig version 
> 0.10.1.0.1206081058 (r1348169) compiled Jun 08 2012, 17:58:42
> 2012-07-06 22:20:26,074 [main] WARN  org.apache.hadoop.conf.Configuration - 
> mapred.used.genericoptionsparser is deprecated. Instead, use 
> mapreduce.client.genericoptionsparser.used
> 2012-07-06 22:20:26,076 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: viewfs:///
> 2012-07-06 22:20:26,080 [main] WARN  org.apache.hadoop.conf.Configuration - 
> fs.default.name is deprecated. Instead, use fs.defaultFS
> 2012-07-06 22:20:26,522 [main] ERROR org.apache.pig.Main - ERROR 2999: 
> Unexpected internal error. getDefaultReplication on empty path is invalid
> 2012-07-06 22:20:26,522 [main] WARN  org.apache.pig.Main - There is no log 
> file to write to.
> 2012-07-06 22:20:26,522 [main] ERROR org.apache.pig.Main - 
> org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication 
> on empty path is invalid
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:482)
> at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:77)
> at 
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:205)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:118)
> at org.apache.pig.impl.PigContext.connect(PigContext.java:208)
> at org.apache.pig.PigServer.<init>(PigServer.java:246)
> at org.apache.pig.PigServer.<init>(PigServer.java:231)
> at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:47)
> at org.apache.pig.Main.run(Main.java:487)
> at org.apache.pig.Main.main(Main.java:111)
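
The ambiguity described above can be illustrated with a toy model of a client-side mount table. This is only a sketch under stated assumptions: MountTableSketch and its methods are hypothetical names, the block sizes are made-up values, and nothing here is Hadoop's ViewFileSystem or the code in the attached patch. It shows why a call without a path has no single answer when several namespaces are mounted, while a call that carries a path can be resolved to one mount point.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a client-side mount table; it is NOT Hadoop's ViewFileSystem.
final class MountTableSketch {
    // Mount point prefix -> default block size of the namespace behind it
    // (hypothetical values, for illustration only).
    private final Map<String, Long> blockSizeByMount = new LinkedHashMap<>();

    void mount(String prefix, long defaultBlockSize) {
        blockSizeByMount.put(prefix, defaultBlockSize);
    }

    // Without a path there is no mount point to resolve, so the call is ambiguous.
    long getDefaultBlockSize() {
        throw new IllegalStateException("getDefaultBlockSize on empty path is invalid");
    }

    // With a path, the answer comes from the namespace that the path resolves to.
    long getDefaultBlockSize(String path) {
        for (Map.Entry<String, Long> e : blockSizeByMount.entrySet()) {
            if (path.equals(e.getKey()) || path.startsWith(e.getKey() + "/")) {
                return e.getValue();
            }
        }
        throw new IllegalStateException("path is not under any mount point: " + path);
    }

    public static void main(String[] args) {
        MountTableSketch fs = new MountTableSketch();
        fs.mount("/user", 128L * 1024 * 1024);
        fs.mount("/data", 256L * 1024 * 1024);
        System.out.println(fs.getDefaultBlockSize("/data/logs/part-0"));
        try {
            fs.getDefaultBlockSize();
        } catch (IllegalStateException e) {
            System.out.println("no-argument call failed: " + e.getMessage());
        }
    }
}
```

In this model the caller has to pass the path it is about to read or write. Whether the real fix takes that route is not stated in this thread.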





[jira] [Updated] (PIG-2765) Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator

2012-07-11 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated PIG-2765:


Attachment: PIG-2765.2.patch

> Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator
> ---
>
> Key: PIG-2765
> URL: https://issues.apache.org/jira/browse/PIG-2765
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Prasanth J
>Assignee: Prasanth J
> Attachments: PIG-2765.1.patch, PIG-2765.2.git.patch, PIG-2765.2.patch
>
>
> Implement the RollupDimensions UDF, which performs aggregation from the most 
> detailed level of dimensions to the most general level (grand total) in 
> hierarchical order. Provide support for the ROLLUP clause in the CUBE operator. 
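
As a rough illustration of the hierarchical order described above, the sketch below lists the dimension combinations a rollup walks through, from the most detailed level to the grand total. It is a minimal sketch under stated assumptions: RollupSketch and rollup are hypothetical names, a null entry is used here to mark a rolled-up dimension, and the dimension names are made up. It is not the RollupDimensions code from the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the RollupDimensions code from the patch.
final class RollupSketch {

    // Returns the dimension combinations from most detailed to grand total.
    // A null entry marks a dimension that has been rolled up (an assumption
    // made for this sketch).
    static List<List<String>> rollup(List<String> dims) {
        List<List<String>> levels = new ArrayList<>();
        for (int keep = dims.size(); keep >= 0; keep--) {
            List<String> level = new ArrayList<>();
            for (int i = 0; i < dims.size(); i++) {
                // Keep the first `keep` dimensions, roll up the rest.
                level.add(i < keep ? dims.get(i) : null);
            }
            levels.add(level);
        }
        return levels;
    }

    public static void main(String[] args) {
        for (List<String> level : rollup(Arrays.asList("region", "state", "city"))) {
            System.out.println(level);
        }
    }
}
```

For n dimensions this yields n + 1 combinations, in contrast to the 2^n combinations of a full cube.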





[jira] [Commented] (PIG-2765) Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator

2012-07-11 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411842#comment-13411842
 ] 

Prasanth J commented on PIG-2765:
-

Added svn patch with changes based on Dmitriy's code review comments.

> Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator
> ---
>
> Key: PIG-2765
> URL: https://issues.apache.org/jira/browse/PIG-2765
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Prasanth J
>Assignee: Prasanth J
> Attachments: PIG-2765.1.patch, PIG-2765.2.git.patch, PIG-2765.2.patch
>
>
> Implement the RollupDimensions UDF, which performs aggregation from the most 
> detailed level of dimensions to the most general level (grand total) in 
> hierarchical order. Provide support for the ROLLUP clause in the CUBE operator. 





[jira] [Commented] (PIG-2765) Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator

2012-07-11 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411840#comment-13411840
 ] 

Prasanth J commented on PIG-2765:
-

Updated the old review with a new patch. Please use the old review 
https://reviews.apache.org/r/5521/ to look at the changed bits based on your 
review comment. Ignore my previous comment too :)

> Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator
> ---
>
> Key: PIG-2765
> URL: https://issues.apache.org/jira/browse/PIG-2765
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Prasanth J
>Assignee: Prasanth J
> Attachments: PIG-2765.1.patch, PIG-2765.2.git.patch
>
>
> Implement the RollupDimensions UDF, which performs aggregation from the most 
> detailed level of dimensions to the most general level (grand total) in 
> hierarchical order. Provide support for the ROLLUP clause in the CUBE operator. 





Re: Review Request: PIG-2765: Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator

2012-07-11 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5521/
---

(Updated July 11, 2012, 6:42 p.m.)


Review request for pig and Dmitriy Ryaboy.


Changes
---

Dmitriy's review comment changes added.


Description
---

This is a review board request for 
https://issues.apache.org/jira/browse/PIG-2765


This addresses bug PIG-2765.
https://issues.apache.org/jira/browse/PIG-2765


Diffs (updated)
-

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/CubeDimensions.java 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/RollupDimensions.java PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCube.java 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLexer.pig 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestLogicalPlanGenerator.java 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestParser.pig 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryLexer.java 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/parser/TestQueryParser.java 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestCubeOperator.java 1360329 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestRollupDimensions.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5521/diff/


Testing
---

Unit tests: All passed

Pre-commit tests: All passed
ant clean test-commit


Thanks,

Prasanth_J



[jira] [Updated] (PIG-2812) Spill InternalCachedBag into only 1 file

2012-07-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2812:


Fix Version/s: 0.11

> Spill InternalCachedBag into only 1 file
> 
>
> Key: PIG-2812
> URL: https://issues.apache.org/jira/browse/PIG-2812
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Reporter: Haitao Yao
> Fix For: 0.11
>
> Attachments: aa.jpg
>
>
> I encountered an OOM in a reducer caused by java.io.DeleteOnExitHook. I 
> found that InternalCachedBag creates a separate tmp file for each spill, and 
> these tmp files are deleted on exit. So the delete-on-exit hook caused the OOM. 
> Why not just hold the tmp file handle and spill to only one tmp file?
> Too many tmp files may also block the tasktracker startup if they have not 
> been cleaned up by the time the tasktracker restarts.
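
The suggestion above can be sketched in plain Java. In the JDK, every call to File.deleteOnExit() records the file's path in a JVM-wide set that is only processed at shutdown, so registering one tmp file per spill makes that set grow for the life of the task. The sketch below is a minimal illustration under stated assumptions: SingleSpillFile and its methods are hypothetical names and are not Pig's InternalCachedBag. It keeps one file handle, appends every spill to it, and deletes the file explicitly instead of relying on the exit hook.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative sketch; SingleSpillFile is a hypothetical name, not a Pig class.
final class SingleSpillFile implements AutoCloseable {
    private final File file;
    private final OutputStream out;
    private long bytesSpilled;

    SingleSpillFile() throws IOException {
        // One temp file for the whole lifetime of the bag. Note the absence of
        // file.deleteOnExit(): each such call records the path in a JVM-wide
        // set that is only emptied when the JVM shuts down.
        file = File.createTempFile("spill", ".tmp");
        out = new FileOutputStream(file, true); // append mode
    }

    // Every spill appends to the same file instead of creating a new one.
    void spill(byte[] chunk) throws IOException {
        out.write(chunk);
        bytesSpilled += chunk.length;
    }

    long bytesSpilled() {
        return bytesSpilled;
    }

    File file() {
        return file;
    }

    // Delete explicitly as soon as the data is no longer needed.
    @Override
    public void close() throws IOException {
        out.close();
        if (file.exists() && !file.delete()) {
            throw new IOException("could not delete " + file);
        }
    }

    public static void main(String[] args) throws IOException {
        try (SingleSpillFile spill = new SingleSpillFile()) {
            for (int i = 0; i < 1000; i++) {
                spill.spill(new byte[] {1, 2, 3, 4});
            }
            System.out.println(spill.bytesSpilled() + " bytes in " + spill.file().getName());
        }
    }
}
```

The trade-off in this sketch is that a crash before close() leaves the file behind, so some other cleanup of the tmp directory would still be needed.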





[jira] [Commented] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2012-07-11 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411811#comment-13411811
 ] 

Daniel Dai commented on PIG-2769:
-

No, it is not fixed. Taking 5 minutes to compile a plan is not acceptable. I was 
not saying that a particular version works; I was just posting what I observed. 
Sorry for the confusion. We need to fix it.

> a simple logic causes very long compiling time on pig 0.10.0
> 
>
> Key: PIG-2769
> URL: https://issues.apache.org/jira/browse/PIG-2769
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.10.0
> Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
>Reporter: Dan Li
> Fix For: 0.11
>
> Attachments: case1.tar
>
>
> We found the following simple logic will cause very long compiling time for 
> pig 0.10.0, while using pig 0.8.1, everything is fine.
> A = load 'A.txt' using PigStorage()  AS (m: int);
> B = FOREACH A {
> days_str = (chararray)
> (m == 1 ? 31: 
> (m == 2 ? 28: 
> (m == 3 ? 31: 
> (m == 4 ? 30: 
> (m == 5 ? 31: 
> (m == 6 ? 30: 
> (m == 7 ? 31: 
> (m == 8 ? 31: 
> (m == 9 ? 30: 
> (m == 10 ? 31: 
> (m == 11 ? 30:31)))))))))));
> GENERATE
>days_str as days_str;
> }   
> store B into 'B';
> and here's a simple input file example: A.txt
> 1
> 2
> 3
> The pig version we used in the test
> Apache Pig version 0.10.0-SNAPSHOT (rexported)
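
For reference, the mapping that the nested conditional in the script computes can also be written as a table lookup. The sketch below is plain Java and uses hypothetical names (DaysInMonthSketch, daysStr); it is not part of Pig. Whether moving such logic into a UDF would avoid the slow compilation is an assumption that this thread does not verify. Null input is not modeled.

```java
// Illustrative sketch; DaysInMonthSketch is a hypothetical helper, not part of Pig.
final class DaysInMonthSketch {
    // Index 0 is January. February is 28, as in the script (no leap-year handling).
    private static final int[] DAYS = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

    // Mirrors the nested conditional: months 1 to 11 map to their length and
    // every other value falls through to the final branch, 31.
    static String daysStr(int m) {
        int days = (m >= 1 && m <= 11) ? DAYS[m - 1] : 31;
        return Integer.toString(days);
    }

    public static void main(String[] args) {
        for (int m : new int[] {1, 2, 3}) { // the sample rows of A.txt
            System.out.println(m + " -> " + daysStr(m));
        }
    }
}
```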





[jira] [Commented] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2012-07-11 Thread Dan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411717#comment-13411717
 ] 

Dan Li commented on PIG-2769:
-

Hi, Daniel,

I'm a little bit confused. Is this problem fixed or not?

Thanks.
Dan

> a simple logic causes very long compiling time on pig 0.10.0
> 
>
> Key: PIG-2769
> URL: https://issues.apache.org/jira/browse/PIG-2769
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.10.0
> Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
>Reporter: Dan Li
> Fix For: 0.11
>
> Attachments: case1.tar
>
>
> We found the following simple logic will cause very long compiling time for 
> pig 0.10.0, while using pig 0.8.1, everything is fine.
> A = load 'A.txt' using PigStorage()  AS (m: int);
> B = FOREACH A {
> days_str = (chararray)
> (m == 1 ? 31: 
> (m == 2 ? 28: 
> (m == 3 ? 31: 
> (m == 4 ? 30: 
> (m == 5 ? 31: 
> (m == 6 ? 30: 
> (m == 7 ? 31: 
> (m == 8 ? 31: 
> (m == 9 ? 30: 
> (m == 10 ? 31: 
> (m == 11 ? 30:31)))))))))));
> GENERATE
>days_str as days_str;
> }   
> store B into 'B';
> and here's a simple input file example: A.txt
> 1
> 2
> 3
> The pig version we used in the test
> Apache Pig version 0.10.0-SNAPSHOT (rexported)





[jira] [Updated] (PIG-2812) Spill InternalCachedBag into only 1 file

2012-07-11 Thread Haitao Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haitao Yao updated PIG-2812:


Attachment: aa.jpg

The heap dump analysis of the OOMed reducer.


> Spill InternalCachedBag into only 1 file
> 
>
> Key: PIG-2812
> URL: https://issues.apache.org/jira/browse/PIG-2812
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Reporter: Haitao Yao
> Attachments: aa.jpg
>
>
> I encountered an OOM in a reducer caused by java.io.DeleteOnExitHook. I 
> found that InternalCachedBag creates a separate tmp file for each spill, and 
> these tmp files are deleted on exit. So the delete-on-exit hook caused the OOM. 
> Why not just hold the tmp file handle and spill to only one tmp file?
> Too many tmp files may also block the tasktracker startup if they have not 
> been cleaned up by the time the tasktracker restarts.





[jira] [Created] (PIG-2812) Spill InternalCachedBag into only 1 file

2012-07-11 Thread Haitao Yao (JIRA)
Haitao Yao created PIG-2812:
---

 Summary: Spill InternalCachedBag into only 1 file
 Key: PIG-2812
 URL: https://issues.apache.org/jira/browse/PIG-2812
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Haitao Yao


I encountered an OOM in a reducer caused by java.io.DeleteOnExitHook. I found 
that InternalCachedBag creates a separate tmp file for each spill, and these tmp 
files are deleted on exit. So the delete-on-exit hook caused the OOM. 
Why not just hold the tmp file handle and spill to only one tmp file?
Too many tmp files may also block the tasktracker startup if they have not 
been cleaned up by the time the tasktracker restarts.






[jira] [Updated] (PIG-2811) Updating .eclipse.templates/.classpath with the Newest Jython Version

2012-07-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated PIG-2811:
-

Status: Patch Available  (was: Open)

> Updating .eclipse.templates/.classpath with the Newest Jython Version
> -
>
> Key: PIG-2811
> URL: https://issues.apache.org/jira/browse/PIG-2811
> Project: Pig
>  Issue Type: Bug
>  Components: tools
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Trivial
> Fix For: 0.11
>
> Attachments: PIG-2811.patch
>
>
> Jython library version has been upgraded to 2.5.2 by the PIG-2665 patch, but 
> the related modification is not made in the Eclipse template file.





[jira] [Updated] (PIG-2811) Updating .eclipse.templates/.classpath with the Newest Jython Version

2012-07-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated PIG-2811:
-

Attachment: PIG-2811.patch

> Updating .eclipse.templates/.classpath with the Newest Jython Version
> -
>
> Key: PIG-2811
> URL: https://issues.apache.org/jira/browse/PIG-2811
> Project: Pig
>  Issue Type: Bug
>  Components: tools
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Trivial
> Fix For: 0.11
>
> Attachments: PIG-2811.patch
>
>
> Jython library version has been upgraded to 2.5.2 by the PIG-2665 patch, but 
> the related modification is not made in the Eclipse template file.





[jira] [Updated] (PIG-2811) Updating .eclipse.templates/.classpath with the Newest Jython Version

2012-07-11 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated PIG-2811:
-

Description: Jython library version has been upgraded to 2.5.2 by the 
PIG-2665 patch, but the related modification is not made in the Eclipse 
template file.  (was: Jython library version has been upgraded to 2.5.2 by the 
PIG-2696 patch, but the related modification is not made in the Eclipse 
template file.)

> Updating .eclipse.templates/.classpath with the Newest Jython Version
> -
>
> Key: PIG-2811
> URL: https://issues.apache.org/jira/browse/PIG-2811
> Project: Pig
>  Issue Type: Bug
>  Components: tools
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Trivial
> Fix For: 0.11
>
>
> Jython library version has been upgraded to 2.5.2 by the PIG-2665 patch, but 
> the related modification is not made in the Eclipse template file.





[jira] [Created] (PIG-2811) Updating .eclipse.templates/.classpath with the Newest Jython Version

2012-07-11 Thread Zhijie Shen (JIRA)
Zhijie Shen created PIG-2811:


 Summary: Updating .eclipse.templates/.classpath with the Newest 
Jython Version
 Key: PIG-2811
 URL: https://issues.apache.org/jira/browse/PIG-2811
 Project: Pig
  Issue Type: Bug
  Components: tools
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Trivial
 Fix For: 0.11


Jython library version has been upgraded to 2.5.2 by the PIG-2696 patch, but 
the related modification is not made in the Eclipse template file.
