[jira] [Updated] (PIG-3170) Pig keeps static references to Hadoop's Context after end of task

2013-03-18 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang updated PIG-3170:
--

Attachment: PIG-3170.patch.txt

The old patch no longer applies cleanly on trunk. I just uploaded a new patch and 
will try the unit tests on top of it.

> Pig keeps static references to Hadoop's Context after end of task
> -
>
> Key: PIG-3170
> URL: https://issues.apache.org/jira/browse/PIG-3170
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Clément Stenac
>Priority: Minor
> Attachments: PIG-3170.patch.txt, pig-staticreferences-to-context.diff
>
>
> Through the PigStatusReporter, and the ProgressableReporter, when a Pig MR 
> task is done, static references are kept to Hadoop's Context object.
> Additionally, the PigCombiner also keeps a static reference, apparently 
> without using it.
> When the JVM is reused between MR tasks, it can cause large memory 
> overconsumption, with a peak during the creation of the next task, because 
> while MR is creating the next task (in MapTask. for example), we have 
> both contexts (with  their associated buffers) allocated at once.
> This problem is especially important when using a Combiner, because the 
> ReduceContext of a Combiner contains references to large sort buffers.
> The specifics of our case were:
> * 20 GB input data, divided in 85 map tasks
> * Very simple Pig script: LOAD A, FILTER A, GROUP A, FOREACH group generate 
> MAX(field), STORE  
> * MapR distribution, which automatically computes Xmx for mappers at 800MB
> * At the end of the first task, the ReduceContext contains more than 400MB of 
> byte[]
> * Systematic OOM in MapTask. on subsequent VM reuse
> * At least -Xmx1200m was required to get the job to complete
> * With attached patch, -Xmx600m is enough
> While a workaround by increasing Xmx is possible, I think the large 
> overconsumption and the complexity of debugging the issue (because the OOM 
> actually happens at the very beginning of the task, before the first byte of 
> data has been processed) warrants fixing it.
> The attached patch makes sure that PigStatusReporter and ProgressableReporter 
> drop their reference to the Context in the cleanup phases of the task.
> No new test is included because I don't really think it's possible to write a 
> unit test, the issue being not "binary"
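For illustration, a minimal sketch of the pattern the patch describes (the class and field names below are hypothetical, not Pig's actual reporter code): a static reference to the Hadoop context that is explicitly released during task cleanup, so the buffers it holds can be garbage-collected before the next task starts in a reused JVM.

{code}
import org.apache.hadoop.mapreduce.TaskInputOutputContext;

// Hypothetical sketch, not the actual PigStatusReporter/ProgressableReporter code.
public class ReporterSketch {
    private static TaskInputOutputContext<?, ?, ?, ?> context;

    public static void setContext(TaskInputOutputContext<?, ?, ?, ?> ctx) {
        context = ctx;
    }

    // Called from the task's cleanup phase: dropping the static reference lets
    // the context (and its large sort buffers) be garbage-collected before the
    // next task is created in a reused JVM.
    public static void destroy() {
        context = null;
    }
}
{code}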

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3208) [zebra] TFile should not set io.compression.codec.lzo.buffersize

2013-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606028#comment-13606028
 ] 

Daniel Dai commented on PIG-3208:
-

Thanks, Xuefu, for chiming in. I am happy to check in any Zebra patches you are OK with.

> [zebra] TFile should not set io.compression.codec.lzo.buffersize
> 
>
> Key: PIG-3208
> URL: https://issues.apache.org/jira/browse/PIG-3208
> Project: Pig
>  Issue Type: Bug
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
> Attachments: PIG-3208.patch
>
>
> In contrib/zebra/src/java/org/apache/hadoop/zebra/tfile/Compression.java, the 
> following occurs:
> {code}
> conf.setInt("io.compression.codec.lzo.buffersize", 64 * 1024);
> {code}
> This can cause the LZO decompressor, if called within the context of reading 
> TFiles, to return with an error code when trying to uncompress LZO-compressed 
> data, if the data's compressed size is too large to fit in 64 * 1024 bytes.
> For example, the Hadoop-LZO code uses a different default value (256 * 1024):
> https://github.com/twitter/hadoop-lzo/blob/master/src/java/com/hadoop/compression/lzo/LzoCodec.java#L185
> This can lead to a case where, if data is compressed on a cluster where the 
> default {{io.compression.codec.lzo.buffersize}} = 256*1024 is used, and code 
> then tries to read this data using Pig's zebra, the Mapper will exit with 
> code 134 because the LZO decompressor returns -4 (which encodes the LZO C 
> library error LZO_E_INPUT_OVERRUN) when trying to uncompress the data. The 
> stack trace of such a case is shown below:
> {code}
> 2013-02-17 14:47:50,709 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 3
> 2013-02-17 14:47:50,857 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,866 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 4
> 2013-02-17 14:47:50,887 INFO org.apache.hadoop.mapred.Merger: Merging 5 
> sorted segments
> 2013-02-17 14:47:50,890 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@66a23610
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 262144
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> read: 245688 bytes from decompressor.
> 2013-02-17 14:47:50,891 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@43684706
> 2013-02-17 14:47:50,892 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 65536
> 2013-02-17 14:47:50,895 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-02-17 14:47:50,897 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.InternalError: lzo1x_decompress returned: -4
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native 
> Method)
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:307)
> at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:82)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
> at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:341)
> at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:371)
> at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:355)
> at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:387)
> at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:420)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
> at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1548)
> at 
> o

[jira] [Commented] (PIG-3208) [zebra] TFile should not set io.compression.codec.lzo.buffersize

2013-03-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606023#comment-13606023
 ] 

Xuefu Zhang commented on PIG-3208:
--

Eugene,

If we remove the two lines, will Zebra take whatever the default is from the 
cluster settings? What happens if it's not set in the cluster?

Also, for your changes, we'll need to run the tests to watch for regressions.
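As a rough illustration of the question above (a sketch, not the Zebra code): if the hard-coded setInt(...) call is removed, the value comes from whatever the cluster configuration provides, and Configuration.getInt falls back to the supplied default when the key is not set anywhere.

{code}
import org.apache.hadoop.conf.Configuration;

public class LzoBufferSizeCheck {
    public static void main(String[] args) {
        // Loads *-site.xml resources from the classpath, i.e. the cluster settings.
        Configuration conf = new Configuration();
        // If io.compression.codec.lzo.buffersize is not set anywhere, the second
        // argument is returned; the Hadoop-LZO codec itself defaults to 256 * 1024.
        int bufferSize = conf.getInt("io.compression.codec.lzo.buffersize", 256 * 1024);
        System.out.println("effective LZO buffer size: " + bufferSize);
    }
}
{code}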


> [zebra] TFile should not set io.compression.codec.lzo.buffersize
> 
>
> Key: PIG-3208
> URL: https://issues.apache.org/jira/browse/PIG-3208
> Project: Pig
>  Issue Type: Bug
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
> Attachments: PIG-3208.patch
>
>
> In contrib/zebra/src/java/org/apache/hadoop/zebra/tfile/Compression.java, the 
> following occurs:
> {code}
> conf.setInt("io.compression.codec.lzo.buffersize", 64 * 1024);
> {code}
> This can cause the LZO decompressor, if called within the context of reading 
> TFiles, to return with an error code when trying to uncompress LZO-compressed 
> data, if the data's compressed size is too large to fit in 64 * 1024 bytes.
> For example, the Hadoop-LZO code uses a different default value (256 * 1024):
> https://github.com/twitter/hadoop-lzo/blob/master/src/java/com/hadoop/compression/lzo/LzoCodec.java#L185
> This can lead to a case where, if data is compressed on a cluster where the 
> default {{io.compression.codec.lzo.buffersize}} = 256*1024 is used, and code 
> then tries to read this data using Pig's zebra, the Mapper will exit with 
> code 134 because the LZO decompressor returns -4 (which encodes the LZO C 
> library error LZO_E_INPUT_OVERRUN) when trying to uncompress the data. The 
> stack trace of such a case is shown below:
> {code}
> 2013-02-17 14:47:50,709 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 3
> 2013-02-17 14:47:50,857 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,866 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 4
> 2013-02-17 14:47:50,887 INFO org.apache.hadoop.mapred.Merger: Merging 5 
> sorted segments
> 2013-02-17 14:47:50,890 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@66a23610
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 262144
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> read: 245688 bytes from decompressor.
> 2013-02-17 14:47:50,891 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@43684706
> 2013-02-17 14:47:50,892 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 65536
> 2013-02-17 14:47:50,895 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-02-17 14:47:50,897 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.InternalError: lzo1x_decompress returned: -4
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native 
> Method)
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:307)
> at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:82)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
> at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:341)
> at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:371)
> at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:355)
> at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:387)
> at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:420)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
> at org.apache.hado

[jira] [Commented] (PIG-3208) [zebra] TFile should not set io.compression.codec.lzo.buffersize

2013-03-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606016#comment-13606016
 ] 

Xuefu Zhang commented on PIG-3208:
--

I guess Daniel meant to say that Zebra isn't in active development, which is 
true. I think that if people use it, problems will be discovered. As long as any 
contributed fix meets the quality requirements, the whole community benefits from 
it.

> [zebra] TFile should not set io.compression.codec.lzo.buffersize
> 
>
> Key: PIG-3208
> URL: https://issues.apache.org/jira/browse/PIG-3208
> Project: Pig
>  Issue Type: Bug
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
> Attachments: PIG-3208.patch
>
>
> In contrib/zebra/src/java/org/apache/hadoop/zebra/tfile/Compression.java, the 
> following occurs:
> {code}
> conf.setInt("io.compression.codec.lzo.buffersize", 64 * 1024);
> {code}
> This can cause the LZO decompressor, if called within the context of reading 
> TFiles, to return with an error code when trying to uncompress LZO-compressed 
> data, if the data's compressed size is too large to fit in 64 * 1024 bytes.
> For example, the Hadoop-LZO code uses a different default value (256 * 1024):
> https://github.com/twitter/hadoop-lzo/blob/master/src/java/com/hadoop/compression/lzo/LzoCodec.java#L185
> This can lead to a case where, if data is compressed on a cluster where the 
> default {{io.compression.codec.lzo.buffersize}} = 256*1024 is used, and code 
> then tries to read this data using Pig's zebra, the Mapper will exit with 
> code 134 because the LZO decompressor returns -4 (which encodes the LZO C 
> library error LZO_E_INPUT_OVERRUN) when trying to uncompress the data. The 
> stack trace of such a case is shown below:
> {code}
> 2013-02-17 14:47:50,709 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 3
> 2013-02-17 14:47:50,857 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,866 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 4
> 2013-02-17 14:47:50,887 INFO org.apache.hadoop.mapred.Merger: Merging 5 
> sorted segments
> 2013-02-17 14:47:50,890 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@66a23610
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 262144
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> read: 245688 bytes from decompressor.
> 2013-02-17 14:47:50,891 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@43684706
> 2013-02-17 14:47:50,892 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 65536
> 2013-02-17 14:47:50,895 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-02-17 14:47:50,897 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.InternalError: lzo1x_decompress returned: -4
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native 
> Method)
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:307)
> at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:82)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
> at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:341)
> at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:371)
> at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:355)
> at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:387)
> at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:420)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
> at org.

Can we commit PIG-3015 (Rewrite of AvroStorage) to trunk?

2013-03-18 Thread Cheolsoo Park
Hello,

Thanks to Joseph Adler's contribution, we have a new AvroStorage ready.

Although there are additional requests that we would like to address, I
think we can implement them incrementally after we commit the current
patches. As of now,
- The core features are fully implemented.
- All the unit tests are passing with both Hadoop 1.x and 2.x.
- The documentation is added:
http://people.apache.org/~cheolsoo/site/func.html#AvroStorage

Since I modified the patches several times, I cannot give +1 by myself. So
I am wondering if another committer can review it. The latest patches are
as follows:
- PIG-3015-11.patch
- PIG-3015-doc-2.patch

In fact, Dmitriy asked in the jira whether or not we should add AvroStorage
to the core Pig. Although I prefer to add it to the core Pig, I am open to
discussion.

Please let me know what you think.

Thanks,
Cheolsoo


[jira] [Updated] (PIG-3110) pig corrupts chararrays with trailing whitespace when converting them to long

2013-03-18 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3110:
-

Fix Version/s: 0.12
   Status: Patch Available  (was: Open)

> pig corrupts chararrays with trailing whitespace when converting them to long
> -
>
> Key: PIG-3110
> URL: https://issues.apache.org/jira/browse/PIG-3110
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.10.0
>Reporter: Ido Hadanny
>Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3110.patch
>
>
> when trying to convert the following string into long, pig corrupts it. data:
> 1703598819951657279 ,44081037
> data1 = load 'data' using CSVLoader as (a: chararray ,b: int);
> data2 = foreach data1 generate (long)a as a;
> dump data2;
> (1703598819951657216)<--- last 2 digits are corrupted
> data2 = foreach data1 generate (long)TRIM(a) as a;
> dump data2;
> (1703598819951657279)<--- correct

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3110) pig corrupts chararrays with trailing whitespace when converting them to long

2013-03-18 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3110:
-

Attachment: PIG-3110.patch

The patch contains changes to Utf8StorageConverter and TestConversions. In addition 
to making the changes discussed above, I have also added a check:

{code}
if(b == null || b.length == 0) {
return null;
}
{code}
We don't need to parse further if the input byte array is empty, thereby 
avoiding expensive valueOf(String s) calls.

Also, this could be optimized further if the only remaining reason for falling back 
on Double.valueOf() is to handle floating-point input. The current process for 
bytesToLong and bytesToInteger in the floating-point case is:
1. Integer/Long.valueOf(String)
2. If step 1 results in null, call Double.valueOf
3. Convert the result of step 2 back to Integer/Long.

The input bytearray could be identified as floating point up front, thereby avoiding 
call 1.

Lastly, the above process takes place regardless of whether the input byte array is 
numeric at all, which is unnecessary for strings like "1234abcd".

If all agree, we should open another JIRA and optimize these methods further.
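For illustration, a rough sketch of the flow described above (simplified, not the actual Utf8StorageConverter code; the method name is hypothetical):

{code}
// Simplified sketch of bytesToLong-style conversion: bail out on empty input,
// try the direct parse first, and fall back to Double only when that fails.
public static Long bytesToLongSketch(byte[] b) {
    if (b == null || b.length == 0) {
        return null;                                // skip the valueOf() calls entirely
    }
    String s = new String(b);
    try {
        return Long.valueOf(s);                     // step 1: direct parse
    } catch (NumberFormatException e) {
        try {
            return Double.valueOf(s).longValue();   // steps 2-3: floating-point fallback
        } catch (NumberFormatException e2) {
            return null;                            // non-numeric input such as "1234abcd"
        }
    }
}
{code}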

> pig corrupts chararrays with trailing whitespace when converting them to long
> -
>
> Key: PIG-3110
> URL: https://issues.apache.org/jira/browse/PIG-3110
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.10.0
>Reporter: Ido Hadanny
> Attachments: PIG-3110.patch
>
>
> when trying to convert the following string into long, pig corrupts it. data:
> 1703598819951657279 ,44081037
> data1 = load 'data' using CSVLoader as (a: chararray ,b: int);
> data2 = foreach data1 generate (long)a as a;
> dump data2;
> (1703598819951657216)<--- last 2 digits are corrupted
> data2 = foreach data1 generate (long)TRIM(a) as a;
> dump data2;
> (1703598819951657279)<--- correct

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (PIG-3110) pig corrupts chararrays with trailing whitespace when converting them to long

2013-03-18 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi reassigned PIG-3110:


Assignee: Prashant Kommireddi

> pig corrupts chararrays with trailing whitespace when converting them to long
> -
>
> Key: PIG-3110
> URL: https://issues.apache.org/jira/browse/PIG-3110
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.10.0
>Reporter: Ido Hadanny
>Assignee: Prashant Kommireddi
> Attachments: PIG-3110.patch
>
>
> when trying to convert the following string into long, pig corrupts it. data:
> 1703598819951657279 ,44081037
> data1 = load 'data' using CSVLoader as (a: chararray ,b: int);
> data2 = foreach data1 generate (long)a as a;
> dump data2;
> (1703598819951657216)<--- last 2 digits are corrupted
> data2 = foreach data1 generate (long)TRIM(a) as a;
> dump data2;
> (1703598819951657279)<--- correct

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3252) AvroStorage gives wrong schema for schemas with named records

2013-03-18 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated PIG-3252:
-

Fix Version/s: 0.11.1
   0.12
   Status: Patch Available  (was: Open)

> AvroStorage gives wrong schema for schemas with named records
> -
>
> Key: PIG-3252
> URL: https://issues.apache.org/jira/browse/PIG-3252
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11
>Reporter: Mark Wagner
>Assignee: Mark Wagner
> Fix For: 0.12, 0.11.1
>
> Attachments: PIG-3252.1.patch
>
>
> Given the Avro schema:
> {code}
> {"type":"record",
>  "name":"toplevel",
>  "fields":[{"name":"a","
> "type":{"type":"record",
> "name":"x",
> "fields":[{"name":"key","type":"int"},
>   {"name":"value","type":"string"}]}},
>{"name":"b","type":"x"}]}
> {code}
> we should get back the Pig schema
> {code} {a: (key: int,value: string),b:(key: int,value: string)} {code}
> but instead it is
> {code} {a: (key: int,value: string),b: bytearray} {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3252) AvroStorage gives wrong schema for schemas with named records

2013-03-18 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated PIG-3252:
-

Attachment: PIG-3252.1.patch

This was introduced with the added support for recursive schemas in [PIG-2875]. 
I've attached a patch to fix it.

> AvroStorage gives wrong schema for schemas with named records
> -
>
> Key: PIG-3252
> URL: https://issues.apache.org/jira/browse/PIG-3252
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11
>Reporter: Mark Wagner
>Assignee: Mark Wagner
> Attachments: PIG-3252.1.patch
>
>
> Given the Avro schema:
> {code}
> {"type":"record",
>  "name":"toplevel",
>  "fields":[{"name":"a","
> "type":{"type":"record",
> "name":"x",
> "fields":[{"name":"key","type":"int"},
>   {"name":"value","type":"string"}]}},
>{"name":"b","type":"x"}]}
> {code}
> we should get back the Pig schema
> {code} {a: (key: int,value: string),b:(key: int,value: string)} {code}
> but instead it is
> {code} {a: (key: int,value: string),b: bytearray} {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (PIG-3252) AvroStorage gives wrong schema for schemas with named records

2013-03-18 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner reassigned PIG-3252:


Assignee: Mark Wagner

> AvroStorage gives wrong schema for schemas with named records
> -
>
> Key: PIG-3252
> URL: https://issues.apache.org/jira/browse/PIG-3252
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.11
>Reporter: Mark Wagner
>Assignee: Mark Wagner
>
> Given the Avro schema:
> {code}
> {"type":"record",
>  "name":"toplevel",
>  "fields":[{"name":"a","
> "type":{"type":"record",
> "name":"x",
> "fields":[{"name":"key","type":"int"},
>   {"name":"value","type":"string"}]}},
>{"name":"b","type":"x"}]}
> {code}
> we should get back the Pig schema
> {code} {a: (key: int,value: string),b:(key: int,value: string)} {code}
> but instead it is
> {code} {a: (key: int,value: string),b: bytearray} {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (PIG-3252) AvroStorage gives wrong schema for schemas with named records

2013-03-18 Thread Mark Wagner (JIRA)
Mark Wagner created PIG-3252:


 Summary: AvroStorage gives wrong schema for schemas with named 
records
 Key: PIG-3252
 URL: https://issues.apache.org/jira/browse/PIG-3252
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: 0.11
Reporter: Mark Wagner


Given the Avro schema:

{code}
{"type":"record",
 "name":"toplevel",
 "fields":[{"name":"a","
"type":{"type":"record",
"name":"x",
"fields":[{"name":"key","type":"int"},
  {"name":"value","type":"string"}]}},
   {"name":"b","type":"x"}]}
{code}

we should get back the Pig schema

{code} {a: (key: int,value: string),b:(key: int,value: string)} {code}
but instead it is

{code} {a: (key: int,value: string),b: bytearray} {code}
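For reference, a small standalone sketch (hypothetical class, using Avro's own parser) showing that Avro resolves the named reference itself: field b carries the full record schema x, so the Pig translation should mirror it rather than falling back to bytearray.

{code}
import org.apache.avro.Schema;

public class NamedRecordExample {
    public static void main(String[] args) {
        // Equivalent to the schema in the description above, as a JSON string.
        String json = "{\"type\":\"record\",\"name\":\"toplevel\",\"fields\":["
            + "{\"name\":\"a\",\"type\":{\"type\":\"record\",\"name\":\"x\",\"fields\":["
            + "{\"name\":\"key\",\"type\":\"int\"},{\"name\":\"value\",\"type\":\"string\"}]}},"
            + "{\"name\":\"b\",\"type\":\"x\"}]}";
        Schema schema = new Schema.Parser().parse(json);
        // Avro resolves the named reference: field b has the same record schema as x.
        System.out.println(schema.getField("b").schema());
    }
}
{code}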

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Subscription: PIG patch available

2013-03-18 Thread jira
Issue Subscription
Filter: PIG patch available (31 issues)

Subscriber: pigdaily

Key       Summary
PIG-3249  Pig startup script prints out a wrong version of hadoop when using fat jar
          https://issues.apache.org/jira/browse/PIG-3249
PIG-3247  Piggybank functions to mimic OVER clause in SQL
          https://issues.apache.org/jira/browse/PIG-3247
PIG-3238  Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.
          https://issues.apache.org/jira/browse/PIG-3238
PIG-3237  Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a string containing substrings separated by "," characters) consisting of the strings that have the corresponding bit in the first argument
          https://issues.apache.org/jira/browse/PIG-3237
PIG-3215  [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
          https://issues.apache.org/jira/browse/PIG-3215
PIG-3210  Pig fails to start when it cannot write log to log files
          https://issues.apache.org/jira/browse/PIG-3210
PIG-3208  [zebra] TFile should not set io.compression.codec.lzo.buffersize
          https://issues.apache.org/jira/browse/PIG-3208
PIG-3198  Let users use any function from PigType -> PigType as if it were builtlin
          https://issues.apache.org/jira/browse/PIG-3198
PIG-3190  Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization
          https://issues.apache.org/jira/browse/PIG-3190
PIG-3183  rm or rmf commands should respect globbing/regex of path
          https://issues.apache.org/jira/browse/PIG-3183
PIG-3166  Update eclipse .classpath according to ivy library.properties
          https://issues.apache.org/jira/browse/PIG-3166
PIG-3164  Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.
          https://issues.apache.org/jira/browse/PIG-3164
PIG-3141  Giving CSVExcelStorage an option to handle header rows
          https://issues.apache.org/jira/browse/PIG-3141
PIG-3123  Simplify Logical Plans By Removing Unneccessary Identity Projections
          https://issues.apache.org/jira/browse/PIG-3123
PIG-3122  Operators should not implicitly become reserved keywords
          https://issues.apache.org/jira/browse/PIG-3122
PIG-3114  Duplicated macro name error when using pigunit
          https://issues.apache.org/jira/browse/PIG-3114
PIG-3105  Fix TestJobSubmission unit test failure.
          https://issues.apache.org/jira/browse/PIG-3105
PIG-3088  Add a builtin udf which removes prefixes
          https://issues.apache.org/jira/browse/PIG-3088
PIG-3069  Native Windows Compatibility for Pig E2E Tests and Harness
          https://issues.apache.org/jira/browse/PIG-3069
PIG-3028  testGrunt dev test needs some command filters to run correctly without cygwin
          https://issues.apache.org/jira/browse/PIG-3028
PIG-3027  pigTest unit test needs a newline filter for comparisons of golden multi-line
          https://issues.apache.org/jira/browse/PIG-3027
PIG-3026  Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
          https://issues.apache.org/jira/browse/PIG-3026
PIG-3024  TestEmptyInputDir unit test - hadoop version detection logic is brittle
          https://issues.apache.org/jira/browse/PIG-3024
PIG-3015  Rewrite of AvroStorage
          https://issues.apache.org/jira/browse/PIG-3015
PIG-3010  Allow UDF's to flatten themselves
          https://issues.apache.org/jira/browse/PIG-3010
PIG-2959  Add a pig.cmd for Pig to run under Windows
          https://issues.apache.org/jira/browse/PIG-2959
PIG-2955  Fix bunch of Pig e2e tests on Windows
          https://issues.apache.org/jira/browse/PIG-2955
PIG-2643  Use bytecode generation to make a performance replacement for InvokeForLong, InvokeForString, etc
          https://issues.apache.org/jira/browse/PIG-2643
PIG-2641  Create toJSON function for all complex types: tuples, bags and maps
          https://issues.apache.org/jira/browse/PIG-2641
PIG-2591  Unit tests should not write to /tmp but respect java.io.tmpdir
          https://issues.apache.org/jira/browse/PIG-2591
PIG-1914  Support load/store JSON data in Pig
          https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

2013-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605874#comment-13605874
 ] 

Daniel Dai commented on PIG-3251:
-

Looks good. It seems the reason we used ByteArrayOutputStream before was to get an 
auto-expanding byte array for free, which Pig can manage by itself to reduce the 
memory footprint. Let me know if you find any problems in your testing.

> Bzip2TextInputFormat requires double the memory of maximum record size
> --
>
> Key: PIG-3251
> URL: https://issues.apache.org/jira/browse/PIG-3251
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-3251-trunk-v01.patch
>
>
> While looking at user's OOM heap dump, noticed that pig's 
> Bzip2TextInputFormat consumes memory at both
> Bzip2TextInputFormat.buffer (ByteArrayOutputStream) 
> and actual Text that is returned as line.
> For example, when having one record with 160MBytes, buffer was 268MBytes and 
> Text was 160MBytes.  
> We can probably eliminate one of them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2388) Make shim for Hadoop 0.20 and 0.23 support dynamic

2013-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605870#comment-13605870
 ] 

Daniel Dai commented on PIG-2388:
-

[~dvryaboy] Hive does not have illustrate. It is the illustrate implementation 
(IllustratorContextImpl) that makes a dynamic shim layer hard for Pig.

> Make shim for Hadoop 0.20 and 0.23 support dynamic
> --
>
> Key: PIG-2388
> URL: https://issues.apache.org/jira/browse/PIG-2388
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thomas Weise
> Fix For: 0.9.2, 0.10.0
>
> Attachments: PIG-2388_branch-0.9.patch
>
>
> We need a single Pig installation that works with both Hadoop versions. The 
> current shim implementation assumes different builds for each version. We can 
> solve this statically through an internal build/installation system, or by making 
> the shim dynamic so that pig.jar works on both versions with runtime 
> detection. The attached patch converts the static shims into a shim 
> interface with two implementations, each of which will be compiled against the 
> respective Hadoop version and included in a single pig.jar (similar to what 
> Hive does).
> The default build behavior remains unchanged; only the shim for 
> ${hadoopversion} will be compiled. Both shims can be built via: ant 
> -Dbuild-all-shims=true
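As a rough sketch of what a dynamic shim layer could look like (the class and package names below are hypothetical, not the patch's actual code): one interface, two implementations compiled against different Hadoop versions, with the concrete class chosen at runtime from the detected Hadoop version.

{code}
import org.apache.hadoop.util.VersionInfo;

interface HadoopShim {
    // Version-specific operations would be declared here.
}

public class ShimLoader {
    public static HadoopShim load() throws Exception {
        String version = VersionInfo.getVersion();      // e.g. "0.20.2" or "0.23.1"
        String impl = version.startsWith("0.23")
                ? "org.example.shims.Hadoop23Shim"      // hypothetical implementation classes
                : "org.example.shims.Hadoop20Shim";
        return (HadoopShim) Class.forName(impl).newInstance();
    }
}
{code}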

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2388) Make shim for Hadoop 0.20 and 0.23 support dynamic

2013-03-18 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605849#comment-13605849
 ] 

Dmitriy V. Ryaboy commented on PIG-2388:


Hive does this, and back in the day there was a patch that did this for Pig with 
Hadoop 18 vs. Hadoop 20.
It should be doable, though it'll take work.

> Make shim for Hadoop 0.20 and 0.23 support dynamic
> --
>
> Key: PIG-2388
> URL: https://issues.apache.org/jira/browse/PIG-2388
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Thomas Weise
> Fix For: 0.9.2, 0.10.0
>
> Attachments: PIG-2388_branch-0.9.patch
>
>
> We need a single Pig installation that works with both Hadoop versions. The 
> current shim implementation assumes different builds for each version. We can 
> solve this statically through an internal build/installation system, or by making 
> the shim dynamic so that pig.jar works on both versions with runtime 
> detection. The attached patch converts the static shims into a shim 
> interface with two implementations, each of which will be compiled against the 
> respective Hadoop version and included in a single pig.jar (similar to what 
> Hive does).
> The default build behavior remains unchanged; only the shim for 
> ${hadoopversion} will be compiled. Both shims can be built via: ant 
> -Dbuild-all-shims=true

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2451) Pig should not mandate hadoop-site.xml or core-site.xml to be on the classpath

2013-03-18 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605841#comment-13605841
 ] 

Prashant Kommireddi commented on PIG-2451:
--

Hi [~julienledem], does PIG-3135 address this?

> Pig should not mandate hadoop-site.xml or core-site.xml to be on the classpath
> --
>
> Key: PIG-2451
> URL: https://issues.apache.org/jira/browse/PIG-2451
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>
> This prevents initializing Properties to be passed to PigServer in a 
> different way.
> In particular, the "Hadoop local" mode cannot be used easily in tests.
> See: 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(Properties
>  properties)
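For illustration, a minimal sketch of the kind of test setup the issue describes wanting to support (the property value below is illustrative only): passing Properties straight to PigServer for local mode without requiring hadoop-site.xml or core-site.xml on the classpath.

{code}
import java.util.Properties;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class LocalModeSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical test configuration, built in code rather than read from
        // hadoop-site.xml / core-site.xml on the classpath.
        props.setProperty("io.sort.mb", "32");
        PigServer pig = new PigServer(ExecType.LOCAL, props);
        pig.registerQuery("data = LOAD 'input.txt' AS (line:chararray);");
    }
}
{code}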

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3208) [zebra] TFile should not set io.compression.codec.lzo.buffersize

2013-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605835#comment-13605835
 ] 

Daniel Dai commented on PIG-3208:
-

Ok, I am not objecting to that. My concern is that if a fix unintentionally 
introduces other issues, nobody would look at it. But I am fine with that risk; if 
that happens, we simply revert the patch.

> [zebra] TFile should not set io.compression.codec.lzo.buffersize
> 
>
> Key: PIG-3208
> URL: https://issues.apache.org/jira/browse/PIG-3208
> Project: Pig
>  Issue Type: Bug
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
> Attachments: PIG-3208.patch
>
>
> In contrib/zebra/src/java/org/apache/hadoop/zebra/tfile/Compression.java, the 
> following occurs:
> {code}
> conf.setInt("io.compression.codec.lzo.buffersize", 64 * 1024);
> {code}
> This can cause the LZO decompressor, if called within the context of reading 
> TFiles, to return with an error code when trying to uncompress LZO-compressed 
> data, if the data's compressed size is too large to fit in 64 * 1024 bytes.
> For example, the Hadoop-LZO code uses a different default value (256 * 1024):
> https://github.com/twitter/hadoop-lzo/blob/master/src/java/com/hadoop/compression/lzo/LzoCodec.java#L185
> This can lead to a case where, if data is compressed on a cluster where the 
> default {{io.compression.codec.lzo.buffersize}} = 256*1024 is used, and code 
> then tries to read this data using Pig's zebra, the Mapper will exit with 
> code 134 because the LZO decompressor returns -4 (which encodes the LZO C 
> library error LZO_E_INPUT_OVERRUN) when trying to uncompress the data. The 
> stack trace of such a case is shown below:
> {code}
> 2013-02-17 14:47:50,709 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 3
> 2013-02-17 14:47:50,857 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,866 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 4
> 2013-02-17 14:47:50,887 INFO org.apache.hadoop.mapred.Merger: Merging 5 
> sorted segments
> 2013-02-17 14:47:50,890 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@66a23610
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 262144
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> read: 245688 bytes from decompressor.
> 2013-02-17 14:47:50,891 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@43684706
> 2013-02-17 14:47:50,892 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 65536
> 2013-02-17 14:47:50,895 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-02-17 14:47:50,897 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.InternalError: lzo1x_decompress returned: -4
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native 
> Method)
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:307)
> at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:82)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
> at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:341)
> at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:371)
> at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:355)
> at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:387)
> at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:420)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
> at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
> a

[jira] [Commented] (PIG-2602) packageImportList should be configurable

2013-03-18 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605833#comment-13605833
 ] 

Prashant Kommireddi commented on PIG-2602:
--

[~ashutoshc] should this be closed out if "udf.import.list" serves the purpose? 

> packageImportList should be configurable
> 
>
> Key: PIG-2602
> URL: https://issues.apache.org/jira/browse/PIG-2602
> Project: Pig
>  Issue Type: New Feature
>Reporter: Ashutosh Chauhan
>
> Currently, it's hard-coded. These strings could be read from some config and 
> then used to resolve class names. That should succeed as long as those 
> classes are on the classpath.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3110) pig corrupts chararrays with trailing whitespace when converting them to long

2013-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605826#comment-13605826
 ] 

Daniel Dai commented on PIG-3110:
-

Agreed; in case parseLong fails, we can try a trim. Doing a double 
conversion introduces rounding problems unnecessarily.

> pig corrupts chararrays with trailing whitespace when converting them to long
> -
>
> Key: PIG-3110
> URL: https://issues.apache.org/jira/browse/PIG-3110
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.10.0
>Reporter: Ido Hadanny
>
> when trying to convert the following string into long, pig corrupts it. data:
> 1703598819951657279 ,44081037
> data1 = load 'data' using CSVLoader as (a: chararray ,b: int);
> data2 = foreach data1 generate (long)a as a;
> dump data2;
> (1703598819951657216)<--- last 2 digits are corrupted
> data2 = foreach data1 generate (long)TRIM(a) as a;
> dump data2;
> (1703598819951657279)<--- correct

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (PIG-3217) Add support for DateTime type in Groovy UDFs

2013-03-18 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-3217.
-

   Resolution: Fixed
Fix Version/s: 0.12
 Hadoop Flags: Reviewed

Committed to trunk. Thanks Mathias!

> Add support for DateTime type in Groovy UDFs
> 
>
> Key: PIG-3217
> URL: https://issues.apache.org/jira/browse/PIG-3217
> Project: Pig
>  Issue Type: Improvement
>  Components: internal-udfs
>Affects Versions: 0.11
>Reporter: Mathias Herberts
>Assignee: Mathias Herberts
> Fix For: 0.12
>
> Attachments: PIG-3217.patch
>
>
> The Groovy UDFs do not support the DateTime type.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (PIG-3218) Add support for biginteger/bigdecimal type in Groovy UDFs

2013-03-18 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-3218.
-

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to trunk. Thanks Mathias!

> Add support for biginteger/bigdecimal type in Groovy UDFs
> -
>
> Key: PIG-3218
> URL: https://issues.apache.org/jira/browse/PIG-3218
> Project: Pig
>  Issue Type: Improvement
>  Components: internal-udfs
>Affects Versions: 0.11
>Reporter: Mathias Herberts
>Assignee: Mathias Herberts
> Fix For: 0.12
>
> Attachments: PIG-3218.patch
>
>
> Now that PIG-2764 has been integrated into trunk, we need to support 
> biginteger/bigdecimal in Groovy UDFs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3222) New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer

2013-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605788#comment-13605788
 ] 

Daniel Dai commented on PIG-3222:
-

[~pengfeng] I tried a similar query using HCatStorer but didn't see the issue. 
Can you provide more details?

> New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer 
> ---
>
> Key: PIG-3222
> URL: https://issues.apache.org/jira/browse/PIG-3222
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11
>Reporter: Feng Peng
>  Labels: hcatalog
>
> Pig 0.11 assigns a different UDFContextSignature to different invocations of 
> the same load/store statement. This change breaks HCatStorer, which 
> assumes that all front-end and back-end invocations of the same store statement 
> have the same UDFContextSignature so that it can read the previously stored 
> information correctly.
> The related HCatalog code is in 
> https://svn.apache.org/repos/asf/incubator/hcatalog/branches/branch-0.5/hcatalog-pig-adapter/src/main/java/org/apache/hcatalog/pig/HCatStorer.java
>  (the setStoreLocation() function).
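For illustration, a rough sketch of the pattern HCatStorer relies on (simplified, not the actual HCatalog code; the property key is hypothetical): the front end stores data under the UDFContextSignature and the back end reads it back with the same signature, which is why the signature must stay stable across invocations.

{code}
import java.util.Properties;
import org.apache.pig.impl.util.UDFContext;

public class SignatureSketch {
    private String signature;  // value received via setStoreFuncUDFContextSignature()

    void storeOnFrontEnd(String schemaString) {
        Properties p = UDFContext.getUDFContext()
                .getUDFProperties(getClass(), new String[] { signature });
        p.setProperty("example.schema", schemaString);   // hypothetical property key
    }

    String readOnBackEnd() {
        Properties p = UDFContext.getUDFContext()
                .getUDFProperties(getClass(), new String[] { signature });
        return p.getProperty("example.schema");          // null if the signature changed
    }
}
{code}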

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3110) pig corrupts chararrays with trailing whitespace when converting them to long

2013-03-18 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605785#comment-13605785
 ] 

Prashant Kommireddi commented on PIG-3110:
--

I think the difference between the Double and Integer/Long behavior here is that 
the former handles whitespace. Here is a snippet from Double.java:

{code}
public static FloatingDecimal
readJavaFormatString( String in ) throws NumberFormatException {
boolean isNegative = false;
boolean signSeen   = false;
int decExp;
char c;

parseNumber:
try{
in = in.trim(); // don't fool around with white space.
{code}

Another reason we use Double is for conversion from floating points.

Is there a reason other than these that we use Double? I think case 1 (whitespace) 
could be handled in Pig code rather than having to use Double and then 
convert back to Integer/Long. PIG-2835 added 
sanityCheckIntegerLong(String s) for performance reasons. Trimming whitespace 
in addition to that should handle the issue described in this JIRA.
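A tiny demonstration of the difference described above (standalone snippet, using the value from this JIRA): Double.valueOf trims the trailing whitespace internally and parses, but loses precision beyond what a double can represent exactly, while Long.valueOf rejects the untrimmed string outright.

{code}
public class WhitespaceParseDemo {
    public static void main(String[] args) {
        // Parses because readJavaFormatString() trims, but the result is the
        // nearest double, not the exact long value.
        System.out.println(Double.valueOf("1703598819951657279 ").longValue());
        // Throws NumberFormatException because of the trailing space.
        System.out.println(Long.valueOf("1703598819951657279 "));
    }
}
{code}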

> pig corrupts chararrays with trailing whitespace when converting them to long
> -
>
> Key: PIG-3110
> URL: https://issues.apache.org/jira/browse/PIG-3110
> Project: Pig
>  Issue Type: Bug
>  Components: data
>Affects Versions: 0.10.0
>Reporter: Ido Hadanny
>
> when trying to convert the following string into long, pig corrupts it. data:
> 1703598819951657279 ,44081037
> data1 = load 'data' using CSVLoader as (a: chararray ,b: int);
> data2 = foreach data1 generate (long)a as a;
> dump data2;
> (1703598819951657216)<--- last 2 digits are corrupted
> data2 = foreach data1 generate (long)TRIM(a) as a;
> dump data2;
> (1703598819951657279)<--- correct

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

2013-03-18 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-3251:
--

Attachment: pig-3251-trunk-v01.patch

In Bzip2TextInputFormat, it says

{code}
/**
 * Provide a bridge to get the bytes from the ByteArrayOutputStream 
without
 * creating a new byte array.
 */
private static class TextStuffer extends OutputStream {

{code}
However, in reality, Text just creates a new byte array and copies the content.

Attaching a patch that takes an approach similar to org.apache.hadoop.util.LineReader, 
but with fewer changes, since HADOOP-4012 (added in 0.21) was a huge patch.

This patch basically reads into a fixed-length buffer and appends to the Text 
whenever the buffer gets full.

Touching BZip2LineRecordReader makes me nervous, so I wanted the changes to be 
small.

I need to do more testing to see whether this approach works.
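A simplified sketch of the approach described above (not the actual patch; the buffer size and method shape are illustrative): read into a fixed-length buffer and append to the Text whenever the buffer fills, so only one full copy of the record is held at a time.

{code}
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.io.Text;

public class FixedBufferLineReaderSketch {
    private static final int BUFFER_SIZE = 64 * 1024;   // illustrative buffer size

    static void readLine(InputStream in, Text line) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        line.clear();
        int pos = 0;
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            buffer[pos++] = (byte) b;
            if (pos == buffer.length) {                  // buffer full: flush into the Text
                line.append(buffer, 0, pos);
                pos = 0;
            }
        }
        if (pos > 0) {
            line.append(buffer, 0, pos);                 // flush the remainder
        }
    }
}
{code}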

> Bzip2TextInputFormat requires double the memory of maximum record size
> --
>
> Key: PIG-3251
> URL: https://issues.apache.org/jira/browse/PIG-3251
> Project: Pig
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Minor
> Attachments: pig-3251-trunk-v01.patch
>
>
> While looking at user's OOM heap dump, noticed that pig's 
> Bzip2TextInputFormat consumes memory at both
> Bzip2TextInputFormat.buffer (ByteArrayOutputStream) 
> and actual Text that is returned as line.
> For example, when having one record with 160MBytes, buffer was 268MBytes and 
> Text was 160MBytes.  
> We can probably eliminate one of them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2527) ILLUSTRATE fails for relations LOADed with the AvroStorage UDF

2013-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605767#comment-13605767
 ] 

Daniel Dai commented on PIG-2527:
-

[~rohini], I didn't investigate which JIRA fixed the issue. I did a trivial 
illustrate and it works. If you have a script that fails, can you open a new 
ticket and provide more details?


> ILLUSTRATE fails for relations LOADed with the AvroStorage UDF
> --
>
> Key: PIG-2527
> URL: https://issues.apache.org/jira/browse/PIG-2527
> Project: Pig
>  Issue Type: Bug
>  Components: piggybank
>Affects Versions: 0.9.2, 0.10.0, 0.11, 0.10.1
>Reporter: Russell Jurney
>Assignee: Jonathan Coveney
>Priority: Blocker
>  Labels: avro, avro_udf, avrostorage, happy, pig, storage, udf
> Fix For: 0.10.1
>
>
> grunt> describe emails
> emails: {message_id: chararray,from: {PIG_WRAPPER: (ARRAY_ELEM: 
> chararray)},to: {PIG_WRAPPER: (ARRAY_ELEM: chararray)},cc: {PIG_WRAPPER: 
> (ARRAY_ELEM: chararray)},bcc: {PIG_WRAPPER: (ARRAY_ELEM: 
> chararray)},reply_to: {PIG_WRAPPER: (ARRAY_ELEM: chararray)},in_reply_to: 
> {PIG_WRAPPER: (ARRAY_ELEM: chararray)},subject: chararray,body: 
> chararray,date: chararray}
> grunt> illustrate emails 
> 2012-02-10 18:15:01,591 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: file:///
> 2012-02-10 18:15:01,592 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - 
> File concatenation threshold: 100 optimistic? false
> 2012-02-10 18:15:01,649 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size before optimization: 1
> 2012-02-10 18:15:01,649 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size after optimization: 1
> 2012-02-10 18:15:01,649 [main] INFO  
> org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to 
> the job
> 2012-02-10 18:15:01,649 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-02-10 18:15:01,668 [main] INFO  
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to 
> process : 5
> 2012-02-10 18:15:02,719 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - 
> File concatenation threshold: 100 optimistic? false
> 2012-02-10 18:15:02,719 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size before optimization: 1
> 2012-02-10 18:15:02,719 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size after optimization: 1
> 2012-02-10 18:15:02,720 [main] INFO  
> org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to 
> the job
> 2012-02-10 18:15:02,720 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-02-10 18:15:02,733 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - 
> File concatenation threshold: 100 optimistic? false
> 2012-02-10 18:15:02,734 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size before optimization: 1
> 2012-02-10 18:15:02,734 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
>  - MR plan size after optimization: 1
> 2012-02-10 18:15:02,734 [main] INFO  
> org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to 
> the job
> 2012-02-10 18:15:02,734 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>  - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-02-10 18:15:02,749 [main] ERROR 
> org.apache.pig.pen.AugmentBaseDataVisitor - No (valid) input data found!
> java.lang.RuntimeException: No (valid) input data found!
> at 
> org.apache.pig.pen.AugmentBaseDataVisitor.visit(AugmentBaseDataVisitor.java:579)
> at 
> org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:218)
> at 
> org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:82)
> at 
> org.apache.pig.pen.util.PreOrderDepthFirstWalker.walk(PreOrderDepthFirstWalker.java:66)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> at 
> org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:

[jira] [Commented] (PIG-3208) [zebra] TFile should not set io.compression.codec.lzo.buffersize

2013-03-18 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605765#comment-13605765
 ] 

Dmitriy V. Ryaboy commented on PIG-3208:


[~daijy] why wouldn't we commit fixes provided by the community?

> [zebra] TFile should not set io.compression.codec.lzo.buffersize
> 
>
> Key: PIG-3208
> URL: https://issues.apache.org/jira/browse/PIG-3208
> Project: Pig
>  Issue Type: Bug
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
> Attachments: PIG-3208.patch
>
>
> In contrib/zebra/src/java/org/apache/hadoop/zebra/tfile/Compression.java, the 
> following occurs:
> {code}
> conf.setInt("io.compression.codec.lzo.buffersize", 64 * 1024);
> {code}
> This can cause the LZO decompressor, if called within the context of reading 
> TFiles, to return with an error code when trying to uncompress LZO-compressed 
> data, if the data's compressed size is too large to fit in 64 * 1024 bytes.
> For example, the Hadoop-LZO code uses a different default value (256 * 1024):
> https://github.com/twitter/hadoop-lzo/blob/master/src/java/com/hadoop/compression/lzo/LzoCodec.java#L185
> This can lead to a case where, if data is compressed on a cluster where the 
> default {{io.compression.codec.lzo.buffersize}} = 256*1024 is used, and that 
> data is then read through Pig's Zebra, the Mapper will exit with code 134 
> because the LZO decompressor returns -4 (which encodes the LZO C library 
> error LZO_E_INPUT_OVERRUN) when trying to uncompress the data. The stack 
> trace of such a case is shown below:
> {code}
> 2013-02-17 14:47:50,709 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 3
> 2013-02-17 14:47:50,857 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,866 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 4
> 2013-02-17 14:47:50,887 INFO org.apache.hadoop.mapred.Merger: Merging 5 
> sorted segments
> 2013-02-17 14:47:50,890 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@66a23610
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 262144
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> read: 245688 bytes from decompressor.
> 2013-02-17 14:47:50,891 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@43684706
> 2013-02-17 14:47:50,892 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 65536
> 2013-02-17 14:47:50,895 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-02-17 14:47:50,897 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.InternalError: lzo1x_decompress returned: -4
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native 
> Method)
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:307)
> at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:82)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
> at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:341)
> at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:371)
> at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:355)
> at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:387)
> at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:420)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
> at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1548)
> at 
> org.ap

[jira] [Created] (PIG-3251) Bzip2TextInputFormat requires double the memory of maximum record size

2013-03-18 Thread Koji Noguchi (JIRA)
Koji Noguchi created PIG-3251:
-

 Summary: Bzip2TextInputFormat requires double the memory of 
maximum record size
 Key: PIG-3251
 URL: https://issues.apache.org/jira/browse/PIG-3251
 Project: Pig
  Issue Type: Improvement
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor


While looking at a user's OOM heap dump, I noticed that Pig's Bzip2TextInputFormat 
consumes memory in two places:

Bzip2TextInputFormat.buffer (a ByteArrayOutputStream) 
and the actual Text that is returned as the line.

For example, for a single record of 160 MB, the buffer was 268 MB and the Text 
was 160 MB.  

We can probably eliminate one of them.
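
A rough sketch of one way a copy could be dropped, assuming the current code 
builds the line in the ByteArrayOutputStream and then copies it into the 
returned Text via toByteArray() (the class names below are made up for 
illustration):

{code}
import java.io.ByteArrayOutputStream;

import org.apache.hadoop.io.Text;

// Sketch only: expose ByteArrayOutputStream's internal array so the bytes
// can be handed to Text.set() directly, instead of materializing a third
// full-size array with toByteArray().
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    byte[] rawBuffer() { return buf; }    // protected field of the parent class
    int rawLength()    { return count; }  // number of valid bytes in buf
}

class LineAssembler {
    private final ExposedByteArrayOutputStream buffer = new ExposedByteArrayOutputStream();

    void append(byte[] bytes, int off, int len) {
        buffer.write(bytes, off, len);
    }

    // Copies the accumulated bytes straight into the caller's Text, so only
    // the stream's buffer and the Text itself are alive at the same time.
    void fill(Text value) {
        value.set(buffer.rawBuffer(), 0, buffer.rawLength());
        buffer.reset();
    }
}
{code}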

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3208) [zebra] TFile should not set io.compression.codec.lzo.buffersize

2013-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605754#comment-13605754
 ] 

Daniel Dai commented on PIG-3208:
-

Hi Eugene, Zebra is no longer supported except for fixing compilation errors. You 
can continue to use it, but we will not commit patches or fix bugs. 

> [zebra] TFile should not set io.compression.codec.lzo.buffersize
> 
>
> Key: PIG-3208
> URL: https://issues.apache.org/jira/browse/PIG-3208
> Project: Pig
>  Issue Type: Bug
>Reporter: Eugene Koontz
>Assignee: Eugene Koontz
> Attachments: PIG-3208.patch
>
>
> In contrib/zebra/src/java/org/apache/hadoop/zebra/tfile/Compression.java, the 
> following occurs:
> {code}
> conf.setInt("io.compression.codec.lzo.buffersize", 64 * 1024);
> {code}
> This can cause the LZO decompressor, if called within the context of reading 
> TFiles, to return with an error code when trying to uncompress LZO-compressed 
> data, if the data's compressed size is too large to fit in 64 * 1024 bytes.
> For example, the Hadoop-LZO code uses a different default value (256 * 1024):
> https://github.com/twitter/hadoop-lzo/blob/master/src/java/com/hadoop/compression/lzo/LzoCodec.java#L185
> This can lead to a case where, if data is compressed on a cluster where the 
> default {{io.compression.codec.lzo.buffersize}} = 256*1024 is used, and that 
> data is then read through Pig's Zebra, the Mapper will exit with code 134 
> because the LZO decompressor returns -4 (which encodes the LZO C library 
> error LZO_E_INPUT_OVERRUN) when trying to uncompress the data. The stack 
> trace of such a case is shown below:
> {code}
> 2013-02-17 14:47:50,709 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,849 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 3
> 2013-02-17 14:47:50,857 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,866 INFO com.hadoop.compression.lzo.LzoCodec: Creating 
> stream for compressor: com.hadoop.compression.lzo.LzoCompressor@6818c458 with 
> bufferSize: 262144
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.io.compress.CodecPool: Paying 
> back codec: com.hadoop.compression.lzo.LzoCompressor@6818c458
> 2013-02-17 14:47:50,879 INFO org.apache.hadoop.mapred.MapTask: Finished spill 
> 4
> 2013-02-17 14:47:50,887 INFO org.apache.hadoop.mapred.Merger: Merging 5 
> sorted segments
> 2013-02-17 14:47:50,890 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@66a23610
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 262144
> 2013-02-17 14:47:50,891 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> read: 245688 bytes from decompressor.
> 2013-02-17 14:47:50,891 INFO org.apache.hadoop.io.compress.CodecPool: 
> Borrowing codec: com.hadoop.compression.lzo.LzoDecompressor@43684706
> 2013-02-17 14:47:50,892 INFO com.hadoop.compression.lzo.LzoDecompressor: 
> calling decompressBytesDirect with buffer with: position: 0 and limit: 65536
> 2013-02-17 14:47:50,895 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-02-17 14:47:50,897 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.InternalError: lzo1x_decompress returned: -4
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native 
> Method)
> at 
> com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:307)
> at 
> org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:82)
> at 
> org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
> at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:341)
> at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:371)
> at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:355)
> at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:387)
> at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:420)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
> at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
> at 
> org.apache.hadoop.mapred.MapTask$Ma

[jira] [Commented] (PIG-3049) Cannot sort on a bag in nested foreach

2013-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605749#comment-13605749
 ] 

Daniel Dai commented on PIG-3049:
-

This syntax should be supported. The error stack is on the map side; explain shows:
{code}
MapReduce node scope-25
Map Plan
a1: Local Rearrange[tuple]{tuple}(false) - scope-10
|   |
|   Project[int][1] - scope-11
|
|---a: New For Each(false,false)[bag] - scope-7
|   |
|   Cast[chararray] - scope-2
|   |
|   |---Project[bytearray][0] - scope-1
|   |
|   Cast[int] - scope-5
|   |
|   |---Project[bytearray][1] - scope-4
|
|---a: 
Load(file:///Users/daijy/pig/words_and_numbers:org.apache.pig.builtin.PigStorage)
 - scope-0
Reduce Plan
b: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-24
|
|---b: New For Each(false,false)[bag] - scope-23
|   |
|   Project[int][0] - scope-12
|   |
|   POUserFunc(org.apache.pig.builtin.LongSum)[long] - scope-16
|   |
|   |---Project[bag][0] - scope-15
|   |
|   |---RelationToExpressionProject[bag][*] - scope-14
|   |
|   |---a_bag: New For Each(false)[bag] - scope-20
|   |   |
|   |   Project[chararray][0] - scope-18
|   |
|   |---Project[bag][1] - scope-17
|
|---a1: Package[tuple]{int} - scope-9
Global sort: false
Secondary sort: true
{code}
The key type for the Local Rearrange is wrong, so this is a bug.

Johnny, are you still working on it?

> Cannot sort on a bag in nested foreach
> --
>
> Key: PIG-3049
> URL: https://issues.apache.org/jira/browse/PIG-3049
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11, 0.12
>Reporter: Jonathan Coveney
>Assignee: Johnny Zhang
> Fix For: 0.12
>
>
> The following script fails.
> {code}
> a = load 'words_and_numbers' as (word:chararray, number:int);
> b = foreach (group a by number) {
>   a_bag = a.word;
>   ord = order a_bag by word;
>   generate group, ord;
> }
> dump b;
> {code}
> On this data:
> {code}
> $ cat words_and_numbers   
>
> hey   1
> hey   2
> you   3
> you   4
> I 5
> could 6
> {code}
> it throws the following error:
> {code}
> java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.pig.data.Tuple
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:469)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:308)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:160)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:384)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:333)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
> {code}
> Is this a supported feature of Pig? It seems reasonable; it just looks like 
> something weird is going on.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Jenkins build is back to normal : Pig-trunk #1438

2013-03-18 Thread Apache Jenkins Server
See 



[jira] [Updated] (PIG-3172) Partition filter push down does not happen when there is a non partition key map column filter

2013-03-18 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3172:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk (0.12). Thanks Daniel.

> Partition filter push down does not happen when there is a non partition key 
> map column filter
> --
>
> Key: PIG-3172
> URL: https://issues.apache.org/jira/browse/PIG-3172
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3172-1.patch
>
>
> A = LOAD 'job_confs' USING org.apache.hcatalog.pig.HCatLoader();
> B = FILTER A by grid == 'cluster1' and dt < '2012_12_01' and dt > 
> '2012_11_20';
> C = FILTER B by params#'mapreduce.job.user.name' == 'userx';
> D = FOREACH B generate dt, grid, params#'mapreduce.job.user.name' as user,
> params#'mapreduce.job.name' as job_name, job_id,
> params#'mapreduce.job.cache.files';
> dump D;
> The query gives the below warning and ends up scanning the whole table 
> instead of pushing the partition key filters grid and dt.
> [main] WARN  org.apache.pig.newplan.PColFilterExtractor - No partition filter
> push down: Internal error while processing any partition filter conditions in
> the filter after the load
> Works fine if the second filter is on a column with a simple datatype like 
> chararray instead of a map.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3238) Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.

2013-03-18 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605629#comment-13605629
 ] 

Prashant Kommireddi commented on PIG-3238:
--

Also, it would be great if you could add the Apache license header:

{code}
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
{code}
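
For reference, a minimal sketch of such an EvalFunc is shown below. It is only 
an illustration, not the attached Stuff.java.patch; the argument order (string, 
start, length, replacement) and the 1-based start position are assumptions 
borrowed from the usual SQL STUFF semantics:

{code}
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/**
 * Sketch of a STUFF-style UDF: delete 'length' characters of 'str' starting
 * at 1-based position 'start' and insert 'replacement' there.
 * Example: Stuff('abcdef', 2, 3, 'xyz') returns 'axyzef'.
 */
public class Stuff extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() != 4 || input.get(0) == null
                || input.get(1) == null || input.get(2) == null) {
            return null;
        }
        String str = (String) input.get(0);
        int start = (Integer) input.get(1);    // 1-based start position
        int length = (Integer) input.get(2);   // number of characters to delete
        String replacement = (String) input.get(3);
        if (replacement == null) {
            replacement = "";
        }
        if (start < 1 || start > str.length()) {
            return null;
        }
        int from = start - 1;
        int to = Math.min(from + length, str.length());
        return str.substring(0, from) + replacement + str.substring(to);
    }
}
{code}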

> Pig current releases lack a UDF Stuff(). This UDF deletes a specified length 
> of characters and inserts another set of characters at a specified starting 
> point.
> ---
>
> Key: PIG-3238
> URL: https://issues.apache.org/jira/browse/PIG-3238
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Sonu Prathap
> Fix For: 0.10.0
>
> Attachments: Stuff.java.patch
>
>
> Pig current releases lack a UDF Stuff(). This UDF deletes a specified length 
> of characters and inserts another set of characters at a specified starting 
> point.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-3248.
-

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to trunk.

> Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
> 
>
> Key: PIG-3248
> URL: https://issues.apache.org/jira/browse/PIG-3248
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3248-1.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3163) Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.

2013-03-18 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605620#comment-13605620
 ] 

Prashant Kommireddi commented on PIG-3163:
--

[~anuroopa george] would you like to take a stab at this?
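
Purely as a starting point, a sketch of what such a UDF could look like (the 
class name is just a placeholder, not committed code):

{code}
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/**
 * Sketch of an ENDSWITH UDF: returns true if the first argument ends with
 * the second argument. An empty suffix, or a suffix equal to the whole
 * string, also returns true, matching String.endsWith() semantics.
 */
public class EndsWith extends EvalFunc<Boolean> {
    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (input == null || input.size() != 2
                || input.get(0) == null || input.get(1) == null) {
            return null;
        }
        String str = (String) input.get(0);
        String suffix = (String) input.get(1);
        return str.endsWith(suffix);
    }
}
{code}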

> Pig current releases lack a UDF endsWith.This UDF tests if a given string 
> ends with the specified suffix.
> -
>
> Key: PIG-3163
> URL: https://issues.apache.org/jira/browse/PIG-3163
> Project: Pig
>  Issue Type: New Feature
>  Components: piggybank
>Affects Versions: 0.10.0
>Reporter: Anuroopa George
> Fix For: 0.9.3
>
>
> Pig current releases lack a UDF endsWith. This UDF tests if a given string 
> ends with the specified suffix. This UDF returns true if the character 
> sequence represented by the string argument given as a suffix is a suffix of 
> the character sequence represented by the given string; false otherwise. Also, 
> true will be returned if the given suffix is an empty string or is equal to 
> the given string.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3163) Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.

2013-03-18 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-3163:
-

Issue Type: New Feature  (was: Bug)

> Pig current releases lack a UDF endsWith.This UDF tests if a given string 
> ends with the specified suffix.
> -
>
> Key: PIG-3163
> URL: https://issues.apache.org/jira/browse/PIG-3163
> Project: Pig
>  Issue Type: New Feature
>  Components: piggybank
>Affects Versions: 0.10.0
>Reporter: Anuroopa George
> Fix For: 0.9.3
>
>
> Pig current releases lack a UDF endsWith. This UDF tests if a given string 
> ends with the specified suffix. This UDF returns true if the character 
> sequence represented by the string argument given as a suffix is a suffix of 
> the character sequence represented by the given string; false otherwise. Also, 
> true will be returned if the given suffix is an empty string or is equal to 
> the given string.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605615#comment-13605615
 ] 

Thejas M Nair commented on PIG-3248:


+1

> Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
> 
>
> Key: PIG-3248
> URL: https://issues.apache.org/jira/browse/PIG-3248
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3248-1.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Are we ready for 0.11.1 release?

2013-03-18 Thread Bill Graham
Sure, I can get an RC out this week.


On Mon, Mar 18, 2013 at 10:51 AM, Dmitriy Ryaboy  wrote:

> Yeah adding new types seems like a big thing, would prefer for it to be
> 0.12 only.
>
>
> Sounds like we are ready to roll 0.11.1..  Bill, want to do the honors
> again?
>
>
> On Mon, Mar 18, 2013 at 10:40 AM, Julien Le Dem  wrote:
>
> > Agreed with Daniel,
> > PIG-2764 will go in Pig 0.12
> > Julien
> >
> > On Mar 18, 2013, at 10:32 AM, Daniel Dai wrote:
> >
> > > Dmitriy: Just committed PIG-3132.
> > >
> > > Richard: PIG-2764 is a new feature; we usually don't include new features
> > > in a minor release.
> > >
> > > Daniel
> > >
> > > On Mon, Mar 18, 2013 at 10:21 AM, Richard Ding 
> > wrote:
> > >> How about PIG-2764? It would be nice to include this feature.
> > >>
> > >>
> > >> On Mon, Mar 18, 2013 at 1:04 AM, Dmitriy Ryaboy 
> > wrote:
> > >>
> > >>> Just +1'd it.
> > >>> I think after this one we are good to go?
> > >>>
> > >>>
> > >>> On Sun, Mar 17, 2013 at 9:09 PM, Daniel Dai 
> > wrote:
> > >>>
> >  Can I include PIG-3132?
> > 
> >  Thanks,
> >  Daniel
> > 
> >  On Fri, Mar 15, 2013 at 5:57 PM, Julien Le Dem 
> > wrote:
> > > +1 for a new release
> > >
> > > Julien
> > >
> > > On Mar 15, 2013, at 17:08, Dmitriy Ryaboy 
> > wrote:
> > >
> > >> I think all the critical patches we discussed as required for
> 0.11.1
> >  have
> > >> gone in -- is there anything else people want to finish up, or can
> > we
> >  roll
> > >> this?  Current change log:
> > >>
> > >> Release 0.11.1 (unreleased)
> > >>
> > >> INCOMPATIBLE CHANGES
> > >>
> > >> IMPROVEMENTS
> > >>
> > >> PIG-2988: start deploying pigunit maven artifact part of Pig
> release
> > >> process (njw45 via rohini)
> > >>
> > >> PIG-3148: OutOfMemory exception while spilling stale
> DefaultDataBag.
> >  Extra
> > >> option to gc() before spilling large bag. (knoguchi via rohini)
> > >>
> > >> PIG-3216: Groovy UDFs documentation has minor typos (herberts via
> >  rohini)
> > >>
> > >> PIG-3202: CUBE operator not documented in user docs (prasanth_j
> via
> > >> billgraham)
> > >>
> > >> OPTIMIZATIONS
> > >>
> > >> BUG FIXES
> > >>
> > >> PIG-3194: Changes to ObjectSerializer.java break compatibility
> with
> >  Hadoop
> > >> 0.20.2 (prkommireddi via dvryaboy)
> > >>
> > >> PIG-3241: ConcurrentModificationException in POPartialAgg
> (dvryaboy)
> > >>
> > >> PIG-3144: Erroneous map entry alias resolution leading to
> "Duplicate
> >  schema
> > >> alias" errors (jcoveney via cheolsoo)
> > >>
> > >> PIG-3212: Race Conditions in POSort and (Internal)SortedBag during
> > >> Proactive Spill (kadeng via dvryaboy)
> > >>
> > >> PIG-3206: HBaseStorage does not work with Oozie pig action and
> > secure
> >  HBase
> > >> (rohini)
> > 
> > >>>
> >
> >
>



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgra...@gmail.com going forward.*


[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605610#comment-13605610
 ] 

Daniel Dai commented on PIG-3248:
-

Yes, otherwise I need to add the "-Xss" option to increase the stack size.

> Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
> 
>
> Key: PIG-3248
> URL: https://issues.apache.org/jira/browse/PIG-3248
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3248-1.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2013-03-18 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605589#comment-13605589
 ] 

Prashant Kommireddi commented on PIG-3015:
--

[~jadler] note the "tagsource" option on PigStorage is deprecated in 0.12 and 
replaced with "tagFile". Additionally, there is an option "tagPath" for getting 
the entire path and not just the filename. See PIG-2857

> Rewrite of AvroStorage
> --
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Reporter: Joseph Adler
>Assignee: Joseph Adler
> Attachments: bad.avro, good.avro, PIG-3015-10.patch, 
> PIG-3015-11.patch, PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, 
> PIG-3015-5.patch, PIG-3015-6.patch, PIG-3015-7.patch, PIG-3015-9.patch, 
> PIG-3015-doc-2.patch, PIG-3015-doc.patch, TestInput.java, Test.java, 
> with_dates.pig
>
>
> The current AvroStorage implementation has a lot of issues: it requires old 
> versions of Avro, it copies data much more than needed, and it's verbose and 
> complicated. (One pet peeve of mine is that old versions of Avro don't 
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
> new implementation is significantly faster, and the code is a lot simpler. 
> Rewriting AvroStorage also enabled me to implement support for Trevni (as 
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best 
> way to contribute the changes back to Apache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605584#comment-13605584
 ] 

Thejas M Nair commented on PIG-3248:


bq. This is actually a different issue. TestPigSplit always fails on my test 
machine due to stack overflow. It is OK if you want me to open a separate JIRA 
for this issue.
As it is a very minor change to a test case, I think it's fine to include it 
here. 200 is still a large number of statements, so using that many levels 
instead of 500 should be OK if 500 causes issues with some JVM configs.



> Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
> 
>
> Key: PIG-3248
> URL: https://issues.apache.org/jira/browse/PIG-3248
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3248-1.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605580#comment-13605580
 ] 

Daniel Dai commented on PIG-3248:
-

This is actually a different issue. TestPigSplit always fails on my test machine 
due to stack overflow. It is OK if you want me to open a separate JIRA for this 
issue.

> Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
> 
>
> Key: PIG-3248
> URL: https://issues.apache.org/jira/browse/PIG-3248
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3248-1.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2013-03-18 Thread Joseph Adler (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605579#comment-13605579
 ] 

Joseph Adler commented on PIG-3015:
---

I like the -tagsource option idea. Should we allow the user to provide a name 
for the "tag source" field? (If we picked a name like "tagSource" and there 
was already a field in the Avro schema called "tagSource", I'm concerned that 
we'd have to deal with that conflict. I think it would be cleaner to let the 
end user resolve the naming issue.)

> Rewrite of AvroStorage
> --
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Reporter: Joseph Adler
>Assignee: Joseph Adler
> Attachments: bad.avro, good.avro, PIG-3015-10.patch, 
> PIG-3015-11.patch, PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, 
> PIG-3015-5.patch, PIG-3015-6.patch, PIG-3015-7.patch, PIG-3015-9.patch, 
> PIG-3015-doc-2.patch, PIG-3015-doc.patch, TestInput.java, Test.java, 
> with_dates.pig
>
>
> The current AvroStorage implementation has a lot of issues: it requires old 
> versions of Avro, it copies data much more than needed, and it's verbose and 
> complicated. (One pet peeve of mine is that old versions of Avro don't 
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
> new implementation is significantly faster, and the code is a lot simpler. 
> Rewriting AvroStorage also enabled me to implement support for Trevni (as 
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best 
> way to contribute the changes back to Apache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3248) Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha

2013-03-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605460#comment-13605460
 ] 

Thejas M Nair commented on PIG-3248:


Daniel, is the following change relevant for the Hadoop 2 version upgrade?

{code}
===
--- test/org/apache/pig/test/TestPigSplit.java  (revision 1456106)
+++ test/org/apache/pig/test/TestPigSplit.java  (working copy)
@@ -108,7 +108,7 @@
 createInput(new String[] { "0\ta" });
 
 pigServer.registerQuery("a = load '" + inputFileName + "';");
-for (int i = 0; i < 500; i++) {
+for (int i = 0; i < 200; i++) {
 pigServer.registerQuery("a = filter a by $0 == '1';");
 }
 Iterator iter = pigServer.openIterator("a");
{code}

> Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha
> 
>
> Key: PIG-3248
> URL: https://issues.apache.org/jira/browse/PIG-3248
> Project: Pig
>  Issue Type: Improvement
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3248-1.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Are we ready for 0.11.1 release?

2013-03-18 Thread Dmitriy Ryaboy
Yeah adding new types seems like a big thing, would prefer for it to be
0.12 only.


Sounds like we are ready to roll 0.11.1..  Bill, want to do the honors
again?


On Mon, Mar 18, 2013 at 10:40 AM, Julien Le Dem  wrote:

> Agreed with Daniel,
> PIG-2764 will go in Pig 0.12
> Julien
>
> On Mar 18, 2013, at 10:32 AM, Daniel Dai wrote:
>
> > Dmitriy: Just committed PIG-3132.
> >
> > Richard: PIG-2764 is a new feature; we usually don't include new features
> > in a minor release.
> >
> > Daniel
> >
> > On Mon, Mar 18, 2013 at 10:21 AM, Richard Ding 
> wrote:
> >> How about PIG-2764? It would be nice to include this feature.
> >>
> >>
> >> On Mon, Mar 18, 2013 at 1:04 AM, Dmitriy Ryaboy 
> wrote:
> >>
> >>> Just +1'd it.
> >>> I think after this one we are good to go?
> >>>
> >>>
> >>> On Sun, Mar 17, 2013 at 9:09 PM, Daniel Dai 
> wrote:
> >>>
>  Can I include PIG-3132?
> 
>  Thanks,
>  Daniel
> 
>  On Fri, Mar 15, 2013 at 5:57 PM, Julien Le Dem 
> wrote:
> > +1 for a new release
> >
> > Julien
> >
> > On Mar 15, 2013, at 17:08, Dmitriy Ryaboy 
> wrote:
> >
> >> I think all the critical patches we discussed as required for 0.11.1
>  have
> >> gone in -- is there anything else people want to finish up, or can
> we
>  roll
> >> this?  Current change log:
> >>
> >> Release 0.11.1 (unreleased)
> >>
> >> INCOMPATIBLE CHANGES
> >>
> >> IMPROVEMENTS
> >>
> >> PIG-2988: start deploying pigunit maven artifact part of Pig release
> >> process (njw45 via rohini)
> >>
> >> PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag.
>  Extra
> >> option to gc() before spilling large bag. (knoguchi via rohini)
> >>
> >> PIG-3216: Groovy UDFs documentation has minor typos (herberts via
>  rohini)
> >>
> >> PIG-3202: CUBE operator not documented in user docs (prasanth_j via
> >> billgraham)
> >>
> >> OPTIMIZATIONS
> >>
> >> BUG FIXES
> >>
> >> PIG-3194: Changes to ObjectSerializer.java break compatibility with
>  Hadoop
> >> 0.20.2 (prkommireddi via dvryaboy)
> >>
> >> PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)
> >>
> >> PIG-3144: Erroneous map entry alias resolution leading to "Duplicate
>  schema
> >> alias" errors (jcoveney via cheolsoo)
> >>
> >> PIG-3212: Race Conditions in POSort and (Internal)SortedBag during
> >> Proactive Spill (kadeng via dvryaboy)
> >>
> >> PIG-3206: HBaseStorage does not work with Oozie pig action and
> secure
>  HBase
> >> (rohini)
> 
> >>>
>
>


Re: Are we ready for 0.11.1 release?

2013-03-18 Thread Julien Le Dem
Agreed with Daniel,
PIG-2764 will go in Pig 0.12
Julien

On Mar 18, 2013, at 10:32 AM, Daniel Dai wrote:

> Dmitriy: Just committed PIG-3132.
> 
> Richard: PIG-2764 is a new feature; we usually don't include new features
> in a minor release.
> 
> Daniel
> 
> On Mon, Mar 18, 2013 at 10:21 AM, Richard Ding  wrote:
>> How about PIG-2764? It would be nice to include this feature.
>> 
>> 
>> On Mon, Mar 18, 2013 at 1:04 AM, Dmitriy Ryaboy  wrote:
>> 
>>> Just +1'd it.
>>> I think after this one we are good to go?
>>> 
>>> 
>>> On Sun, Mar 17, 2013 at 9:09 PM, Daniel Dai  wrote:
>>> 
 Can I include PIG-3132?
 
 Thanks,
 Daniel
 
 On Fri, Mar 15, 2013 at 5:57 PM, Julien Le Dem  wrote:
> +1 for a new release
> 
> Julien
> 
> On Mar 15, 2013, at 17:08, Dmitriy Ryaboy  wrote:
> 
>> I think all the critical patches we discussed as required for 0.11.1
 have
>> gone in -- is there anything else people want to finish up, or can we
 roll
>> this?  Current change log:
>> 
>> Release 0.11.1 (unreleased)
>> 
>> INCOMPATIBLE CHANGES
>> 
>> IMPROVEMENTS
>> 
>> PIG-2988: start deploying pigunit maven artifact part of Pig release
>> process (njw45 via rohini)
>> 
>> PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag.
 Extra
>> option to gc() before spilling large bag. (knoguchi via rohini)
>> 
>> PIG-3216: Groovy UDFs documentation has minor typos (herberts via
 rohini)
>> 
>> PIG-3202: CUBE operator not documented in user docs (prasanth_j via
>> billgraham)
>> 
>> OPTIMIZATIONS
>> 
>> BUG FIXES
>> 
>> PIG-3194: Changes to ObjectSerializer.java break compatibility with
 Hadoop
>> 0.20.2 (prkommireddi via dvryaboy)
>> 
>> PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)
>> 
>> PIG-3144: Erroneous map entry alias resolution leading to "Duplicate
 schema
>> alias" errors (jcoveney via cheolsoo)
>> 
>> PIG-3212: Race Conditions in POSort and (Internal)SortedBag during
>> Proactive Spill (kadeng via dvryaboy)
>> 
>> PIG-3206: HBaseStorage does not work with Oozie pig action and secure
 HBase
>> (rohini)
 
>>> 



Re: Are we ready for 0.11.1 release?

2013-03-18 Thread Daniel Dai
Dmitriy: Just committed PIG-3132.

Richard: PIG-2764 is a new feature; we usually don't include new features in a 
minor release.

Daniel

On Mon, Mar 18, 2013 at 10:21 AM, Richard Ding  wrote:
> How about PIG-2764? It would be nice to include this feature.
>
>
> On Mon, Mar 18, 2013 at 1:04 AM, Dmitriy Ryaboy  wrote:
>
>> Just +1'd it.
>> I think after this one we are good to go?
>>
>>
>> On Sun, Mar 17, 2013 at 9:09 PM, Daniel Dai  wrote:
>>
>> > Can I include PIG-3132?
>> >
>> > Thanks,
>> > Daniel
>> >
>> > On Fri, Mar 15, 2013 at 5:57 PM, Julien Le Dem  wrote:
>> > > +1 for a new release
>> > >
>> > > Julien
>> > >
>> > > On Mar 15, 2013, at 17:08, Dmitriy Ryaboy  wrote:
>> > >
>> > >> I think all the critical patches we discussed as required for 0.11.1
>> > have
>> > >> gone in -- is there anything else people want to finish up, or can we
>> > roll
>> > >> this?  Current change log:
>> > >>
>> > >> Release 0.11.1 (unreleased)
>> > >>
>> > >> INCOMPATIBLE CHANGES
>> > >>
>> > >> IMPROVEMENTS
>> > >>
>> > >> PIG-2988: start deploying pigunit maven artifact part of Pig release
>> > >> process (njw45 via rohini)
>> > >>
>> > >> PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag.
>> > Extra
>> > >> option to gc() before spilling large bag. (knoguchi via rohini)
>> > >>
>> > >> PIG-3216: Groovy UDFs documentation has minor typos (herberts via
>> > rohini)
>> > >>
>> > >> PIG-3202: CUBE operator not documented in user docs (prasanth_j via
>> > >> billgraham)
>> > >>
>> > >> OPTIMIZATIONS
>> > >>
>> > >> BUG FIXES
>> > >>
>> > >> PIG-3194: Changes to ObjectSerializer.java break compatibility with
>> > Hadoop
>> > >> 0.20.2 (prkommireddi via dvryaboy)
>> > >>
>> > >> PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)
>> > >>
>> > >> PIG-3144: Erroneous map entry alias resolution leading to "Duplicate
>> > schema
>> > >> alias" errors (jcoveney via cheolsoo)
>> > >>
>> > >> PIG-3212: Race Conditions in POSort and (Internal)SortedBag during
>> > >> Proactive Spill (kadeng via dvryaboy)
>> > >>
>> > >> PIG-3206: HBaseStorage does not work with Oozie pig action and secure
>> > HBase
>> > >> (rohini)
>> >
>>


[jira] [Resolved] (PIG-3132) NPE when illustrating a relation with HCatLoader

2013-03-18 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-3132.
-

   Resolution: Fixed
Fix Version/s: (was: 0.12)
   0.11.1
 Hadoop Flags: Reviewed

Patch committed to 0.11 branch and trunk.

>  NPE when illustrating a relation with HCatLoader
> -
>
> Key: PIG-3132
> URL: https://issues.apache.org/jira/browse/PIG-3132
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.11.1
>
> Attachments: PIG-3132-1.patch
>
>
> Get NPE exception when illustrate a relation with HCatLoader:
> {code}
> A = LOAD 'studenttab10k' USING org.apache.hcatalog.pig.HCatLoader();
> illustrate A;
> {code}
> Exception:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:274)
> at 
> org.apache.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:238)
> at 
> org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:61)
> at 
> org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:210)
> at 
> org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:190)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:129)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at 
> org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:194)
> at 
> org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:257)
> at 
> org.apache.pig.pen.ExampleGenerator.readBaseData(ExampleGenerator.java:222)
> at 
> org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:154)
> at org.apache.pig.PigServer.getExamples(PigServer.java:1245)
> at 
> org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:698)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:591)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:306)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
> {code}
> HCatalog side is tracked with HCATALOG-163.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Are we ready for 0.11.1 release?

2013-03-18 Thread Richard Ding
How about PIG-2764? It would be nice to include this feature.


On Mon, Mar 18, 2013 at 1:04 AM, Dmitriy Ryaboy  wrote:

> Just +1'd it.
> I think after this one we are good to go?
>
>
> On Sun, Mar 17, 2013 at 9:09 PM, Daniel Dai  wrote:
>
> > Can I include PIG-3132?
> >
> > Thanks,
> > Daniel
> >
> > On Fri, Mar 15, 2013 at 5:57 PM, Julien Le Dem  wrote:
> > > +1 for a new release
> > >
> > > Julien
> > >
> > > On Mar 15, 2013, at 17:08, Dmitriy Ryaboy  wrote:
> > >
> > >> I think all the critical patches we discussed as required for 0.11.1
> > have
> > >> gone in -- is there anything else people want to finish up, or can we
> > roll
> > >> this?  Current change log:
> > >>
> > >> Release 0.11.1 (unreleased)
> > >>
> > >> INCOMPATIBLE CHANGES
> > >>
> > >> IMPROVEMENTS
> > >>
> > >> PIG-2988: start deploying pigunit maven artifact part of Pig release
> > >> process (njw45 via rohini)
> > >>
> > >> PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag.
> > Extra
> > >> option to gc() before spilling large bag. (knoguchi via rohini)
> > >>
> > >> PIG-3216: Groovy UDFs documentation has minor typos (herberts via
> > rohini)
> > >>
> > >> PIG-3202: CUBE operator not documented in user docs (prasanth_j via
> > >> billgraham)
> > >>
> > >> OPTIMIZATIONS
> > >>
> > >> BUG FIXES
> > >>
> > >> PIG-3194: Changes to ObjectSerializer.java break compatibility with
> > Hadoop
> > >> 0.20.2 (prkommireddi via dvryaboy)
> > >>
> > >> PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)
> > >>
> > >> PIG-3144: Erroneous map entry alias resolution leading to "Duplicate
> > schema
> > >> alias" errors (jcoveney via cheolsoo)
> > >>
> > >> PIG-3212: Race Conditions in POSort and (Internal)SortedBag during
> > >> Proactive Spill (kadeng via dvryaboy)
> > >>
> > >> PIG-3206: HBaseStorage does not work with Oozie pig action and secure
> > HBase
> > >> (rohini)
> >
>


[jira] [Updated] (PIG-3205) Passing arguments to python script does not work with -f option

2013-03-18 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3205:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Cheolsoo. 

Note: the check-in has a misleading commit message, "Fix compilation error due to 
PIG-2507", as I intended to commit the compilation failure fix for TestGrunt 
first but ended up committing the files in this patch along with it.

> Passing arguments to python script does not work with -f option
> ---
>
> Key: PIG-3205
> URL: https://issues.apache.org/jira/browse/PIG-3205
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12
>
> Attachments: PIG-3205.patch
>
>
> With "pig sample.py arg1 arg2", arguments can be accessed in the embedded 
> python script using sys.argv[]. But not in the case "pig -f sample.py arg1 
> arg2". 
> In case of ExecMode.FILE, we don't set PigContext.PIG_CMD_ARGS_REMAINDERS and 
> so the arguments are not passed to JythonScriptEngine or GroovyScriptEngine. 
> This is specially a problem with Oozie as it always uses -f option to specify 
> the pig script.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Pig-trunk #1437

2013-03-18 Thread Apache Jenkins Server
See 

Changes:

[daijy] PIG-2507: Semicolon in parameters for UDF results in parsing error 
(tnachen via daijy)

--
[...truncated 6813 lines...]
[ivy:resolve]   found org.apache.httpcomponents#httpclient;4.1 in maven2
[ivy:resolve]   found org.apache.httpcomponents#httpcore;4.1 in maven2
[ivy:resolve]   found log4j#log4j;1.2.16 in fs
[ivy:resolve]   found org.slf4j#slf4j-log4j12;1.6.1 in fs
[ivy:resolve]   found org.apache.avro#avro;1.5.3 in fs
[ivy:resolve]   found com.thoughtworks.paranamer#paranamer;2.3 in fs
[ivy:resolve]   found org.xerial.snappy#snappy-java;1.0.3.2 in fs
[ivy:resolve]   found org.slf4j#slf4j-api;1.6.1 in fs
[ivy:resolve]   found com.googlecode.json-simple#json-simple;1.1 in fs
[ivy:resolve]   found com.jcraft#jsch;0.1.38 in fs
[ivy:resolve]   found jline#jline;0.9.94 in fs
[ivy:resolve]   found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve]   found org.codehaus.groovy#groovy-all;1.8.6 in maven2
[ivy:resolve]   found org.codehaus.jackson#jackson-mapper-asl;1.8.8 in fs
[ivy:resolve]   found org.codehaus.jackson#jackson-core-asl;1.8.8 in fs
[ivy:resolve]   found org.fusesource.jansi#jansi;1.9 in maven2
[ivy:resolve]   found joda-time#joda-time;2.1 in maven2
[ivy:resolve]   found com.google.guava#guava;11.0 in maven2
[ivy:resolve]   found org.python#jython-standalone;2.5.2 in maven2
[ivy:resolve]   found rhino#js;1.7R2 in maven2
[ivy:resolve]   found org.antlr#antlr;3.4 in maven2
[ivy:resolve]   found org.antlr#antlr-runtime;3.4 in maven2
[ivy:resolve]   found org.antlr#stringtemplate;3.2.1 in maven2
[ivy:resolve]   found antlr#antlr;2.7.7 in fs
[ivy:resolve]   found org.antlr#ST4;4.0.4 in maven2
[ivy:resolve]   found org.apache.zookeeper#zookeeper;3.4.4 in maven2
[ivy:resolve]   found dk.brics.automaton#automaton;1.11-8 in maven2
[ivy:resolve]   found org.jruby#jruby-complete;1.6.7 in maven2
[ivy:resolve]   found org.apache.hbase#hbase;0.94.1 in maven2
[ivy:resolve]   found org.vafer#jdeb;0.8 in maven2
[ivy:resolve]   found org.mockito#mockito-all;1.8.4 in maven2
[ivy:resolve]   found xalan#xalan;2.7.1 in maven2
[ivy:resolve]   found xalan#serializer;2.7.1 in maven2
[ivy:resolve]   found xml-apis#xml-apis;1.3.04 in fs
[ivy:resolve]   found xerces#xercesImpl;2.10.0 in maven2
[ivy:resolve]   found xml-apis#xml-apis;1.4.01 in maven2
[ivy:resolve]   found junit#junit;4.11 in fs
[ivy:resolve]   found org.hamcrest#hamcrest-core;1.3 in fs
[ivy:resolve]   found org.jboss.netty#netty;3.2.2.Final in fs
[ivy:resolve]   found com.github.stephenc.high-scale-lib#high-scale-lib;1.1.1 
in fs
[ivy:resolve]   found com.google.protobuf#protobuf-java;2.4.0a in fs
[ivy:resolve]   found com.yammer.metrics#metrics-core;2.1.2 in fs
[ivy:resolve]   found org.slf4j#slf4j-api;1.6.4 in fs
[ivy:resolve]   found org.apache.hive#hive-exec;0.8.0 in maven2
[ivy:resolve]   found junit#junit;3.8.1 in fs
[ivy:resolve]   found com.google.code.p.arat#rat-lib;0.5.1 in maven2
[ivy:resolve]   found commons-collections#commons-collections;3.2 in fs
[ivy:resolve]   found commons-lang#commons-lang;2.1 in fs
[ivy:resolve]   found jdiff#jdiff;1.0.9 in fs
[ivy:resolve]   found checkstyle#checkstyle;4.2 in maven2
[ivy:resolve]   found commons-beanutils#commons-beanutils-core;1.7.0 in fs
[ivy:resolve]   found commons-cli#commons-cli;1.0 in fs
[ivy:resolve]   found commons-logging#commons-logging;1.0.3 in fs
[ivy:resolve]   found org.codehaus.jackson#jackson-mapper-asl;1.0.1 in fs
[ivy:resolve]   found org.codehaus.jackson#jackson-core-asl;1.0.1 in fs
[ivy:resolve]   found com.sun.jersey#jersey-bundle;1.8 in maven2
[ivy:resolve]   found com.sun.jersey#jersey-server;1.8 in fs
[ivy:resolve]   found com.sun.jersey.contribs#jersey-guice;1.8 in fs
[ivy:resolve]   found commons-httpclient#commons-httpclient;3.1 in fs
[ivy:resolve]   found javax.servlet#servlet-api;2.5 in fs
[ivy:resolve]   found javax.ws.rs#jsr311-api;1.1.1 in maven2
[ivy:resolve]   found javax.inject#javax.inject;1 in fs
[ivy:resolve]   found javax.xml.bind#jaxb-api;2.2.2 in fs
[ivy:resolve]   found com.sun.xml.bind#jaxb-impl;2.2.3-1 in fs
[ivy:resolve]   found com.google.inject#guice;3.0 in fs
[ivy:resolve]   found com.google.inject.extensions#guice-servlet;3.0 in fs
[ivy:resolve]   found aopalliance#aopalliance;1.0 in fs
[ivy:resolve]   found org.apache.hadoop#hadoop-annotations;2.0.0-alpha in maven2
[ivy:resolve]   found org.apache.hadoop#hadoop-auth;2.0.0-alpha in maven2
[ivy:resolve]   found org.apache.hadoop#hadoop-common;2.0.0-alpha in maven2
[ivy:resolve]   found org.apache.hadoop#hadoop-hdfs;2.0.0-alpha in maven2
[ivy:resolve]   found 
org.apache.hadoop#hadoop-mapreduce-client-core;2.0.0-alpha in maven2
[ivy:resolve]   found 
org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.0.0-alpha in maven2
[ivy:resolve]   found org.apache.hadoop#hadoop-yarn-server-tests;2.0.0-alpha in 
maven2
[ivy:resolve]   found org.apache.hadoop#hadoop-mapreduce-client-app;2.0.0-alpha 

[jira] [Commented] (PIG-3132) NPE when illustrating a relation with HCatLoader

2013-03-18 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604929#comment-13604929
 ] 

Dmitriy V. Ryaboy commented on PIG-3132:


+1

>  NPE when illustrating a relation with HCatLoader
> -
>
> Key: PIG-3132
> URL: https://issues.apache.org/jira/browse/PIG-3132
> Project: Pig
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.12
>
> Attachments: PIG-3132-1.patch
>
>
> Get NPE exception when illustrate a relation with HCatLoader:
> {code}
> A = LOAD 'studenttab10k' USING org.apache.hcatalog.pig.HCatLoader();
> illustrate A;
> {code}
> Exception:
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:274)
> at 
> org.apache.hcatalog.pig.PigHCatUtil.transformToTuple(PigHCatUtil.java:238)
> at 
> org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:61)
> at 
> org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:210)
> at 
> org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:190)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad.getNext(POLoad.java:129)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:267)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at 
> org.apache.pig.pen.LocalMapReduceSimulator.launchPig(LocalMapReduceSimulator.java:194)
> at 
> org.apache.pig.pen.ExampleGenerator.getData(ExampleGenerator.java:257)
> at 
> org.apache.pig.pen.ExampleGenerator.readBaseData(ExampleGenerator.java:222)
> at 
> org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:154)
> at org.apache.pig.PigServer.getExamples(PigServer.java:1245)
> at 
> org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:698)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:591)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:306)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
> {code}
> HCatalog side is tracked with HCATALOG-163.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Are we ready for 0.11.1 release?

2013-03-18 Thread Dmitriy Ryaboy
Just +1'd it.
I think after this one we are good to go?


On Sun, Mar 17, 2013 at 9:09 PM, Daniel Dai  wrote:

> Can I include PIG-3132?
>
> Thanks,
> Daniel
>
> On Fri, Mar 15, 2013 at 5:57 PM, Julien Le Dem  wrote:
> > +1 for a new release
> >
> > Julien
> >
> > On Mar 15, 2013, at 17:08, Dmitriy Ryaboy  wrote:
> >
> >> I think all the critical patches we discussed as required for 0.11.1
> have
> >> gone in -- is there anything else people want to finish up, or can we
> roll
> >> this?  Current change log:
> >>
> >> Release 0.11.1 (unreleased)
> >>
> >> INCOMPATIBLE CHANGES
> >>
> >> IMPROVEMENTS
> >>
> >> PIG-2988: start deploying pigunit maven artifact part of Pig release
> >> process (njw45 via rohini)
> >>
> >> PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag.
> Extra
> >> option to gc() before spilling large bag. (knoguchi via rohini)
> >>
> >> PIG-3216: Groovy UDFs documentation has minor typos (herberts via
> rohini)
> >>
> >> PIG-3202: CUBE operator not documented in user docs (prasanth_j via
> >> billgraham)
> >>
> >> OPTIMIZATIONS
> >>
> >> BUG FIXES
> >>
> >> PIG-3194: Changes to ObjectSerializer.java break compatibility with
> Hadoop
> >> 0.20.2 (prkommireddi via dvryaboy)
> >>
> >> PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)
> >>
> >> PIG-3144: Erroneous map entry alias resolution leading to "Duplicate
> schema
> >> alias" errors (jcoveney via cheolsoo)
> >>
> >> PIG-3212: Race Conditions in POSort and (Internal)SortedBag during
> >> Proactive Spill (kadeng via dvryaboy)
> >>
> >> PIG-3206: HBaseStorage does not work with Oozie pig action and secure
> HBase
> >> (rohini)
>


[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2013-03-18 Thread Frederic Rechtenstein (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604910#comment-13604910
 ] 

Frederic Rechtenstein commented on PIG-3015:


Hi,

This looks very nice; I am really looking forward to it being released.

Does it make sense to have an option to include the input file path with each 
tuple (like -tagsource in PigStorage)?

I understand that this would add one more item to an already long list of 
options, but there are real use cases that need this feature, and it would make 
AvroStorage more similar to PigStorage.
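
For what it's worth, a loader can get at the path of the split it is reading, so 
the option seems feasible. The fragment below is only a hypothetical sketch of 
the idea (class name and details are mine, not the proposed AvroStorage API):

{code}
import java.io.IOException;

import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;

// Hypothetical skeleton: remember the path of the split currently being
// read and append it (or just the file name) as an extra chararray field
// to every tuple emitted from getNext(), which is essentially what
// PigStorage's -tagsource / -tagPath options do.
public abstract class TaggingLoaderSketch extends LoadFunc {

    protected String currentPath = "unknown";

    @SuppressWarnings("rawtypes")
    @Override
    public void prepareToRead(RecordReader reader, PigSplit split)
            throws IOException {
        if (split.getWrappedSplit() instanceof FileSplit) {
            currentPath = ((FileSplit) split.getWrappedSplit()).getPath().toString();
        }
    }
}
{code}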

> Rewrite of AvroStorage
> --
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Reporter: Joseph Adler
>Assignee: Joseph Adler
> Attachments: bad.avro, good.avro, PIG-3015-10.patch, 
> PIG-3015-11.patch, PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, 
> PIG-3015-5.patch, PIG-3015-6.patch, PIG-3015-7.patch, PIG-3015-9.patch, 
> PIG-3015-doc-2.patch, PIG-3015-doc.patch, TestInput.java, Test.java, 
> with_dates.pig
>
>
> The current AvroStorage implementation has a lot of issues: it requires old 
> versions of Avro, it copies data much more than needed, and it's verbose and 
> complicated. (One pet peeve of mine is that old versions of Avro don't 
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
> new implementation is significantly faster, and the code is a lot simpler. 
> Rewriting AvroStorage also enabled me to implement support for Trevni (as 
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best 
> way to contribute the changes back to Apache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira