[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-02-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349513#comment-16349513
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

OK, here's the repro code. I get the "bar" exception only if dd2 is added. Note
that dd2 is not used for anything and is not related in any way to dd1.
Note that if I end dd1 and then reuse it, I get an NPE in the Java code. But if
I end dd2, the internals of the dd1 Java object are not affected, so it looks
like the native side has some issue.
{noformat}

dest.position(startPos);
dest.limit(startLim);
src.position(startSrcPos);
src.limit(startSrcLim);

ZlibDecompressor.ZlibDirectDecompressor dd1 =
    new ZlibDecompressor.ZlibDirectDecompressor(CompressionHeader.NO_HEADER, 0);
dd1.decompress(src, dest);
dest.limit(dest.position()); // Set the new limit to where the decompressor stopped.
dest.position(startPos);
if (dest.remaining() == 0) {
  throw new RuntimeException("foo");
}

ZlibDecompressor.ZlibDirectDecompressor dd2 =
    new ZlibDecompressor.ZlibDirectDecompressor(CompressionHeader.NO_HEADER, 0);
dest.position(startPos);
dest.limit(startLim);
src.position(startSrcPos);
src.limit(startSrcLim);
dd2.end();
dd1.decompress(src, dest);
dest.limit(dest.position()); // Set the new limit to where the decompressor stopped.
dest.position(startPos);
if (dest.remaining() == 0) {
  throw new RuntimeException("bar");
}

{noformat}

As a side note, the Z_BUF_ERROR return code is not handled correctly in the
native code. See the detailed explanation of this error at
http://zlib.net/zlib_how.html . Given that neither the Java nor the native code
handles partial reads, and nothing propagates the state to the caller, this
should throw an error just like Z_DATA_ERROR.
The null checks on the buffer addresses should probably also throw rather than
return silently.
The Z_NEED_DICT handling is also suspicious. Does anything actually handle this?
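
To illustrate the kind of strictness suggested above, here is a minimal sketch
that uses the JDK's java.util.zip.Inflater rather than the Hadoop native codec.
It is not the Hadoop code or a proposed patch. The class and method names
(StrictInflate, inflateFully, deflate) are made up for the example, and it uses
the default zlib-wrapped format rather than NO_HEADER. The idea is only that a
call which makes no progress without reaching the end of the stream is turned
into an exception instead of a silent short result.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class StrictInflate {

    /** Inflates one complete zlib stream into dest, or throws; never returns a silent partial result. */
    static int inflateFully(byte[] src, byte[] dest) throws IOException {
        Inflater inflater = new Inflater(); // zlib-wrapped format (assumption; the repro uses NO_HEADER)
        try {
            inflater.setInput(src);
            int total = 0;
            while (!inflater.finished()) {
                int n = inflater.inflate(dest, total, dest.length - total);
                total += n;
                if (n == 0 && !inflater.finished()) {
                    // No progress and no end of stream: roughly the situations zlib
                    // reports as Z_BUF_ERROR or Z_NEED_DICT. Fail loudly.
                    String why = inflater.needsDictionary() ? "preset dictionary required"
                            : total == dest.length ? "output buffer full before end of stream"
                            : "input exhausted before end of stream";
                    throw new IOException("inflate made no progress: " + why);
                }
            }
            return total;
        } catch (DataFormatException e) {
            throw new IOException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }

    /** Demo helper: compresses a small array into a zlib stream. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buf = new byte[data.length + 64]; // ample for the small demo inputs
            int len = deflater.deflate(buf);
            return Arrays.copyOf(buf, len);
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "some payload, some payload, some payload".getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(plain);
        System.out.println("full input: " + inflateFully(packed, new byte[256]) + " bytes");
        try {
            inflateFully(Arrays.copyOf(packed, packed.length / 2), new byte[256]);
        } catch (IOException e) {
            System.out.println("truncated input: " + e.getMessage());
        }
    }
}
```

With this shape, a caller either gets the whole block or an exception, which is
the behavior asked for above for Z_BUF_ERROR.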


> Hadoop native ZLIB decompressor produces 0 bytes for some input
> ---
>
> Key: HADOOP-15171
> URL: https://issues.apache.org/jira/browse/HADOOP-15171
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Sergey Shelukhin
> Assignee: Lokesh Jain
> Priority: Blocker
> Fix For: 3.1.0, 3.0.1
>
>
> While reading some ORC file via direct buffers, Hive gets a 0-sized buffer 
> for a particular compressed segment of the file. We narrowed it down to 
> Hadoop native ZLIB codec; when the data is copied to heap-based buffer and 
> the JDK Inflater is used, it produces correct output. Input is only 127 bytes 
> so I can paste it here.
> All the other (many) blocks of the file are decompressed without problems by 
> the same code.
> {noformat}
> 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 
> 127 bytes to dest buffer pos 524288, limit 786432
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has 
> produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 
> 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 
> 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 
> 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 
> b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 
> 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to 
> JDK decompressor with memcopy; got 155 bytes
> {noformat}
> Hadoop version is based on 3.1 snapshot.
> The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 
> FWIW. Not sure how to extract versions from those. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-02-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349307#comment-16349307
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

Hmm, never mind, it might be a red herring.




[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-02-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349238#comment-16349238
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

Tentative cause (still confirming): calling end() on one ZlibDirectDecompressor
breaks other, unrelated ZlibDirectDecompressor instances. So it may not be
related to buffers as such.





[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-01-29 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344268#comment-16344268
 ] 

Steve Loughran commented on HADOOP-15171:
-

There was another JIRA on this, wasn't there? Sergey, can you find it?

> Hadoop native ZLIB decompressor produces 0 bytes for some input
> ---
>
> Key: HADOOP-15171
> URL: https://issues.apache.org/jira/browse/HADOOP-15171
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Sergey Shelukhin
> Priority: Blocker
> Fix For: 3.1.0, 3.0.1



[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-01-29 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344204#comment-16344204
 ] 

Gopal V commented on HADOOP-15171:
--

bq.  this is becoming a pain

This is a huge perf hit right now; the workaround is much slower than the
original codepath.
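
For illustration, a minimal sketch of what a heap-copy fallback of the kind
mentioned in the issue description's log ("Fell back to JDK decompressor with
memcopy") might look like. This is not the actual Hive or ORC code; the class
and method names (HeapFallback, inflateViaHeap, rawDeflateDirect) are
hypothetical. It shows where the extra cost plausibly comes from: the data is
copied out of the source buffer into a heap array, inflated with the JDK
Inflater, and then copied again into the destination buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class HeapFallback {

    /** Inflates raw deflate data (no zlib header) from src into dst through heap arrays. */
    static int inflateViaHeap(ByteBuffer src, ByteBuffer dst) throws DataFormatException {
        // Copy 1: source buffer -> heap array. The spare trailing byte follows the
        // Inflater(boolean nowrap) Javadoc advice to supply an extra dummy input byte.
        byte[] in = new byte[src.remaining() + 1];
        src.get(in, 0, in.length - 1);
        byte[] out = new byte[dst.remaining()];
        Inflater inflater = new Inflater(true); // true = raw deflate, as with NO_HEADER
        try {
            inflater.setInput(in);
            int total = 0;
            while (!inflater.finished()) {
                int n = inflater.inflate(out, total, out.length - total);
                if (n == 0 && !inflater.finished()) {
                    throw new DataFormatException("stream did not finish after " + total + " bytes");
                }
                total += n;
            }
            dst.put(out, 0, total); // Copy 2: heap array -> destination buffer.
            return total;
        } finally {
            inflater.end();
        }
    }

    /** Demo helper: produces raw deflate data in a direct buffer. */
    static ByteBuffer rawDeflateDirect(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buf = new byte[data.length + 64]; // ample for the small demo inputs
            int len = deflater.deflate(buf);
            ByteBuffer direct = ByteBuffer.allocateDirect(len);
            direct.put(buf, 0, len);
            direct.flip();
            return direct;
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] plain = "column data, column data, column data".getBytes(StandardCharsets.UTF_8);
        ByteBuffer dst = ByteBuffer.allocateDirect(256);
        int n = inflateViaHeap(rawDeflateDirect(plain), dst);
        System.out.println("inflated " + n + " bytes via heap copies");
    }
}
```

The two array copies and the per-call allocations are avoided entirely when the
direct decompressor works, which is consistent with the slowdown described
above.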




[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-01-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344184#comment-16344184
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

[~ste...@apache.org] [~jnp] is it possible to get some traction on this
actually? We now also have to work around this in the ORC project, and this is
becoming a pain.

> Hadoop native ZLIB decompressor produces 0 bytes for some input
> ---
>
> Key: HADOOP-15171
> URL: https://issues.apache.org/jira/browse/HADOOP-15171
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Sergey Shelukhin
> Priority: Critical
> Fix For: 3.1.0, 3.0.1



[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-01-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328710#comment-16328710
 ] 

Steve Loughran commented on HADOOP-15171:
-

Irrespective of the fix, it sounds like the Hadoop decompressor code should do
a follow-up check that the number of bytes returned is non-zero.
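
A minimal sketch of such a follow-up check, written as a wrapper so that it
does not depend on any particular codec. Everything here is an assumption made
for illustration: BufferDecompressor is a local stand-in interface (not a
Hadoop type), checked() is a hypothetical helper, and the check assumes that
callers never store blocks that legitimately decompress to zero bytes and that
the decompressor advances the destination position as it writes, as the repro
code in this thread relies on.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

final class NonEmptyOutputCheck {

    /** Hypothetical stand-in with the same shape as a buffer-to-buffer decompress call. */
    interface BufferDecompressor {
        void decompress(ByteBuffer src, ByteBuffer dst) throws IOException;
    }

    /** Wraps a decompressor so that "had input, produced nothing" becomes an error. */
    static BufferDecompressor checked(BufferDecompressor inner) {
        return (src, dst) -> {
            int inputBytes = src.remaining();
            int startPos = dst.position();
            inner.decompress(src, dst);
            if (inputBytes > 0 && dst.position() == startPos) {
                throw new IOException(
                        "decompressor produced 0 bytes for " + inputBytes + " input bytes");
            }
        };
    }

    public static void main(String[] args) throws IOException {
        BufferDecompressor copying = (src, dst) -> dst.put(src); // pretend codec that works
        BufferDecompressor silent = (src, dst) -> { };            // pretend codec that fails silently

        ByteBuffer dst = ByteBuffer.allocate(16);
        checked(copying).decompress(ByteBuffer.wrap(new byte[] {1, 2, 3}), dst);
        System.out.println("copying codec wrote " + dst.position() + " bytes");

        try {
            checked(silent).decompress(ByteBuffer.wrap(new byte[] {1, 2, 3}), ByteBuffer.allocate(16));
        } catch (IOException e) {
            System.out.println("silent codec rejected: " + e.getMessage());
        }
    }
}
```

A check like this would not fix the underlying bug, but it would turn the
silent 0-byte result reported here into an immediate, visible failure.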

> Hadoop native ZLIB decompressor produces 0 bytes for some input
> ---
>
> Key: HADOOP-15171
> URL: https://issues.apache.org/jira/browse/HADOOP-15171
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Sergey Shelukhin
> Priority: Critical



[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-01-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324924#comment-16324924
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

[~jnp] [~hagleitn] fyi
