[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918438#comment-16918438 ]
Hadoop QA commented on HDFS-14099: ---------------------------------- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 12s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 13s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}112m 48s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | HDFS-14099 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12978855/HDFS-14099-trunk-003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 1f46bca5fa32 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 371c9eb | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27715/testReport/ | | Max. process+thread count | 1344 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27715/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Unknown frame descriptor when decompressing multiple frames in > ZStandardDecompressor > ------------------------------------------------------------------------------------ > > Key: HDFS-14099 > URL: https://issues.apache.org/jira/browse/HDFS-14099 > Project: Hadoop HDFS > Issue Type: Bug > Environment: Hadoop Version: hadoop-3.0.3 > Java Version: 1.8.0_144 > Reporter: xuzq > Assignee: xuzq > Priority: Major > Attachments: HDFS-14099-trunk-001.patch, HDFS-14099-trunk-002.patch, > HDFS-14099-trunk-003.patch > > > We need to use the ZSTD compression algorithm in Hadoop. So I write a simple > demo like this for testing. > {code:java} > // code placeholder > while ((size = fsDataInputStream.read(bufferV2)) > 0 ) { > countSize += size; > if (countSize == 65536 * 8) { > if(!isFinished) { > // finish a frame in zstd > cmpOut.finish(); > isFinished = true; > } > fsDataOutputStream.flush(); > fsDataOutputStream.hflush(); > } > if(isFinished) { > LOG.info("Will resetState. N=" + n); > // reset the stream and write again > cmpOut.resetState(); > isFinished = false; > } > cmpOut.write(bufferV2, 0, size); > bufferV2 = new byte[5 * 1024 * 1024]; > n++; > } > {code} > > And I use "*hadoop fs -text*" to read this file and failed. The error as > blow. > {code:java} > Exception in thread "main" java.lang.InternalError: Unknown frame descriptor > at > org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.inflateBytesDirect(Native > Method) > at > org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.decompress(ZStandardDecompressor.java:181) > at > org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:111) > at > org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105) > at java.io.InputStream.read(InputStream.java:101) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:98) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127) > at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101) > at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331) > at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:303) > at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285) > at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119) > at org.apache.hadoop.fs.shell.Command.run(Command.java:176) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:328) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:391) > {code} > > So I had to look the code, include jni, then found this bug. > *ZSTD_initDStream(stream)* method may by called twice in the same *Frame*. > The first is in *ZStandardDecompressor.c.* > {code:java} > if (size == 0) { > (*env)->SetBooleanField(env, this, ZStandardDecompressor_finished, > JNI_TRUE); > size_t result = dlsym_ZSTD_initDStream(stream); > if (dlsym_ZSTD_isError(result)) { > THROW(env, "java/lang/InternalError", > dlsym_ZSTD_getErrorName(result)); > return (jint) 0; > } > } > {code} > This call here is correct, but *Finished* no longer be set to false, even if > there is some datas (a new frame) in *CompressedBuffer* or *UserBuffer* need > to be decompressed. > The second is in *org.apache.hadoop.io.compress.DecompressorStream* by > *decompressor.reset()*, because *Finished* is always true after decompressed > a *Frame*. > {code:java} > if (decompressor.finished()) { > // First see if there was any leftover buffered input from previous > // stream; if not, attempt to refill buffer. If refill -> EOF, we're > // all done; else reset, fix up input buffer, and get ready for next > // concatenated substream/"member". > int nRemaining = decompressor.getRemaining(); > if (nRemaining == 0) { > int m = getCompressedData(); > if (m == -1) { > // apparently the previous end-of-stream was also end-of-file: > // return success, as if we had never called getCompressedData() > eof = true; > return -1; > } > decompressor.reset(); > decompressor.setInput(buffer, 0, m); > lastBytesSent = m; > } else { > // looks like it's a concatenated stream: reset low-level zlib (or > // other engine) and buffers, then "resend" remaining input data > decompressor.reset(); > int leftoverOffset = lastBytesSent - nRemaining; > assert (leftoverOffset >= 0); > // this recopies userBuf -> direct buffer if using native libraries: > decompressor.setInput(buffer, leftoverOffset, nRemaining); > // NOTE: this is the one place we do NOT want to save the number > // of bytes sent (nRemaining here) into lastBytesSent: since we > // are resending what we've already sent before, offset is nonzero > // in general (only way it could be zero is if it already equals > // nRemaining), which would then screw up the offset calculation > // _next_ time around. IOW, getRemaining() is in terms of the > // original, zero-offset bufferload, so lastBytesSent must be as > // well. Cheesy ASCII art: > // > // <------------ m, lastBytesSent -----------> > // +===============================================+ > // buffer: |1111111111|22222222222222222|333333333333| | > // +===============================================+ > // #1: <-- off -->|<-------- nRemaining ---------> > // #2: <----------- off ----------->|<-- nRem. --> > // #3: (final substream: nRemaining == 0; eof = true) > // > // If lastBytesSent is anything other than m, as shown, then "off" > // will be calculated incorrectly. > } > }{code} -- This message was sent by Atlassian Jira (v8.3.2#803003) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org