[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=642714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642714 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 27/Aug/21 06:52
            Start Date: 27/Aug/21 06:52
    Worklog Time Spent: 10m

Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r697196205

##
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

Review comment:
@sunchao I found the root cause here. Within one JVM (e.g. two tasks running on the same node), there can be multiple `BuiltInGzipCompressor` instances. If we used static fields for the header/trailer, the crc value written into the trailer would likely be overwritten by another `BuiltInGzipCompressor` instance, so the trailer must be an instance field.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 642714)
    Time Spent: 25h 20m  (was: 25h 10m)

> Add BuiltInGzipCompressor
> -------------------------
>
>                 Key: HADOOP-17825
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17825
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 25h 20m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor if native zlib is
> not loaded. So, without the Hadoop native codec installed, saving a SequenceFile
> with GzipCodec throws an exception like "SequenceFile doesn't work with
> GzipCodec without native-hadoop code!"
> As with the other codecs that we migrated to bundled packages (lz4, snappy),
> it would be better to support GzipCodec without the Hadoop native codec
> installed. Similar to BuiltInGzipDecompressor, we can use the Java Deflater
> to support BuiltInGzipCompressor.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
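The approach described in the issue and the review comment above (a raw `java.util.zip.Deflater` emitting the DEFLATE body, framed by the fixed ten-byte gzip header and an eight-byte trailer holding the CRC-32 and input size) can be sketched as a standalone example. This is an illustrative sketch only, not the actual `BuiltInGzipCompressor` source: the class and method names (`GzipFramingSketch`, `gzipCompress`) are invented for the example. It also shows why the trailer must be per-stream state rather than static, since it is computed from the crc accumulated for that particular stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

public class GzipFramingSketch {

    // Fixed ten-byte gzip header: magic 0x1f 0x8b, CM=8 (deflate),
    // no flags, zero mtime, XFL=0, OS=0. This can safely be static
    // because it is identical for every stream.
    private static final byte[] GZIP_HEADER = {
            0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

    public static byte[] gzipCompress(byte[] input) {
        // nowrap=true: produce a raw DEFLATE stream with no zlib
        // header/checksum, so we can wrap it in gzip framing ourselves.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        // Per-stream checksum state; in the real class this (and the
        // trailer bytes derived from it) must be instance fields, not static.
        CRC32 crc = new CRC32();
        crc.update(input, 0, input.length);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(GZIP_HEADER, 0, GZIP_HEADER.length);

        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();

        // Eight-byte trailer: CRC-32 of the uncompressed data, then
        // ISIZE (input length mod 2^32), both little-endian.
        long crcValue = crc.getValue();
        long isize = input.length & 0xffffffffL;
        for (int i = 0; i < 4; i++) {
            out.write((int) ((crcValue >> (8 * i)) & 0xff));
        }
        for (int i = 0; i < 4; i++) {
            out.write((int) ((isize >> (8 * i)) & 0xff));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello gzip framing".getBytes(StandardCharsets.UTF_8);
        byte[] gz = gzipCompress(original);

        // Round-trip through the JDK's GZIPInputStream to check the framing.
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz));
        ByteArrayOutputStream decoded = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) > 0) {
            decoded.write(buf, 0, n);
        }
        System.out.println(new String(decoded.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

If two streams shared one static trailer, the second stream's crc would clobber the first's before it was flushed, which is exactly the bug the review comment above describes.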
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=639484&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-639484 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 18/Aug/21 17:02
            Start Date: 18/Aug/21 17:02
    Worklog Time Spent: 10m

Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898777110

Issue Time Tracking
-------------------
    Worklog Id:     (was: 639484)
    Time Spent: 25h 10m  (was: 25h)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=638871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638871 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 17/Aug/21 20:26
            Start Date: 17/Aug/21 20:26
    Worklog Time Spent: 10m

Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898777110

Issue Time Tracking
-------------------
    Worklog Id:     (was: 638871)
    Time Spent: 25h  (was: 24h 50m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=638306&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638306 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 16/Aug/21 17:12
            Start Date: 16/Aug/21 17:12
    Worklog Time Spent: 10m

Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-899675356

Thank you @sunchao !

Issue Time Tracking
-------------------
    Worklog Id:     (was: 638306)
    Time Spent: 24h 50m  (was: 24h 40m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=638304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638304 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 16/Aug/21 17:08
            Start Date: 16/Aug/21 17:08
    Worklog Time Spent: 10m

Work Description: sunchao commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-899672814

Test failures unrelated. Merged to trunk. Thanks @viirya !

Issue Time Tracking
-------------------
    Worklog Id:     (was: 638304)
    Time Spent: 24h 40m  (was: 24.5h)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=638303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638303 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 16/Aug/21 17:08
            Start Date: 16/Aug/21 17:08
    Worklog Time Spent: 10m

Work Description: sunchao merged pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250

Issue Time Tracking
-------------------
    Worklog Id:     (was: 638303)
    Time Spent: 24.5h  (was: 24h 20m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=638302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638302 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 16/Aug/21 17:07
            Start Date: 16/Aug/21 17:07
    Worklog Time Spent: 10m

Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898179005

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 42s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 31m 0s | | trunk passed |
| +1 :green_heart: | compile | 21m 12s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 18m 38s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 10s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 35s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 43s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 25s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 2s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 55s | | the patch passed |
| +1 :green_heart: | compile | 20m 22s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 20m 22s | | the patch passed |
| +1 :green_heart: | compile | 18m 32s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 18m 32s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332) |
| +1 :green_heart: | mvnsite | 1m 33s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 9s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 43s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 33s | | the patch passed |
| +1 :green_heart: | shadedclient | 16m 0s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 17m 2s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. |
| | | 178m 9s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux a1ce48e1d2df 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 4112e047030ac8318ae5aee0bf3c5d0d104d6c1e |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/testReport/ |
| Max. process+thread count | 1267 (vs. ulimit of 5500) |
| modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637926 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 14/Aug/21 03:47
            Start Date: 14/Aug/21 03:47
    Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898811923

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 41s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 31m 25s | | trunk passed |
| +1 :green_heart: | compile | 21m 18s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 18m 32s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 9s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 37s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 44s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 26s | | trunk passed |
| +1 :green_heart: | shadedclient | 15m 50s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 56s | | the patch passed |
| +1 :green_heart: | compile | 20m 31s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 20m 31s | | the patch passed |
| +1 :green_heart: | compile | 18m 25s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 18m 25s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 8s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/29/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332) |
| +1 :green_heart: | mvnsite | 1m 35s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 8s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 40s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 33s | | the patch passed |
| +1 :green_heart: | shadedclient | 15m 51s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 17m 6s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 58s | | The patch does not generate ASF License warnings. |
| | | 178m 16s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/29/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux c10d59b16d4a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 92d0671a3368f916bf670c9891143c94098a2c1c |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/29/testReport/ |
| Max. process+thread count | 3152 (vs. ulimit of 5500) |
| modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637915 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 14/Aug/21 01:05
            Start Date: 14/Aug/21 01:05
    Worklog Time Spent: 10m

Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898785995

Thank you @sunchao !

Issue Time Tracking
-------------------
    Worklog Id:     (was: 637915)
    Time Spent: 24h  (was: 23h 50m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637911 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 14/Aug/21 00:50
            Start Date: 14/Aug/21 00:50
    Worklog Time Spent: 10m

Work Description: sunchao commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898783160

> @sunchao They are from testBZip2NativeCodec and testCodecPoolCompressorReinit. They are in original code that this change doesn't touch. Do we want to change it here?

It should be fine then. I just triggered CI again.

Issue Time Tracking
-------------------
    Worklog Id:     (was: 637911)
    Time Spent: 23h 50m  (was: 23h 40m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637907 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 14/Aug/21 00:17
            Start Date: 14/Aug/21 00:17
    Worklog Time Spent: 10m

Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898778379

This looks like an unrelated failure.

```
[ERROR] Tests run: 18, Failures: 0, Errors: 12, Skipped: 0, Time elapsed: 2.128 s <<< FAILURE! - in org.apache.hadoop.metrics2.source.TestJvmMetrics
[ERROR] testGetMetricsPerf(org.apache.hadoop.metrics2.source.TestJvmMetrics)  Time elapsed: 0.841 s  <<< ERROR!
java.lang.OutOfMemoryError: unable to create new native thread
```

Issue Time Tracking
-------------------
    Worklog Id:     (was: 637907)
    Time Spent: 23h 40m  (was: 23.5h)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637906 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
                Author: ASF GitHub Bot
            Created on: 14/Aug/21 00:10
            Start Date: 14/Aug/21 00:10
    Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898777110

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 1m 10s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 32m 11s | | trunk passed |
| +1 :green_heart: | compile | 22m 31s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 19m 26s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 12s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 40s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 46s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 31s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 48s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 56s | | the patch passed |
| +1 :green_heart: | compile | 21m 49s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 21m 49s | | the patch passed |
| +1 :green_heart: | compile | 19m 27s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 19m 27s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 11s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/28/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332) |
| +1 :green_heart: | mvnsite | 1m 39s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 9s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 45s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 38s | | the patch passed |
| +1 :green_heart: | shadedclient | 17m 25s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 17m 29s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/28/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. |
| | | 188m 8s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.metrics2.source.TestJvmMetrics |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/28/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 9d7f770fbc29 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 92d0671a3368f916bf670c9891143c94098a2c1c |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions |
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637900=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637900 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 14/Aug/21 00:00 Start Date: 14/Aug/21 00:00 Worklog Time Spent: 10m Work Description: viirya commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898773785 @sunchao They are from `testBZip2NativeCodec` and `testCodecPoolCompressorReinit`. They are original code which this change doesn't touch. Do we want to change it here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637900) Time Spent: 23h 20m (was: 23h 10m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 23h 20m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637898=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637898 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 23:28 Start Date: 13/Aug/21 23:28 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898767347

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 1m 3s | | Docker mode activated. |
| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 32m 49s | | trunk passed |
| +1 :green_heart: | compile | 22m 35s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 20m 45s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 11s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 42s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 12s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 44s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 43s | | trunk passed |
| +1 :green_heart: | shadedclient | 17m 52s | | branch has no errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 1s | | the patch passed |
| +1 :green_heart: | compile | 24m 43s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 24m 43s | | the patch passed |
| +1 :green_heart: | compile | 20m 33s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 20m 33s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 4s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/27/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332) |
| +1 :green_heart: | mvnsite | 1m 34s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 5s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 43s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 40s | | the patch passed |
| +1 :green_heart: | shadedclient | 17m 7s | | patch has no errors when building and testing our client artifacts. |
| _ Other Tests _ |
| +1 :green_heart: | unit | 17m 51s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 56s | | The patch does not generate ASF License warnings. |
| | | | 194m 11s | | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/27/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux af0b3e2b0f23 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 87ae6bb66646c1ac6fab8896e443eb0c54500308 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/27/testReport/ |
| Max. process+thread count | 1266 (vs. ulimit of 5500) |
| modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637887=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637887 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 20:49 Start Date: 13/Aug/21 20:49 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688773637 ## File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java ## @@ -882,26 +908,28 @@ private void testGzipCodecWrite(boolean useNative) throws IOException { BufferedWriter w = null; Compressor gzipCompressor = CodecPool.getCompressor(codec); -if (null != gzipCompressor) { - // If it gives us back a Compressor, we should be able to use this - // to write files we can then read back with Java's gzip tools. - OutputStream os = new CompressorStream(new FileOutputStream(fileName), - gzipCompressor); - w = new BufferedWriter(new OutputStreamWriter(os)); - w.write(msg); - w.close(); - CodecPool.returnCompressor(gzipCompressor); - - verifyGzipFile(fileName, msg); -} - -// Create a gzip text file via codec.getOutputStream(). Review comment: Oh, got it. Yea, removed it accidentally. Restoring it with multi-write. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637887) Time Spent: 23h (was: 22h 50m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. 
Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 23h > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor.
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637885=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637885 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 20:43 Start Date: 13/Aug/21 20:43 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688770891 ## File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java ## @@ -882,26 +908,28 @@ private void testGzipCodecWrite(boolean useNative) throws IOException { BufferedWriter w = null; Compressor gzipCompressor = CodecPool.getCompressor(codec); -if (null != gzipCompressor) { - // If it gives us back a Compressor, we should be able to use this - // to write files we can then read back with Java's gzip tools. - OutputStream os = new CompressorStream(new FileOutputStream(fileName), - gzipCompressor); - w = new BufferedWriter(new OutputStreamWriter(os)); - w.write(msg); - w.close(); - CodecPool.returnCompressor(gzipCompressor); - - verifyGzipFile(fileName, msg); -} - -// Create a gzip text file via codec.getOutputStream(). Review comment: oh I mean the original test with comment "// Create a gzip text file via codec.getOutputStream()." I think we should change it to use multi-write too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637885) Time Spent: 22h 50m (was: 22h 40m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. 
Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 22h 50m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor.
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637884 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 20:41 Start Date: 13/Aug/21 20:41 Worklog Time Spent: 10m Work Description: hadoop-yetus removed a comment on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897916404

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 48s | | Docker mode activated. |
| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 31m 58s | | trunk passed |
| +1 :green_heart: | compile | 23m 51s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 20m 38s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 10s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 39s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 40s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 31s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 33s | | branch has no errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 58s | | the patch passed |
| +1 :green_heart: | compile | 21m 26s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 21m 26s | | the patch passed |
| +1 :green_heart: | compile | 19m 14s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 19m 14s | | the patch passed |
| +1 :green_heart: | blanks | 0m 1s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 15s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332) |
| +1 :green_heart: | mvnsite | 1m 43s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 4s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 41s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 50s | | the patch passed |
| +1 :green_heart: | shadedclient | 16m 36s | | patch has no errors when building and testing our client artifacts. |
| _ Other Tests _ |
| +1 :green_heart: | unit | 17m 41s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. |
| | | | 187m 47s | | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux e46a4b76434d 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d661abcbd46f7d907db31b1cd4557f9397430dab |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/testReport/ |
| Max. process+thread count | 1263 (vs. ulimit of 5500) |
| modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637880=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637880 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 20:03 Start Date: 13/Aug/21 20:03 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688751999 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished(); + } + + @Override + public boolean needsInput() { +return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC; + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +if (finished()) { + throw new IOException("compress called on finished compressor"); +} + +int compressedBytesWritten = 0; + +if (currentBufLen <= 0) { + return compressedBytesWritten; +} + +// If we are not within uncompressed data yet, output the header. +if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM && +state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) { Review comment: as we now only write the header once for all inputs, seems okay. Let me change it. 
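[Editor's illustration] The trailer discussed above cannot be shared across compressor instances because its eight bytes are rewritten per stream from the running CRC and the uncompressed byte count. A minimal sketch of that layout per RFC 1952 — `trailer` is a hypothetical helper, not code from the PR:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;

public class GzipTrailerSketch {
    // Hypothetical helper: the 8-byte gzip trailer is CRC32 of the
    // uncompressed data followed by ISIZE (size mod 2^32), both little-endian.
    static byte[] trailer(long crcValue, long uncompressedSize) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt((int) crcValue);          // CRC32 of the uncompressed bytes
        buf.putInt((int) uncompressedSize);  // low 32 bits == size mod 2^32
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes();
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        // Because these bytes depend on per-stream CRC and count, each
        // compressor instance needs its own trailer buffer; a static array
        // shared by two instances in one JVM would be clobbered.
        byte[] t = trailer(crc.getValue(), data.length);
        System.out.println(t.length); // 8
    }
}
```

This is also why the fixed ten-byte header can safely stay `static final` while the trailer must be instance state.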
Issue Time Tracking --- Worklog Id: (was: 637880) Time Spent: 22.5h (was: 22h 20m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 22.5h > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637876=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637876 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 19:36 Start Date: 13/Aug/21 19:36 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688739194 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished(); + } + + @Override + public boolean needsInput() { +return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC; + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +if (finished()) { + throw new IOException("compress called on finished compressor"); +} + +int compressedBytesWritten = 0; + +if (currentBufLen <= 0) { + return compressedBytesWritten; +} + +// If we are not within uncompressed data yet, output the header. 
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM && +state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) { Review comment: yea, sorry `state == BuiltInGzipDecompressor.GzipStateLabel.HEADER_BASIC` - it is a stronger guarantee than the current one, isn't it?
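[Editor's illustration] The state machine under review emits the fixed header once, then a raw deflate stream, then the trailer. A self-contained sketch of that framing, assuming a `Deflater` constructed with `nowrap=true` (illustrative only, not the PR's implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

public class GzipFramingSketch {
    // Produce one gzip member: fixed header, raw deflate body, CRC+ISIZE trailer.
    static byte[] gzipFrame(byte[] input) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[]{0x1f, (byte) 0x8b, 0x08, 0, 0, 0, 0, 0, 0, 0}); // fixed header

        // nowrap=true: raw deflate, no zlib wrapper, as required inside gzip.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();

        CRC32 crc = new CRC32();
        crc.update(input, 0, input.length);
        long c = crc.getValue(), n = input.length;
        for (int i = 0; i < 4; i++) out.write((int) ((c >> (8 * i)) & 0xff)); // CRC32, LE
        for (int i = 0; i < 4; i++) out.write((int) ((n >> (8 * i)) & 0xff)); // ISIZE, LE
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] framed = gzipFrame("round trip".getBytes());
        // GZIPInputStream verifies both CRC and ISIZE, so a successful read
        // confirms the framing is a valid gzip member.
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(framed));
        System.out.println(new String(in.readAllBytes())); // prints "round trip"
    }
}
```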
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637875=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637875 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 19:32 Start Date: 13/Aug/21 19:32 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688737420 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished(); + } + + @Override + public boolean needsInput() { +return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC; + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +if (finished()) { + throw new IOException("compress called on finished compressor"); +} + +int compressedBytesWritten = 0; + +if (currentBufLen <= 0) { Review comment: Let me keep it for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 637875) Time Spent: 22h 10m (was: 22h) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 22h 10m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637874 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 19:32 Start Date: 13/Aug/21 19:32 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688737260 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished(); + } + + @Override + public boolean needsInput() { +return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC; + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +if (finished()) { + throw new IOException("compress called on finished compressor"); +} + +int compressedBytesWritten = 0; + +if (currentBufLen <= 0) { + return compressedBytesWritten; +} + +// If we are not within uncompressed data yet, output the header. +if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM && +state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) { Review comment: Or you mean `state == BuiltInGzipDecompressor.GzipStateLabel.HEADER_BASIC`? 
Issue Time Tracking --- Worklog Id: (was: 637874) Time Spent: 22h (was: 21h 50m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637873 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 19:31 Start Date: 13/Aug/21 19:31 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688736893 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished(); + } + + @Override + public boolean needsInput() { +return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC; + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +if (finished()) { + throw new IOException("compress called on finished compressor"); +} + +int compressedBytesWritten = 0; + +if (currentBufLen <= 0) { Review comment: Oh, I remember I saw this guard in other compressor. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 637873) Time Spent: 21h 50m (was: 21h 40m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637872=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637872 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 19:30 Start Date: 13/Aug/21 19:30 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688736308 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished(); + } + + @Override + public boolean needsInput() { +return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC; + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +if (finished()) { + throw new IOException("compress called on finished compressor"); +} + +int compressedBytesWritten = 0; + +if (currentBufLen <= 0) { + return compressedBytesWritten; +} + +// If we are not within uncompressed data yet, output the header. 
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM && +state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) { + int outputHeaderSize = writeHeader(b, off, len); + numExtraBytesWritten += outputHeaderSize; + + compressedBytesWritten += outputHeaderSize; + + if (outputHeaderSize == len) { +return compressedBytesWritten; + } + + off += outputHeaderSize; + len -= outputHeaderSize; +} + +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + // now compress it into b[] + int deflated = deflater.deflate(b, off, len); + + compressedBytesWritten += deflated; + off += deflated; + len -= deflated; + + // All current input are processed. And `finished` is called. Going to output trailer. + if (deflater.finished()) { +state = BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC; +fillTrailer(); + } else { +return compressedBytesWritten; + } +} + +int outputTrailerSize = writeTrailer(b, off, len); Review comment: okay -- This is an automated message from the Apache Git Service. To respond
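The quoted compress() path above hands the trailer off to a fillTrailer() step whose body is not shown in this excerpt. As background, here is a minimal hypothetical sketch of what such a step must produce (illustrative names, not the merged Hadoop code): per the gzip format, the trailer is the CRC-32 of the uncompressed data followed by its length mod 2^32, each as a 4-byte little-endian value.

```java
// Hypothetical sketch, not the Hadoop implementation: the 8-byte gzip
// trailer is CRC-32 of the uncompressed data, then the uncompressed
// length mod 2^32, each serialized as 4 little-endian bytes.
public class TrailerSketch {
  static byte[] buildTrailer(long crcValue, long uncompressedSize) {
    byte[] trailer = new byte[8];
    for (int i = 0; i < 4; i++) {
      trailer[i] = (byte) (crcValue >>> (8 * i));             // CRC-32, little-endian
      trailer[4 + i] = (byte) (uncompressedSize >>> (8 * i)); // ISIZE, little-endian
    }
    return trailer;
  }

  public static void main(String[] args) {
    byte[] t = buildTrailer(0x11223344L, 5);
    System.out.printf("%02x %02x %02x %02x | %02x%n", t[0], t[1], t[2], t[3], t[4]);
    // prints "44 33 22 11 | 05"
  }
}
```

Because the trailer depends on the running CRC and accumulated input length, it can only be filled in once the deflater reports finished() — which is why the state machine above switches to TRAILER_CRC at exactly that point.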
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637871 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 19:29 Start Date: 13/Aug/21 19:29 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688736051 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished(); + } + + @Override + public boolean needsInput() { +return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC; + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +if (finished()) { + throw new IOException("compress called on finished compressor"); +} + +int compressedBytesWritten = 0; + +if (currentBufLen <= 0) { + return compressedBytesWritten; +} + +// If we are not within uncompressed data yet, output the header. +if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM && +state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) { Review comment: This block writes the header. We only write the header when in `HEADER_BASIC` state. 
Issue Time Tracking --- Worklog Id: (was: 637871) Time Spent: 21.5h (was: 21h 20m)
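The thread above converges on emitting the fixed header only while in the HEADER_BASIC state. A minimal hypothetical sketch of that idea (illustrative names, not the merged code), including the headerOff bookkeeping needed because the caller's output buffer can be smaller than the 10-byte header:

```java
// Hypothetical sketch (illustrative names, not the merged Hadoop code):
// the fixed 10-byte header is emitted only in the HEADER state, and an
// offset tracks how much of it has been written so far, since the
// caller's output buffer may hold fewer than 10 bytes per call.
public class HeaderWriterSketch {
  enum State { HEADER, STREAM, TRAILER, FINISHED }

  static final byte[] GZIP_HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  State state = State.HEADER;
  int headerOff = 0;

  /** Copies as much of the remaining header as fits into b; returns bytes written. */
  int writeHeader(byte[] b, int off, int len) {
    if (state != State.HEADER) {      // the narrower state check discussed above
      return 0;
    }
    int n = Math.min(len, GZIP_HEADER.length - headerOff);
    System.arraycopy(GZIP_HEADER, headerOff, b, off, n);
    headerOff += n;
    if (headerOff == GZIP_HEADER.length) {
      state = State.STREAM;           // header fully written; start deflating
    }
    return n;
  }

  public static void main(String[] args) {
    HeaderWriterSketch w = new HeaderWriterSketch();
    byte[] out = new byte[4];
    int total = 0;
    while (w.state == State.HEADER) {
      total += w.writeHeader(out, 0, out.length);  // writes 4 + 4 + 2 bytes
    }
    System.out.println(total + " " + w.state);     // prints "10 STREAM"
  }
}
```

With a single-purpose check like this, the guard no longer has to enumerate every state the compressor is *not* in, which is the simplification the reviewers were circling.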
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637838 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 16:36 Start Date: 13/Aug/21 16:36 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688637566 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished(); + } + + @Override + public boolean needsInput() { +return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC; + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +if (finished()) { + throw new IOException("compress called on finished compressor"); +} + +int compressedBytesWritten = 0; + +if (currentBufLen <= 0) { + return compressedBytesWritten; +} + +// If we are not within uncompressed data yet, output the header. +if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM && +state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) { Review comment: sorry to raise this again, but I think it's safe to use `state != BuiltInGzipDecompressor.GzipStateLabel.HEADER_BASIC` now? 
## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,261 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637828=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637828 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 16:21 Start Date: 13/Aug/21 16:21 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688633641 ## File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java ## @@ -882,26 +908,28 @@ private void testGzipCodecWrite(boolean useNative) throws IOException { BufferedWriter w = null; Compressor gzipCompressor = CodecPool.getCompressor(codec); -if (null != gzipCompressor) { - // If it gives us back a Compressor, we should be able to use this - // to write files we can then read back with Java's gzip tools. - OutputStream os = new CompressorStream(new FileOutputStream(fileName), - gzipCompressor); - w = new BufferedWriter(new OutputStreamWriter(os)); - w.write(msg); - w.close(); - CodecPool.returnCompressor(gzipCompressor); - - verifyGzipFile(fileName, msg); -} - -// Create a gzip text file via codec.getOutputStream(). Review comment: Hmm? you mean keeping original single write? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637828) Time Spent: 21h 10m (was: 21h) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. 
Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 21h 10m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
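The description's closing idea — building gzip output from java.util.zip.Deflater, mirroring how BuiltInGzipDecompressor handles input — can be illustrated with a standalone sketch (not the Hadoop class itself): a gzip member is just the fixed 10-byte header, a raw DEFLATE stream (Deflater in nowrap mode), and an 8-byte little-endian CRC-32/ISIZE trailer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Illustrative sketch, not the Hadoop BuiltInGzipCompressor: assembles a
// gzip member by hand from a raw Deflater, which is the core technique
// the JIRA proposes.
public class GzipFromDeflater {
  private static final byte[] GZIP_HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  public static byte[] gzip(byte[] data) {
    // nowrap = true: emit a raw DEFLATE stream without the zlib wrapper,
    // since gzip supplies its own header and trailer around it.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(data);
    deflater.finish();

    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(GZIP_HEADER, 0, GZIP_HEADER.length);
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();

    // Trailer: CRC-32, then ISIZE (input length mod 2^32), little-endian.
    ByteBuffer trailer = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
    trailer.putInt((int) crc.getValue());
    trailer.putInt(data.length);
    out.write(trailer.array(), 0, 8);
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] msg = "hello gzip".getBytes(StandardCharsets.UTF_8);
    // Round-trip through the JDK's GZIPInputStream, which also verifies
    // the CRC and size fields in the trailer.
    GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzip(msg)));
    byte[] round = in.readAllBytes();
    System.out.println(new String(round, StandardCharsets.UTF_8).equals("hello gzip"));
    // prints "true"
  }
}
```

The Compressor under review has to do the same work incrementally across repeated compress() calls into caller-sized buffers, which is what motivates the state machine and partial-write bookkeeping discussed in the comments above.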
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637675=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637675 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 04:15 Start Date: 13/Aug/21 04:15 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898179005 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 42s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 0s | | trunk passed | | +1 :green_heart: | compile | 21m 12s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 18m 38s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 10s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 35s | | trunk passed | | +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 43s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 25s | | trunk passed | | +1 :green_heart: | shadedclient | 16m 2s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 55s | | the patch passed | | +1 :green_heart: | compile | 20m 22s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 20m 22s | | the patch passed | | +1 :green_heart: | compile | 18m 32s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 18m 32s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332) | | +1 :green_heart: | mvnsite | 1m 33s | | the patch passed | | +1 :green_heart: | javadoc | 1m 9s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 43s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 33s | | the patch passed | | +1 :green_heart: | shadedclient | 16m 0s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 17m 2s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. 
| | | | 178m 9s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3250 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux a1ce48e1d2df 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 4112e047030ac8318ae5aee0bf3c5d0d104d6c1e | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/testReport/ | | Max. process+thread count | 1267 (vs. ulimit of 5500) | | modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637616=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637616 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 01:16 Start Date: 13/Aug/21 01:16 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688181189 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private boolean finished; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuButLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +// After we output the trailer for the current input, we can take another input. Review comment: We should return `deflater.needsInput()` in most cases. But if we are writing the trailer, even if it needs input, we should return false. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637616) Time Spent: 20h 50m (was: 20h 40m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 20h 50m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without
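The quoted field comment says "The trailer will be overwritten based on crc and output size." As a self-contained illustration of what that overwrite involves (independent of the PR's actual code; `trailer` and `writeLE` are hypothetical helpers), the 8-byte gzip trailer per RFC 1952 is the CRC-32 of the uncompressed data followed by its length modulo 2^32, both little-endian:

```java
import java.util.zip.CRC32;

public class GzipTrailerDemo {
    // Fills an 8-byte gzip trailer: CRC-32 of the uncompressed data, then
    // its length (mod 2^32), both little-endian, per RFC 1952.
    static byte[] trailer(long crcValue, long uncompressedLen) {
        byte[] t = new byte[8];
        writeLE(t, 0, crcValue);
        writeLE(t, 4, uncompressedLen);
        return t;
    }

    // Writes the low 4 bytes of v at b[off..off+3], least-significant first.
    static void writeLE(byte[] b, int off, long v) {
        for (int i = 0; i < 4; i++) {
            b[off + i] = (byte) (v >>> (8 * i));
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes();
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        byte[] t = trailer(crc.getValue(), data.length);
        System.out.println(t.length); // 8
    }
}
```

Because the CRC accumulates per stream, the trailer bytes are necessarily per-instance state, which is why a `static` trailer buffer shared across `BuiltInGzipCompressor` instances gets clobbered.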
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637614=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637614 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 01:14 Start Date: 13/Aug/21 01:14 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688180734 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private boolean finished; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuButLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +// After we output the trailer for the current input, we can take another input. Review comment: Rethinking about it and testing locally, I think it should be ```java deflater.needsInput() && state !=BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC; ``` ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637604 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 00:53 Start Date: 13/Aug/21 00:53 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688174639 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private boolean finished; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuButLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +// After we output the trailer for the current input, we can take another input. Review comment: Yes you're right, however the usage pattern for this seems to be: ```java compressor.setInput(b, off, len); while (!compressor.needsInput()) { compress(); } ``` so `setInput` is always called first. Although, it might be safer to also add a check of `HEADER_BASIC` there too. In the code snippet you have above, seems it will return true when for the first time `compressor.setInput` is called (since the state is `HEADER_BASIC`) , and it will not call `compress` afterwards? 
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637604) Time Spent: 20.5h (was: 20h 20m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 20.5h > Remaining Estimate: 0h > > Currently, GzipCodec only supports
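The caller pattern quoted in the comment above can be exercised against a plain `java.util.zip.Deflater` to see why the codec must not report `needsInput()` prematurely; `compressAll` is a hypothetical helper, not Hadoop API:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class DeflateLoopDemo {
    // Mirrors the usage pattern in the review: feed one input buffer,
    // then keep draining the codec until it reports it is done.
    static byte[] compressAll(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();                 // no further input will follow
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64];
        while (!deflater.finished()) {     // analogous to the quoted drain loop
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = compressAll("hadoop hadoop hadoop".getBytes());
        System.out.println(compressed.length > 0); // true
    }
}
```

If `needsInput()` returned true before any output was drained, the quoted `while (!compressor.needsInput())` loop would exit without ever calling `compress()`, which is exactly the hazard discussed here.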
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637596 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 00:10 Start Date: 13/Aug/21 00:10 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688162145 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private boolean finished; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuButLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +// After we output the trailer for the current input, we can take another input. Review comment: Hm, but `deflater.needsInput()` actually return true after we just create Deflater instance (not call `setInput` yet). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637596) Time Spent: 20h (was: 19h 50m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 20h > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec
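The observation above is easy to verify against the JDK directly: a freshly constructed `Deflater` has an empty input buffer, so `needsInput()` is true before `setInput` is ever called, and flips to false once pending input exists:

```java
import java.util.zip.Deflater;

public class DeflaterNeedsInputDemo {
    public static void main(String[] args) {
        Deflater deflater = new Deflater();
        // Freshly constructed: no pending input, so needsInput() is already
        // true even though setInput() has never been called.
        System.out.println(deflater.needsInput()); // true
        deflater.setInput("some input".getBytes());
        // With unconsumed input buffered, needsInput() is false until the
        // deflater works through it.
        System.out.println(deflater.needsInput()); // false
        deflater.end();
    }
}
```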
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637594 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 00:06 Start Date: 13/Aug/21 00:06 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688160670 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private boolean finished; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuButLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +// After we output the trailer for the current input, we can take another input. Review comment: In `HEADER_BASIC` it'll return false since `deflater.needsInput` will return false (since it hasn't start processing the input yet). I think this logic is correct since we need it to go into `compress` method? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637594) Time Spent: 19h 50m (was: 19h 40m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 19h 50m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4,
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637592=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637592 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 00:03 Start Date: 13/Aug/21 00:03 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688160063 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private static final byte[] GZIP_TRAILER = new byte[]{ + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private static final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private boolean finished; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numExtraBytesWritten = 0; + + private int currentBufLen = 0; + private int accuButLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { +init(conf); + } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +// After we output the trailer for the current input, we can take another input. Review comment: Seems to be: ```java if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { return deflater.needsInput(); } return state == BuiltInGzipDecompressor.GzipStateLabel.HEADER_BASIC; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637592) Time Spent: 19h 40m (was: 19.5h) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 19h 40m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637591 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 13/Aug/21 00:01 Start Date: 13/Aug/21 00:01 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688159282 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ Review comment: Then `needsInput` will return false for the `HEADER_BASIC` state? That seems incorrect. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637591) Time Spent: 19.5h (was: 19h 20m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 19.5h > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637590 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 23:57 Start Date: 12/Aug/21 23:57 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688158036 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ Review comment: lol yeah, we've come full circle :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637590) Time Spent: 19h 20m (was: 19h 10m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 19h 20m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
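The `needsInput()` checks debated in the entries above wrap `java.util.zip.Deflater`'s own notion of needing input. A minimal sketch of that underlying behavior (the class and method names here are ours, not from the patch): a deflater asks for input when its buffer is empty, reports false while set input is still pending, and asks again once everything has been consumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class NeedsInputDemo {
    // Returns three needsInput() observations: before any input is set,
    // while set input is still pending, and after the deflater has
    // consumed and compressed everything.
    public static boolean[] observe() {
        // Raw deflate (nowrap = true), as a gzip-framing compressor would use.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        boolean beforeInput = deflater.needsInput();

        deflater.setInput("some bytes to compress".getBytes(StandardCharsets.UTF_8));
        boolean withPendingInput = deflater.needsInput();

        // Drain the deflater completely.
        deflater.finish();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            deflater.deflate(buf);
        }
        boolean afterDrain = deflater.needsInput();
        deflater.end();
        return new boolean[]{beforeInput, withPendingInput, afterDrain};
    }

    public static void main(String[] args) {
        boolean[] obs = observe();
        System.out.println("beforeInput=" + obs[0]
            + " withPendingInput=" + obs[1] + " afterDrain=" + obs[2]);
    }
}
```

This is why the patch can't return `deflater.needsInput()` unconditionally: while writing the header or trailer the deflater may well report it needs input, yet the compressor is not ready to accept any.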
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637584 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 23:52 Start Date: 12/Aug/21 23:52 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688156120 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ Review comment: I remember I used `deflater.finished()` at the beginning to check whether it was time to output the trailer. So it seems it already outputs the trailer for all inputs... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637584) Time Spent: 19h 10m (was: 19h) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 19h 10m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" 
> Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637583=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637583 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 23:50 Start Date: 12/Aug/21 23:50 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688155475 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ Review comment: Oh, you're right. I may have misunderstood it. It seems `finished` is not necessary; we can just check `deflater.finished()`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637583) Time Spent: 19h (was: 18h 50m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 19h > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. 
Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
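The exchange above hinges on when `deflater.finished()` becomes true: only after the caller invokes `finish()` and all compressed output has been drained, which is what makes it a safe trigger for writing the trailer. A small sketch of that lifecycle (class and method names are ours):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class FinishedDemo {
    // Observes finished() at three points: before finish() is called,
    // right after finish() but before draining, and after all output
    // has been drained.
    public static boolean[] lifecycle() {
        Deflater deflater = new Deflater();
        deflater.setInput("payload".getBytes(StandardCharsets.UTF_8));

        boolean beforeFinish = deflater.finished();      // false: finish() not called

        deflater.finish();
        boolean undrained = deflater.finished();         // still false: output not drained

        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            deflater.deflate(buf);
        }
        boolean drained = deflater.finished();           // true: stream end reached
        deflater.end();
        return new boolean[]{beforeFinish, undrained, drained};
    }

    public static void main(String[] args) {
        boolean[] s = lifecycle();
        System.out.println("beforeFinish=" + s[0]
            + " undrained=" + s[1] + " drained=" + s[2]);
    }
}
```

So checking `deflater.finished()` alone distinguishes "caller is done" from "more input may come", which is why the extra `finished` flag looked redundant in the review.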
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637575 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 22:56 Start Date: 12/Aug/21 22:56 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688136794 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ Review comment: > but just we output the trailer for each input Hmm, really? When I tested it, it still writes the trailer after all the inputs. My understanding is `deflater.finished()` only returns true when `finish` is called. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637575) Time Spent: 18h 50m (was: 18h 40m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 18h 50m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" 
> Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637571=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637571 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 22:45 Start Date: 12/Aug/21 22:45 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688132453 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ Review comment: Currently we only write the trailer after all inputs are compressed (the caller calls `finish()`). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637571) Time Spent: 18h 40m (was: 18.5h) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 18h 40m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. 
Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
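Writing the trailer only after all inputs are compressed, as described above, amounts to the following gzip framing: fixed ten-byte header, raw-deflate body, then an eight-byte CRC-32/ISIZE trailer (both little-endian) once `finished()` is true. A hedged sketch of that approach (class name `GzipFramingSketch` and its helpers are ours, not the patch's API); it round-trips through the JDK's `GZIPInputStream`, which validates the trailer:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

public class GzipFramingSketch {
    // Same fixed ten-byte gzip header as the patch under review.
    private static final byte[] GZIP_HEADER = {
        0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

    // Compress one complete input: header, raw-deflate body, then the
    // trailer, written only after the deflater reports finished().
    public static byte[] gzipCompress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // nowrap: no zlib framing
        CRC32 crc = new CRC32();
        crc.update(input, 0, input.length);
        deflater.setInput(input);
        deflater.finish();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(GZIP_HEADER, 0, GZIP_HEADER.length);
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();

        // Trailer: CRC-32 of the uncompressed data, then its length mod 2^32,
        // each as four little-endian bytes.
        for (long v : new long[]{crc.getValue(), input.length & 0xffffffffL}) {
            for (int i = 0; i < 4; i++) {
                out.write((int) ((v >>> (8 * i)) & 0xff));
            }
        }
        return out.toByteArray();
    }

    // Round-trip check via the JDK's reader, which verifies CRC and ISIZE.
    public static byte[] gunzip(byte[] gz) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz));
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello gzip framing".getBytes(StandardCharsets.UTF_8);
        System.out.println(java.util.Arrays.equals(data, gunzip(gzipCompress(data))));
    }
}
```

This also illustrates the instance-state point from the thread: the CRC and byte count belong to one compression stream, so the trailer bytes must be per-compressor state, not shared static arrays.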
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637570 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 22:44 Start Date: 12/Aug/21 22:44 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688132212 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ Review comment: `finished` is used to make sure we write the trailer after `finish()` is called. In other words, it is okay if we remove it (this is actually how I implemented it previously), but then we would output the trailer for each input. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637570) Time Spent: 18.5h (was: 18h 20m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 18.5h > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" 
> Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637563=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637563 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 22:17 Start Date: 12/Aug/21 22:17 Worklog Time Spent: 10m Work Description: hadoop-yetus removed a comment on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-894594938 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637563) Time Spent: 18h 20m (was: 18h 10m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 18h 20m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637562 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 22:16 Start Date: 12/Aug/21 22:16 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r688118961 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+      0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuButLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    // After we output the trailer for the current input, we can take another input.

Review comment: This comment is outdated. Also I wonder if we can change it to something like:
```java
if (!deflater.needsInput()) {
  return false;
}
// After we output the trailer for the current input, we can take another input.
return state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM;
```
It seems strange that we'd need more input when state is `FINISHED`.
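The hazard that later led to dropping `static` from the trailer (noted elsewhere in this review: with two tasks in one JVM, a second `BuiltInGzipCompressor` instance would overwrite the shared trailer's CRC) can be demonstrated in miniature. The class and method names below are hypothetical stand-ins, not the patch's code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SharedTrailerHazard {
  // Hypothetical JVM-wide mutable buffer, mirroring a static GZIP_TRAILER
  // field shared by every compressor instance in the process.
  static final byte[] SHARED_TRAILER = new byte[8];

  // Fill an 8-byte gzip trailer: CRC-32 then uncompressed size,
  // both as little-endian 32-bit values.
  static void fillTrailer(byte[] trailer, long crc, long size) {
    ByteBuffer.wrap(trailer).order(ByteOrder.LITTLE_ENDIAN)
        .putInt((int) crc)
        .putInt((int) size);
  }
}
```

If a first instance fills `SHARED_TRAILER` but has not yet flushed it, a second instance's `fillTrailer` call silently replaces those pending bytes; making the trailer an instance field removes the sharing entirely.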
## File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
## @@ -882,26 +908,28 @@ private void testGzipCodecWrite(boolean useNative) throws IOException {
   BufferedWriter w = null;
   Compressor gzipCompressor = CodecPool.getCompressor(codec);
-if (null != gzipCompressor) {
-  // If it gives us back a Compressor, we should be able to use this
-  // to write files we can then read back with Java's gzip tools.
-  OutputStream os = new CompressorStream(new FileOutputStream(fileName),
-      gzipCompressor);
-  w = new BufferedWriter(new OutputStreamWriter(os));
-  w.write(msg);
-  w.close();
-  CodecPool.returnCompressor(gzipCompressor);
-
-  verifyGzipFile(fileName, msg);
-}
-
-// Create a gzip text file via codec.getOutputStream().

Review comment: Maybe we should keep this test still.

## File path:
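The test being discussed writes text through the pooled compressor and then verifies the file with Java's gzip tooling. A JDK-only stand-in for that round trip (it assumes nothing from the Hadoop classpath, so `CompressorStream` is replaced by `GZIPOutputStream`; the class name is invented) might look like:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
  // Compress a message to gzip bytes, then verify the bytes form a
  // well-formed gzip stream by decompressing them back to a string.
  public static String writeAndReadBack(String msg) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (Writer w = new OutputStreamWriter(
        new GZIPOutputStream(bytes), "UTF-8")) {
      w.write(msg);
    } // close() finishes the gzip stream, writing the trailer.

    try (BufferedReader r = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(
            new ByteArrayInputStream(bytes.toByteArray())), "UTF-8"))) {
      StringBuilder sb = new StringBuilder();
      int c;
      while ((c = r.read()) != -1) {
        sb.append((char) c);
      }
      return sb.toString();
    }
  }
}
```

`GZIPInputStream` checks the trailer's CRC-32 and size fields on end-of-stream, so a successful read-back also validates the framing, which is exactly what the removed `verifyGzipFile` assertion relied on.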
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637524 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 19:40 Start Date: 12/Aug/21 19:40 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897916404 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 48s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 58s | | trunk passed | | +1 :green_heart: | compile | 23m 51s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 20m 38s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 10s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 39s | | trunk passed | | +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 40s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 31s | | trunk passed | | +1 :green_heart: | shadedclient | 16m 33s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 58s | | the patch passed | | +1 :green_heart: | compile | 21m 26s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 21m 26s | | the patch passed | | +1 :green_heart: | compile | 19m 14s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 19m 14s | | the patch passed | | +1 :green_heart: | blanks | 0m 1s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 15s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332) | | +1 :green_heart: | mvnsite | 1m 43s | | the patch passed | | +1 :green_heart: | javadoc | 1m 4s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 41s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 50s | | the patch passed | | +1 :green_heart: | shadedclient | 16m 36s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 17m 41s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. 
| | | | 187m 47s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3250 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux e46a4b76434d 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / d661abcbd46f7d907db31b1cd4557f9397430dab | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/testReport/ | | Max. process+thread count | 1263 (vs. ulimit of 5500) | | modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637410&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637410 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 16:31 Start Date: 12/Aug/21 16:31 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897787276 Sure. Just re-triggered. Issue Time Tracking --- Worklog Id: (was: 637410) Time Spent: 17h 50m (was: 17h 40m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637403 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 16:21 Start Date: 12/Aug/21 16:21 Worklog Time Spent: 10m Work Description: viirya commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897775006

@sunchao CI failed by the following issue again:

```
[ERROR] Failed to execute goal on project hadoop-client-integration-tests: Could not resolve dependencies for project org.apache.hadoop:hadoop-client-integration-tests:jar:3.4.0-SNAPSHOT: Failed to collect dependencies at javax.activation:activation:jar:1.1.1: Failed to read artifact descriptor for javax.activation:activation:jar:1.1.1: Could not transfer artifact javax.activation:activation:pom:1.1.1 from/to central (https://repo.maven.apache.org/maven2): Transfer failed for https://repo.maven.apache.org/maven2/javax/activation/activation/1.1.1/activation-1.1.1.pom: Connection reset -> [Help 1]
```

Could you help re-trigger the CI? Thanks.

Issue Time Tracking --- Worklog Id: (was: 637403) Time Spent: 17h 40m (was: 17.5h)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637303=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637303 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 12/Aug/21 10:49 Start Date: 12/Aug/21 10:49 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897537496 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 2s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | -1 :x: | mvninstall | 34m 13s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/24/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. | | +1 :green_heart: | compile | 23m 31s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 20m 54s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 7s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 38s | | trunk passed | | +1 :green_heart: | javadoc | 1m 11s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 46s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 33s | | trunk passed | | +1 :green_heart: | shadedclient | 18m 17s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 0s | | the patch passed | | +1 :green_heart: | compile | 23m 26s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 23m 26s | | the patch passed | | +1 :green_heart: | compile | 20m 50s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 20m 50s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 15s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/24/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332) | | +1 :green_heart: | mvnsite | 1m 49s | | the patch passed | | +1 :green_heart: | javadoc | 1m 5s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 39s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 46s | | the patch passed | | +1 :green_heart: | shadedclient | 19m 20s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 17m 50s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 55s | | The patch does not generate ASF License warnings. 
| | | | 198m 26s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/24/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3250 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 3fac6fb8a31e 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / d661abcbd46f7d907db31b1cd4557f9397430dab | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results |
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637141=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637141 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 11/Aug/21 21:14 Start Date: 11/Aug/21 21:14 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897158917 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 53s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 50s | | trunk passed | | +1 :green_heart: | compile | 22m 12s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 20m 3s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 5s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 35s | | trunk passed | | +1 :green_heart: | javadoc | 1m 5s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 37s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 26s | | trunk passed | | +1 :green_heart: | shadedclient | 16m 58s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 0s | | the patch passed | | +1 :green_heart: | compile | 22m 27s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 22m 27s | | the patch passed | | +1 :green_heart: | compile | 19m 52s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 19m 52s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 8s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/23/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332) | | +1 :green_heart: | mvnsite | 1m 37s | | the patch passed | | +1 :green_heart: | javadoc | 1m 5s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 48s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 43s | | the patch passed | | +1 :green_heart: | shadedclient | 16m 44s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 17m 16s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 56s | | The patch does not generate ASF License warnings. 
| | | | 186m 54s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/23/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3250 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux e22538cca47a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 1a1df5e5726beaf90c3112e6a9e4c83ae695658a | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/23/testReport/ | | Max. process+thread count | 1262 (vs. ulimit of 5500) | | modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=636718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636718 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 11/Aug/21 03:55 Start Date: 11/Aug/21 03:55 Worklog Time Spent: 10m Work Description: viirya commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-896479273 @sunchao Please let me know if the new change looks good to you. Thanks. Issue Time Tracking --- Worklog Id: (was: 636718) Time Spent: 17h 10m (was: 17h)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=636714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636714 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 11/Aug/21 03:26 Start Date: 11/Aug/21 03:26 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-896471265 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 55s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 5s | | trunk passed | | +1 :green_heart: | compile | 21m 15s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 18m 41s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 11s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 40s | | trunk passed | | +1 :green_heart: | javadoc | 1m 6s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 36s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 26s | | trunk passed | | +1 :green_heart: | shadedclient | 16m 42s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 2s | | the patch passed | | +1 :green_heart: | compile | 22m 12s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 22m 12s | | the patch passed | | +1 :green_heart: | compile | 19m 55s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 19m 55s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 0s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 34s | | the patch passed | | +1 :green_heart: | javadoc | 1m 4s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 35s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 40s | | the patch passed | | +1 :green_heart: | shadedclient | 17m 3s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 17m 35s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. 
| | | | 183m 27s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/22/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3250 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 73f41b858049 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 2e04fa60c3d5ae2373dc79f0037b7926a201cfc3 | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/22/testReport/ | | Max. process+thread count | 1263 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/22/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635974 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 09/Aug/21 17:32 Start Date: 09/Aug/21 17:32 Worklog Time Spent: 10m Work Description: hadoop-yetus removed a comment on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893970892 Issue Time Tracking --- Worklog Id: (was: 635974) Time Spent: 16h 50m (was: 16h 40m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635936 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 09/Aug/21 16:29 Start Date: 09/Aug/21 16:29 Worklog Time Spent: 10m Work Description: viirya commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-895365571 Thanks @steveloughran. I also don't think there's risk here. The compression here basically takes the buffers given by the caller, instead of allocating them itself. Issue Time Tracking --- Worklog Id: (was: 635936) Time Spent: 16h 40m (was: 16.5h)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635864=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635864 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
        Author: ASF GitHub Bot
    Created on: 09/Aug/21 13:21
    Start Date: 09/Aug/21 13:21
    Worklog Time Spent: 10m

Work Description: steveloughran commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-895217443

One concern here: is there anything we have to worry about from a security perspective? That is, if someone sends in something with an invalid range, does that trigger allocation of massive buffers, etc.? commons-compress has had security issues over time with things like .. in paths. I don't think there's risk here, but it's worth considering: do we have to worry about malicious gz files?

Issue Time Tracking
-------------------
    Worklog Id: (was: 635864)
    Time Spent: 16.5h  (was: 16h 20m)
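The malicious-input concern raised above mostly bites on the decompression side, where a tiny input can inflate to something enormous ("zip bomb"). One common mitigation is to cap the inflated size while reading. A minimal sketch using only `java.util.zip` (the class `BoundedGunzip` is hypothetical, not code from this patch):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of HADOOP-17825: cap the inflated size when
// reading an untrusted gzip stream, so a tiny "zip bomb" input cannot force
// allocation of arbitrarily large buffers.
public class BoundedGunzip {

  public static byte[] gunzip(byte[] gz, int maxBytes) throws IOException {
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) > 0) {
        out.write(buf, 0, n);
        if (out.size() > maxBytes) {
          // Abort before the inflated output grows past the caller's budget.
          throw new IOException("inflated size exceeds " + maxBytes + " bytes");
        }
      }
      return out.toByteArray();
    }
  }

  // Convenience for producing test input with the standard library.
  public static byte[] gzip(byte[] data) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
      gos.write(data);
    }
    return bos.toByteArray();
  }
}
```

For the *compression* path discussed in this PR, the same class of risk largely does not arise, since the output can never be larger than roughly input plus fixed header/trailer overhead.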
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635540 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
        Author: ASF GitHub Bot
    Created on: 07/Aug/21 02:54
    Start Date: 07/Aug/21 02:54
    Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-894594938

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 0m 38s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 30m 48s | | trunk passed |
| +1 :green_heart: | compile | 21m 15s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 18m 25s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 13s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 36s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 41s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 25s | | trunk passed |
| +1 :green_heart: | shadedclient | 15m 46s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 56s | | the patch passed |
| +1 :green_heart: | compile | 20m 57s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 20m 57s | | the patch passed |
| +1 :green_heart: | compile | 19m 36s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 19m 36s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 5s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 36s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 8s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 37s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 38s | | the patch passed |
| +1 :green_heart: | shadedclient | 16m 44s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 17m 36s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 53s | | The patch does not generate ASF License warnings. |
| | | 179m 47s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/21/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 1a509ee1f77e 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / e0bec4a7b2a5df7a1148e8b1023893dd6ff50ec6 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/21/testReport/ |
| Max. process+thread count | 3150 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/21/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635513 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
        Author: ASF GitHub Bot
    Created on: 06/Aug/21 23:34
    Start Date: 06/Aug/21 23:34
    Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-894566377

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 0m 56s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 32m 4s | | trunk passed |
| +1 :green_heart: | compile | 21m 53s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 18m 31s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 10s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 37s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 39s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 27s | | trunk passed |
| +1 :green_heart: | shadedclient | 15m 50s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 56s | | the patch passed |
| +1 :green_heart: | compile | 20m 28s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 20m 28s | | the patch passed |
| +1 :green_heart: | compile | 18m 39s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 18m 39s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 8s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 36s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 6s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 42s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 36s | | the patch passed |
| +1 :green_heart: | shadedclient | 16m 6s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 17m 20s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. |
| | | 180m 30s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/20/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 5f58db499a75 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 939b349d899203203998e0ed6ba8125366ec0a5a |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/20/testReport/ |
| Max. process+thread count | 1262 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/20/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635477 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
        Author: ASF GitHub Bot
    Created on: 06/Aug/21 20:22
    Start Date: 06/Aug/21 20:22
    Worklog Time Spent: 10m

Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-894499414

> @viirya there are a few style issues in [here](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/19/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) - could you fix them?

Oh, ok. Let me fix them.

Issue Time Tracking
-------------------
    Worklog Id: (was: 635477)
    Time Spent: 16h  (was: 15h 50m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635476 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
        Author: ASF GitHub Bot
    Created on: 06/Aug/21 20:22
    Start Date: 06/Aug/21 20:22
    Worklog Time Spent: 10m

Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r684484896

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java

## @@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return false;
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    if (currentBufLen <= 0) {
+      return compressedBytesWritten;
+    }
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&
+        state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) {
+      int outputHeaderSize = writeHeader(b, off, len);
+      numExtraBytesWritten += outputHeaderSize;
+
+      compressedBytesWritten += outputHeaderSize;
+
+      if (outputHeaderSize == len) {
+        return compressedBytesWritten;
+      }
+
+      off += outputHeaderSize;
+      len -= outputHeaderSize;
+    }
+
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      // now compress it into b[]
+      int deflated = deflater.deflate(b, off, len);
+
+      compressedBytesWritten += deflated;
+      off += deflated;
+      len -= deflated;
+
+      // All current input are processed. Going to output trailer.
+      if (deflater.finished()) {
+        state = BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
+        fillTrailer();
+      } else {
+        return compressedBytesWritten;
+      }
+    }
+
+    int outputTrailerSize = writeTrailer(b, off, len);
+    numExtraBytesWritten += outputTrailerSize;
+
+    compressedBytesWritten += outputTrailerSize;
+
+    return compressedBytesWritten;
+  }
+
+  @Override
+  public long getBytesRead() {
+    return deflater.getTotalIn();
+  }
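The quoted class frames its output as a gzip member: a fixed 10-byte header, a raw deflate stream, and an 8-byte trailer holding the CRC-32 of the uncompressed data and its size, both little-endian. That same framing can be sketched outside Hadoop with plain `java.util.zip`; the class name `GzipMemberSketch` below is illustrative, not code from the patch:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Illustrative sketch of the gzip framing used by BuiltInGzipCompressor:
// fixed header + raw deflate body + little-endian CRC-32/ISIZE trailer.
public class GzipMemberSketch {

  private static final byte[] GZIP_HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  public static byte[] gzip(byte[] data) {
    // nowrap = true produces a raw deflate stream, with no zlib wrapper,
    // which is what the gzip container expects between header and trailer.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    deflater.setInput(data);
    deflater.finish();

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(GZIP_HEADER, 0, GZIP_HEADER.length);
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    // Trailer: CRC-32 of the uncompressed data, then its length, little-endian.
    writeIntLE(out, (int) crc.getValue());
    writeIntLE(out, deflater.getTotalIn());
    deflater.end();
    return out.toByteArray();
  }

  private static void writeIntLE(ByteArrayOutputStream out, int v) {
    out.write(v & 0xff);
    out.write((v >>> 8) & 0xff);
    out.write((v >>> 16) & 0xff);
    out.write((v >>> 24) & 0xff);
  }

  public static byte[] gunzip(byte[] gz) throws IOException {
    GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    for (int n = in.read(buf); n > 0; n = in.read(buf)) {
      out.write(buf, 0, n);
    }
    return out.toByteArray();
  }
}
```

Because the trailer is derived from a per-stream CRC and input count, it must live in per-instance state, which is exactly why a static mutable `GZIP_TRAILER` shared across compressor instances in one JVM is unsafe.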
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635474 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
        Author: ASF GitHub Bot
    Created on: 06/Aug/21 20:18
    Start Date: 06/Aug/21 20:18
    Worklog Time Spent: 10m

Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r684483080

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635434=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635434 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
        Author: ASF GitHub Bot
    Created on: 06/Aug/21 18:24
    Start Date: 06/Aug/21 18:24
    Worklog Time Spent: 10m

Work Description: sunchao commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-894439227

@viirya there are a few style issues in [here](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/19/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) - could you fix them?

Issue Time Tracking
-------------------
    Worklog Id: (was: 635434)
    Time Spent: 15.5h  (was: 15h 20m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635432 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
        Author: ASF GitHub Bot
    Created on: 06/Aug/21 18:22
    Start Date: 06/Aug/21 18:22
    Worklog Time Spent: 10m

Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r684424851

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635423 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
        Author: ASF GitHub Bot
    Created on: 06/Aug/21 18:09
    Start Date: 06/Aug/21 18:09
    Worklog Time Spent: 10m

Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-892989941

Issue Time Tracking
-------------------
    Worklog Id: (was: 635423)
    Time Spent: 15h 10m  (was: 15h)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635420 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 18:08
Start Date: 06/Aug/21 18:08
Worklog Time Spent: 10m

Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-892363546

(The removed comment was a superseded :confetti_ball: +1 overall Yetus report for CI run PR-3250/12, git revision trunk / 0a1bb194f29612863d8ad31971a737b30be4d982; checkstyle flagged 15 new warnings: 15 new + 332 unchanged - 0 fixed = 347 total.)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635419 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 18:08
Start Date: 06/Aug/21 18:08
Worklog Time Spent: 10m

Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-891453787

(The removed comment was a superseded :confetti_ball: +1 overall Yetus report for CI run PR-3250/8, git revision trunk / 49619e3f1ebdc62c89bcb74fd5cbf75a80a0601c; checkstyle flagged 156 new warnings: 156 new + 332 unchanged - 0 fixed = 488 total.)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634936=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634936 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 03:16
Start Date: 06/Aug/21 03:16
Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893970892

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 40s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 30m 58s | | trunk passed |
| +1 :green_heart: | compile | 21m 8s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 18m 30s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 9s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 36s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 39s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 23s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 3s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 55s | | the patch passed |
| +1 :green_heart: | compile | 22m 22s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 22m 22s | | the patch passed |
| +1 :green_heart: | compile | 20m 48s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 20m 48s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/19/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 7 new + 332 unchanged - 0 fixed = 339 total (was 332) |
| +1 :green_heart: | mvnsite | 1m 29s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 7s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 35s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 43s | | the patch passed |
| +1 :green_heart: | shadedclient | 16m 51s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 17m 34s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 54s | | The patch does not generate ASF License warnings. |
| | | 182m 50s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/19/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 05889202c053 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d5ac82e295d0dacc890c74786d15758c4b8e51e3 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/19/testReport/ |
| Max. process+thread count | 3149 (vs. ulimit of 5500) |
| modules | C: |
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634933=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634933 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 06/Aug/21 02:52 Start Date: 06/Aug/21 02:52 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r683910228 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numBytesWritten = 0; + + private int currentBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { init(conf); } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED); + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +int compressedBytesWritten = 0; + +// If we are not within uncompressed data yet, output the header. +if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM && Review comment: I think calling finish will set a flag to tell it flush the data when finished. Yea the current approach also looks good to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking
---
Worklog Id: (was: 634933)
Time Spent: 14.5h (was: 14h 20m)
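For reference, the trailer bytes this review keeps coming back to ("overwritten based on crc and output size") are defined by RFC 1952: the CRC-32 of the uncompressed data followed by its length mod 2^32, both little-endian. A minimal standalone sketch of building such a trailer — the class and method names are illustrative, not the Hadoop code:

```java
import java.util.zip.CRC32;

public class GzipTrailerDemo {
    // RFC 1952 trailer: CRC-32 of the uncompressed data, then ISIZE
    // (uncompressed length mod 2^32), each stored little-endian.
    static byte[] trailer(long crcValue, long uncompressedLen) {
        byte[] t = new byte[8];
        for (int i = 0; i < 4; i++) {
            t[i] = (byte) ((crcValue >> (8 * i)) & 0xff);
            t[i + 4] = (byte) ((uncompressedLen >> (8 * i)) & 0xff);
        }
        return t;
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();               // per-instance, not shared static state
        crc.update(data, 0, data.length);
        byte[] t = trailer(crc.getValue(), data.length);
        System.out.printf("crc=%08x, trailer length=%d%n", crc.getValue(), t.length);
    }
}
```

Keeping the trailer buffer per instance (as the review thread concludes) matters precisely because the CRC and size differ for every stream.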
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634926=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634926 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 02:29
Start Date: 06/Aug/21 02:29
Worklog Time Spent: 10m

Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683902983

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ##

(Commenting on the same `BuiltInGzipCompressor.compress` snippet quoted in the earlier review thread.)

Review comment: Hmm, I'm not sure. :) I looked at `finish()` in `Deflater`. It just sets a `finish` variable to true, but that variable does not appear to be used anywhere, so technically I guess you can still call its `setInput` to set new input and `deflate` again. Because of that, I took the more conservative approach here, just in case.

Issue Time Tracking
---
Worklog Id: (was: 634926)
Time Spent: 14h 20m (was: 14h 10m)
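The `Deflater.finish()` semantics debated here can be checked with a small standalone program: `finish()` only marks end-of-input, and `deflate()` must then be called (possibly many times, if the output buffer is small) until `finished()` turns true. A minimal sketch under those assumptions — names are illustrative and not part of the PR:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class DeflaterFinishDemo {
    // finish() only flags end-of-input; deflate() must still be called
    // repeatedly until finished() reports true and all output is drained.
    static byte[] deflateAll(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw stream, no zlib wrapper
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64];                // deliberately small buffer
        while (!deflater.finished()) {
            int n = deflater.deflate(buf, 0, buf.length);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = "hello hello hello hello".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] compressed = deflateAll(input);
        System.out.println("compressed " + input.length + " -> " + compressed.length + " bytes");
    }
}
```

The small 64-byte buffer forces several `deflate()` calls after `finish()`, mirroring the `CompressorStream` scenario discussed below in the thread.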
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634924=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634924 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 02:17
Start Date: 06/Aug/21 02:17
Worklog Time Spent: 10m

Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683899349

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ##

(Commenting on the same `BuiltInGzipCompressor.compress` snippet quoted in the earlier review thread.)

Review comment: I see. Yeah, I meant that `CompressorStream` calls its own `compress` method, which calls `compressor.compress` indirectly.

> Calling finish() on this compressor won't set state to FINISHED.

Oh sorry, I was looking at an old version of this PR which still set it to `FINISHED` in `finish()`. Never mind. So this looks good then :) Although I think the following:

> What I thought is, the caller might set input and compress until it doesn't need input. The state is in FINISHED and the caller might call set input and compress again. At the moment this check isn't effective to write the header.

will also not happen, since when the state is finished, the caller should not call `setInput` before it calls `reset` on the compressor.

Issue Time Tracking
---
Worklog Id: (was: 634924)
Time Spent: 14h 10m (was: 14h)
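The reset-before-reuse contract mentioned here can be sketched with plain `java.util.zip.Deflater`: once `finished()` becomes true, a `reset()` is required before `setInput` can start a fresh stream. An illustrative helper, not Hadoop code:

```java
import java.util.zip.Deflater;

public class DeflaterReuseDemo {
    // Drive one complete stream through an existing Deflater. reset() first
    // clears the finish flag and internal state so the instance can be reused.
    static int deflateOnce(Deflater deflater, byte[] input, byte[] buf) {
        deflater.reset();
        deflater.setInput(input);
        deflater.finish();
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf, 0, buf.length);
        }
        return total;
    }

    public static void main(String[] args) {
        Deflater deflater = new Deflater();
        byte[] buf = new byte[256];
        int first = deflateOnce(deflater, "first stream".getBytes(), buf);
        int second = deflateOnce(deflater, "second stream".getBytes(), buf); // works because of reset()
        deflater.end();
        System.out.println("first=" + first + " bytes, second=" + second + " bytes");
    }
}
```

Without the `reset()` call, the second `setInput`/`deflate` round would operate on a deflater already in its finished state.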
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634919 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 01:54
Start Date: 06/Aug/21 01:54
Worklog Time Spent: 10m

Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683892398

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ##

(Commenting on the same `BuiltInGzipCompressor.compress` snippet quoted in the earlier review thread.)

Review comment: Oh, I mean `Compressor` doesn't have a no-argument `compress()` method. When calling `compress`, the caller must provide a buffer for the compressed output. Calling `finish()` on this compressor won't set the state to `FINISHED`; only once we output the trailer do we set the state to `FINISHED`. In that state, the caller can set new input and call `compress(buf, off, len)` again, and we will output a new header and enter another compressed output section.

Issue Time Tracking
---
Worklog Id: (was: 634919)
Time Spent: 14h (was: 13h 50m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634918 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 01:49
Start Date: 06/Aug/21 01:49
Worklog Time Spent: 10m

Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683890692

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
> So seems we cannot do like CompressorStream.

Hmm, what do you mean here? Sorry, I don't quite understand it.

> Once the caller calls finish on this Compressor, we only call finish on the deflater. The caller then will call finished to verify if it reaches finished state.
If not, it should call compress with buffer to get more compressed output.

Yes. So if the input is large but the buffer in a `CompressorStream` is small, it may need to call `compress` multiple times after `finish()` is invoked before it reaches the `finished` state (i.e., before `deflater.finished()` returns true).

Worklog Id: (was: 634918) Time Spent: 13h 50m (was: 13h 40m)
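The drain loop described here can be sketched directly against `java.util.zip.Deflater`, which `BuiltInGzipCompressor` wraps. The class name and tiny buffer size below are mine, chosen to force several `deflate()` calls after `finish()`:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;

public class DrainAfterFinish {

    // Counts how many deflate() calls are needed to drain a finished
    // Deflater into a buffer of the given size.
    static int drainCalls(byte[] input, int bufSize) {
        // nowrap=true gives raw deflate, the form gzip wraps with its
        // own header and trailer.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(input);
        deflater.finish();                 // no more input will be given

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufSize];
        int calls = 0;
        while (!deflater.finished()) {     // keep asking until fully drained
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
            calls++;
        }
        deflater.end();
        return calls;
    }

    public static void main(String[] args) {
        byte[] input = new byte[1 << 16];  // 64 KiB of compressible data
        Arrays.fill(input, (byte) 'a');
        // With a tiny output buffer, draining takes several calls.
        System.out.println("deflate() calls with 32-byte buffer: "
            + drainCalls(input, 32));
    }
}
```

The caller-side contract is the same: after `finish()`, keep calling into the compressor until `finished()` reports true.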
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634913 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 01:37
Start Date: 06/Aug/21 01:37
Worklog Time Spent: 10m

Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683887010

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java

Review comment: For `Compressor` here, `compress` requires a buffer to write the compressed output to, so it seems we cannot do it like `CompressorStream`. Once the caller calls `finish` on this `Compressor`, we only call `finish` on the deflater. The caller then will call `finished` to verify whether it has reached the finished state.
If not, it should call `compress` with a buffer to get more compressed output.

Worklog Id: (was: 634913) Time Spent: 13h 40m (was: 13.5h)
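For reference, the eight-byte trailer the compressor must eventually emit (what the quoted code comment calls "crc and output size") is defined by RFC 1952 as the CRC-32 of the uncompressed data followed by the uncompressed length mod 2^32, both little-endian. A sketch that cross-checks this encoding against what `GZIPOutputStream` actually writes (the class name is mine):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

public class GzipTrailer {

    // The 8-byte gzip trailer (RFC 1952): CRC-32 of the uncompressed data,
    // then the uncompressed length mod 2^32, both little-endian.
    static byte[] trailer(long crc, long uncompressedLen) {
        byte[] t = new byte[8];
        for (int i = 0; i < 4; i++) {
            t[i] = (byte) (crc >>> (8 * i));
            t[i + 4] = (byte) (uncompressedLen >>> (8 * i));
        }
        return t;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello gzip trailer".getBytes("UTF-8");
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        byte[] expected = trailer(crc.getValue(), data.length);

        // Cross-check against the trailer GZIPOutputStream writes:
        // the last 8 bytes of the stream.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        byte[] stream = bos.toByteArray();
        for (int i = 0; i < 8; i++) {
            if (expected[i] != stream[stream.length - 8 + i]) {
                throw new AssertionError("trailer mismatch at byte " + i);
            }
        }
        System.out.println("trailer matches");
    }
}
```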
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634899 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 00:42
Start Date: 06/Aug/21 00:42
Worklog Time Spent: 10m

Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683870968

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java

Review comment: Ah OK, so the compression can happen on multiple inputs. Then I'm curious whether we should handle the `FINISHED` state in this `if` clause too.
For instance, in the following situation:

```java
@Override
public void finish() throws IOException {
  if (!compressor.finished()) {
    compressor.finish();
    while (!compressor.finished()) {
      compress();
    }
  }
}
```

the `CompressorStream` will first set the state to `FINISHED` and then keep calling `compress` until it is finished.

Worklog Id: (was: 634899) Time Spent: 13.5h (was: 13h 20m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634887 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 00:11
Start Date: 06/Aug/21 00:11
Worklog Time Spent: 10m

Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683862198

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java

Review comment: Let me revert to the original condition with `INFLATE_STREAM` and `TRAILER_CRC`. It looks more reliable.
Worklog Id: (was: 634887) Time Spent: 13h 20m (was: 13h 10m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634877 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 00:03
Start Date: 06/Aug/21 00:03
Worklog Time Spent: 10m

Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683859384

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java

Review comment: Hmm, no, I think we cannot do that. `setInput` can be called multiple times before we reach the `FINISHED` state. If we set the state back to `HEADER_BASIC`, it will re-output the header, but the current trailer has not been output yet.
Worklog Id: (was: 634877) Time Spent: 13h 10m (was: 13h)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634875 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 06/Aug/21 00:02
Start Date: 06/Aug/21 00:02
Worklog Time Spent: 10m

Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683859384

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java

Review comment: Hmm, no, I think we cannot do that. `setInput` can be called multiple times before we reach the finished status. If we set the state to `HEADER_BASIC`, it will re-output the header, but the current trailer is not output yet.
Worklog Id: (was: 634875) Time Spent: 13h (was: 12h 50m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634858 ]

 ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 05/Aug/21 23:07
Start Date: 05/Aug/21 23:07
Worklog Time Spent: 10m

Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683840945

## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java

Review comment: Seems so. Let me update it.
Worklog Id: (was: 634858) Time Spent: 12h 50m (was: 12h 40m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634855 ]

ASF GitHub Bot logged work on HADOOP-17825:
---
Author: ASF GitHub Bot
Created on: 05/Aug/21 22:58
Start Date: 05/Aug/21 22:58
Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893873105

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 41s | | Docker mode activated. |
_ Prechecks _
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 1s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
_ trunk Compile Tests _
| +1 :green_heart: | mvninstall | 30m 50s | | trunk passed |
| +1 :green_heart: | compile | 21m 15s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 18m 34s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 10s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 36s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 43s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 23s | | trunk passed |
| +1 :green_heart: | shadedclient | 15m 47s | | branch has no errors when building and testing our client artifacts. |
_ Patch Compile Tests _
| +1 :green_heart: | mvninstall | 0m 56s | | the patch passed |
| +1 :green_heart: | compile | 20m 29s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 20m 29s | | the patch passed |
| +1 :green_heart: | compile | 18m 36s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 18m 36s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 8s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/18/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 7 new + 332 unchanged - 0 fixed = 339 total (was 332) |
| +1 :green_heart: | mvnsite | 1m 35s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 8s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 38s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 33s | | the patch passed |
| +1 :green_heart: | shadedclient | 15m 58s | | patch has no errors when building and testing our client artifacts. |
_ Other Tests _
| +1 :green_heart: | unit | 17m 13s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 58s | | The patch does not generate ASF License warnings. |
| | | 178m 1s | | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/18/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 53a5467acfb9 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 204709d1bc0ebb41521219367c39d1badc698ab1 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/18/testReport/ |
| Max. process+thread count | 1266 (vs. ulimit of 5500) |
| modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634831 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Aug/21 22:04
            Start Date: 05/Aug/21 22:04
    Worklog Time Spent: 10m
      Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683816220

##########
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##########
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
       Hmm you are right. Should we change the state to `HEADER_BASIC` in `setInput`? it seems we should do so.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 634831) Time Spent: 12.5h (was: 12h 20m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 12.5h > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other
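For context on the trailer bytes discussed in the quoted hunk: a gzip member ends with the CRC-32 of the uncompressed data followed by its length modulo 2^32, both little-endian, which is why the compressor must overwrite its trailer from the running crc and byte count. A minimal standalone sketch using only `java.util.zip` (not the Hadoop `Compressor` API; the class name and buffer size here are illustrative assumptions):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

public class GzipMemberSketch {
  // Fixed ten-byte gzip header: magic bytes, deflate method, no flags, no mtime.
  private static final byte[] GZIP_HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  /** Compresses data into a complete gzip member: header + raw deflate + trailer. */
  public static byte[] compress(byte[] data) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(GZIP_HEADER);

    // nowrap=true produces raw deflate output with no zlib wrapper, as gzip requires.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(data);
    deflater.finish();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();

    // Trailer: CRC-32 of the uncompressed data, then ISIZE, both little-endian.
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    writeLittleEndianInt(out, crc.getValue());
    writeLittleEndianInt(out, data.length & 0xffffffffL);
    return out.toByteArray();
  }

  private static void writeLittleEndianInt(ByteArrayOutputStream out, long v) {
    for (int i = 0; i < 4; i++) {
      out.write((int) ((v >> (8 * i)) & 0xff));
    }
  }
}
```

Reading the result back through `GZIPInputStream` exercises both trailer fields, since the decompressor verifies the CRC and length at end of stream.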
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634786 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Aug/21 20:08
            Start Date: 05/Aug/21 20:08
    Worklog Time Spent: 10m
      Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683753560

##########
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##########
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
       What I thought is, the caller might set input and compress until it doesn't need input. The state is in `FINISHED` and the caller might call set input and compress again. At the moment this check isn't effective to write the header.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 634786) Time Spent: 12h 20m (was: 12h 10m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 12h 20m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile >
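The reuse pattern viirya describes, compress one stream to completion and then feed new input to the same instance, maps onto `java.util.zip.Deflater`'s `reset()` call, which is roughly what a FINISHED-to-header transition has to do internally. A hedged illustration with plain `Deflater` rather than the Hadoop `Compressor` interface (the helper name is an assumption, and the output buffer is assumed large enough for the whole stream):

```java
import java.util.zip.Deflater;

public class DeflaterReuseSketch {
  /**
   * Compresses one buffer to completion with the given Deflater,
   * returning the number of compressed bytes written to out.
   */
  public static int compressOnce(Deflater deflater, byte[] input, byte[] out) {
    deflater.setInput(input);
    deflater.finish();
    int total = 0;
    while (!deflater.finished()) {
      // Assumes out is large enough; a real compressor loops over a fixed buffer.
      total += deflater.deflate(out, total, out.length - total);
    }
    return total;
  }
}
```

After `finished()` returns true the `Deflater` is spent; `reset()` makes it accept input again. A stateful gzip compressor must additionally re-emit the ten-byte header and zero its CRC at that point, which is the gap the review comment is pointing at.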
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634779 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Aug/21 20:01
            Start Date: 05/Aug/21 20:01
    Worklog Time Spent: 10m
      Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683749490

##########
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##########
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
       Hmm, okay. I set it to `HEADER_BASIC` and see if CI can pass.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 634779) Time Spent: 12h 10m (was: 12h) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 12h 10m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634763 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 05/Aug/21 19:31 Start Date: 05/Aug/21 19:31 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893728072 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 43s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 9s | | trunk passed | | +1 :green_heart: | compile | 21m 24s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 18m 42s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 8s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 38s | | trunk passed | | +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 44s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 24s | | trunk passed | | +1 :green_heart: | shadedclient | 15m 52s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 56s | | the patch passed | | +1 :green_heart: | compile | 20m 36s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 20m 36s | | the patch passed | | +1 :green_heart: | compile | 18m 34s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 18m 34s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/17/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 7 new + 332 unchanged - 0 fixed = 339 total (was 332) | | +1 :green_heart: | mvnsite | 1m 35s | | the patch passed | | +1 :green_heart: | javadoc | 1m 8s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 44s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 34s | | the patch passed | | +1 :green_heart: | shadedclient | 15m 52s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 17m 17s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. 
| | | | 178m 55s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/17/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3250 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux e238c79df9e4 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / f1db328eb900b4ffe96f4150ecb4359d389c67de | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/17/testReport/ | | Max. process+thread count | 3158 (vs. ulimit of 5500) | | modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634658=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634658 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 05/Aug/21 16:31 Start Date: 05/Aug/21 16:31 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893597879 Re-triggered the CI. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 634658) Time Spent: 11h 50m (was: 11h 40m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 11h 50m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634656 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Aug/21 16:30
            Start Date: 05/Aug/21 16:30
    Worklog Time Spent: 10m
      Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683610965

##########
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##########
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
       I think `compress` is always used in loops like this:
   ```java
   while (!compresser.needsInput()) {
     compresser.compress(..)
   }
   ```
   so if the state transitioned to `FINISHED`, we'd come out of the loop and ask for more input to compress.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 634656) Time Spent: 11h 40m (was: 11.5h) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 11h 40m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed,
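The calling convention sunchao sketches can be seen with plain `java.util.zip.Deflater`, whose `needsInput()`/`finished()` contract the Hadoop `Compressor` interface mirrors. This is a sketch under that assumption, not the Hadoop driver itself; the chunking and buffer size are arbitrary:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class CompressLoopSketch {
  /**
   * Drives a Deflater the way Hadoop streams drive a Compressor:
   * feed one chunk of input, then drain output until more input is needed.
   */
  public static byte[] deflateChunks(byte[][] chunks) {
    Deflater deflater = new Deflater();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[512];
    for (byte[] chunk : chunks) {
      deflater.setInput(chunk);
      // The while (!needsInput()) { compress(...) } loop from the review comment.
      while (!deflater.needsInput()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
    }
    // End of all input: flush the remaining compressed bytes.
    deflater.finish();
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }
}
```

Once `needsInput()` flips back to true the caller supplies the next chunk, which is why a compressor whose state is already FINISHED would be asked for more input without ever re-emitting a header.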
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634652 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 05/Aug/21 16:25 Start Date: 05/Aug/21 16:25 Worklog Time Spent: 10m Work Description: viirya commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893593265 @sunchao Could you trigger the CI again? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 634652) Time Spent: 11.5h (was: 11h 20m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 11.5h > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634650=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634650 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 05/Aug/21 16:24 Start Date: 05/Aug/21 16:24 Worklog Time Spent: 10m Work Description: viirya commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893592758 Last CI failure looks unrelated: ``` [ERROR] Failed to execute goal on project hadoop-yarn-common: Could not resolve dependencies for project org.apache.hadoop:hadoop-yarn-common:jar:3.4.0-SNAPSHOT: Failed to collect dependencies at com.sun.jersey:jersey-client:jar:1.19: Failed to read artifact descriptor for com.sun.jersey:jersey-client:jar:1.19: Could not transfer artifact com.sun.jersey:jersey-client:pom:1.19 from/to central (https://repo.maven.apache.org/maven2): Transfer failed for https://repo.maven.apache.org/maven2/com/sun/jersey/jersey-client/1.19/jersey-client-1.19.pom: Connection reset -> [Help 1] ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 634650) Time Spent: 11h 20m (was: 11h 10m) > Add BuiltInGzipCompressor > - > > Key: HADOOP-17825 > URL: https://issues.apache.org/jira/browse/HADOOP-17825 > Project: Hadoop Common > Issue Type: Improvement >Reporter: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Time Spent: 11h 20m > Remaining Estimate: 0h > > Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is > not loaded. 
So, without Hadoop native codec installed, saving SequenceFile > using GzipCodec will throw exception like "SequenceFile doesn't work with > GzipCodec without native-hadoop code!" > Same as other codecs which we migrated to using prepared packages (lz4, > snappy), it will be better if we support GzipCodec generally without Hadoop > native codec installed. Similar to BuiltInGzipDecompressor, we can use Java > Deflater to support BuiltInGzipCompressor. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634498 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 05/Aug/21 11:59
Start Date: 05/Aug/21 11:59
Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893374924

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 43s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| -1 :x: | mvninstall | 9m 54s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/16/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. |
| +1 :green_heart: | compile | 26m 33s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 18m 45s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 11s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 37s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 37s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 33s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 5s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 59s | | the patch passed |
| +1 :green_heart: | compile | 21m 18s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 21m 18s | | the patch passed |
| +1 :green_heart: | compile | 19m 3s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 19m 3s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/16/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 7 new + 332 unchanged - 0 fixed = 339 total (was 332) |
| +1 :green_heart: | mvnsite | 1m 34s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 8s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 36s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 32s | | the patch passed |
| +1 :green_heart: | shadedclient | 15m 48s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 17m 5s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. |
| | | 163m 52s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/16/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 0f0bd9b33214 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / f1db328eb900b4ffe96f4150ecb4359d389c67de |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results |
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634363 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 05/Aug/21 11:42 Start Date: 05/Aug/21 11:42 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r682782504 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private final int GZIP_HEADER_LEN = 10; Review comment: nit: maybe better do `GZIP_HEADER_LEN = GZIP_HEADER.length`. ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private final int GZIP_HEADER_LEN = 10; + private final int GZIP_TRAILER_LEN = 8; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private byte[] userBuf = null; + private int userBufOff = 0; + private int userBufLen = 0; + + private int headerBytesWritten = 0; + private int trailerBytesWritten = 0; + + private int currentBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public
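The `GZIP_HEADER`/`GZIP_TRAILER` fields quoted above frame the raw deflate stream. For context on the trailer bytes under review: per RFC 1952 the eight-byte gzip trailer carries the CRC-32 of the uncompressed data followed by ISIZE (the uncompressed length mod 2^32), both little-endian. A minimal sketch of filling such a trailer; the helper name is illustrative, not the patch's API:

```java
import java.util.zip.CRC32;

public class GzipTrailer {
    /**
     * Fill an eight-byte gzip trailer at b[off..off+7]: CRC-32 of the
     * uncompressed data, then ISIZE, both little-endian (RFC 1952).
     */
    static void writeTrailer(long crcValue, long uncompressedLen, byte[] b, int off) {
        for (int i = 0; i < 4; i++) {
            b[off + i] = (byte) (crcValue >>> (8 * i));             // CRC-32, low byte first
            b[off + 4 + i] = (byte) (uncompressedLen >>> (8 * i));  // ISIZE, low byte first
        }
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        byte[] trailer = new byte[8];
        writeTrailer(crc.getValue(), data.length, trailer, 0);
        System.out.println(trailer[4]); // ISIZE low byte: prints 3 for a 3-byte input
    }
}
```

This is also why the trailer cannot be a constant like the header: its bytes depend on each stream's CRC and length.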
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634303=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634303 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 05/Aug/21 11:36 Start Date: 05/Aug/21 11:36 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r682842698 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private final int GZIP_HEADER_LEN = 10; + private final int GZIP_TRAILER_LEN = 8; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private byte[] userBuf = null; + private int userBufOff = 0; + private int userBufLen = 0; + + private int headerBytesWritten = 0; + private int trailerBytesWritten = 0; + + private int currentBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { init(conf); } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED); + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +int numAvailBytes = 0; + +// If we are not within uncompressed data yet, output the header. +if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + if (userBufLen <= 0) { +return numAvailBytes; + } + + int outputHeaderSize = writeHeader(b, off, len); + headerBytesWritten += outputHeaderSize; + + // Completes header output. 
+ if (headerOff == GZIP_HEADER_LEN) { +state = BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM; + } + + numAvailBytes += outputHeaderSize; + + if (outputHeaderSize == len) { +return numAvailBytes; + } + + off += outputHeaderSize; + len -= outputHeaderSize; +} + +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + // hand off user data (or what's left of it) to Deflater--but note that + // Deflater may not have consumed all of previous bufferload, in which case + // userBufLen will be zero + if (userBufLen > 0) { +deflater.setInput(userBuf, userBufOff, userBufLen); + +crc.update(userBuf, userBufOff, userBufLen); // CRC-32 is on uncompressed data + +currentBufLen = userBufLen; +userBufOff += userBufLen; +userBufLen = 0; + } + + + // now compress it into b[] + int deflated = deflater.deflate(b, off, len); + + numAvailBytes += deflated; + off +=
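The `compress()` fragment above hands one bufferload at a time to the `Deflater` and folds the uncompressed bytes into the CRC before deflating. The same `setInput()`/`needsInput()`/`deflate()` interplay can be exercised with the JDK classes alone; this standalone sketch uses illustrative chunk contents and buffer sizes:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class IncrementalDeflate {
    /**
     * Raw-deflate a sequence of chunks, offering the Deflater one bufferload
     * at a time and calling setInput() again only once needsInput() is true.
     */
    static byte[] deflateAll(byte[][] chunks) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64];
        for (byte[] chunk : chunks) {
            deflater.setInput(chunk);
            while (!deflater.needsInput()) {   // drain before offering more input
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
        }
        deflater.finish();
        while (!deflater.finished()) {         // flush the final deflate block
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] compressed = deflateAll(new byte[][]{"hello ".getBytes(), "world".getBytes()});
        Inflater inflater = new Inflater(true); // raw inflate, matching nowrap=true above
        inflater.setInput(compressed);
        byte[] back = new byte[64];
        int n = inflater.inflate(back);
        System.out.println(new String(back, 0, n)); // prints "hello world"
        inflater.end();
    }
}
```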
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634117 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 05/Aug/21 11:15
Start Date: 05/Aug/21 11:15
Worklog Time Spent: 10m

Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-892860619

Issue Time Tracking
-------------------
Worklog Id: (was: 634117)
Time Spent: 10h 40m (was: 10.5h)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634095 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 05/Aug/21 11:11
Start Date: 05/Aug/21 11:11
Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-892989941

Issue Time Tracking
-------------------
Worklog Id: (was: 634095)
Time Spent: 10.5h (was: 10h 20m)
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634059 ]

ASF GitHub Bot logged work on HADOOP-17825:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 05/Aug/21 11:00
Start Date: 05/Aug/21 11:00
Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893365445

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 43s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 31m 2s | | trunk passed |
| +1 :green_heart: | compile | 21m 26s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 18m 34s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 7s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 36s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 41s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 23s | | trunk passed |
| +1 :green_heart: | shadedclient | 15m 54s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 54s | | the patch passed |
| +1 :green_heart: | compile | 20m 27s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 20m 27s | | the patch passed |
| +1 :green_heart: | compile | 18m 33s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 18m 33s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 8s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/15/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 7 new + 332 unchanged - 0 fixed = 339 total (was 332) |
| +1 :green_heart: | mvnsite | 1m 34s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 7s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 39s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 34s | | the patch passed |
| +1 :green_heart: | shadedclient | 16m 1s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 17m 5s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 58s | | The patch does not generate ASF License warnings. |
| | | 178m 12s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/15/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 8d5a0ff00386 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d7f052a65c680122d111e15428139a5e2fdf43e2 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/15/testReport/ |
| Max. process+thread count | 1267 (vs. ulimit of 5500) |
| modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633979 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 05/Aug/21 07:52 Start Date: 05/Aug/21 07:52 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r683215249 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private final int GZIP_HEADER_LEN = GZIP_HEADER.length; + private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private int numBytesWritten = 0; + + private int currentBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { init(conf); } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED); + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +int compressedBytesWritten = 0; + +// If we are not within uncompressed data yet, output the header. +if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM && Review comment: Hmm, if we change to `HEADER_BASIC` here, we may also need to check if the state is `FINISHED`? Otherwise after we output the trailer, we cannot call compress again to compress on another input. -- This is an automated message from the Apache Git Service. 
Issue Time Tracking
-------------------
Worklog Id: (was: 633979)
Time Spent: 10h 10m (was: 10h)
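viirya's question about also checking `FINISHED` concerns the compressor's lifecycle: once the trailer has been emitted, the instance must be reset before it can start a new stream, otherwise `compress()` would skip the header of the next stream. A toy model of that state machine; the state names follow `BuiltInGzipDecompressor.GzipStateLabel`, but `TRAILER_CRC` here stands in for whatever trailer-writing states the patch actually uses:

```java
public class GzipCompressorState {
    enum State { HEADER_BASIC, INFLATE_STREAM, TRAILER_CRC, FINISHED }

    private State state = State.HEADER_BASIC;

    boolean finished() { return state == State.FINISHED; }

    /** Move to the next phase: header -> body -> trailer -> finished. */
    void advance() {
        switch (state) {
            case HEADER_BASIC:   state = State.INFLATE_STREAM; break;
            case INFLATE_STREAM: state = State.TRAILER_CRC;    break;
            case TRAILER_CRC:    state = State.FINISHED;       break;
            case FINISHED:       break; // must reset() before compressing another stream
        }
    }

    /** Rewind to the header phase so a new gzip stream can be produced. */
    void reset() { state = State.HEADER_BASIC; }

    public static void main(String[] args) {
        GzipCompressorState s = new GzipCompressorState();
        s.advance(); s.advance(); s.advance();
        System.out.println(s.finished()); // prints true: trailer written
        s.reset();
        System.out.println(s.finished()); // prints false: ready for a new stream
    }
}
```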
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633959=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633959 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 05/Aug/21 06:20 Start Date: 05/Aug/21 06:20 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r683008770 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; Review comment: nit: this could be `private static final` ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java ## @@ -1180,14 +1180,6 @@ public static Option syncInterval(int value) { new Metadata() : metadataOption.getValue(); this.compress = compressionTypeOption.getValue(); final CompressionCodec codec = compressionTypeOption.getCodec(); - if (codec != null && Review comment: there are a few unused imports in this file. ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,251 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. + * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + // The trailer will be overwritten based on crc and output size. + private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
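The `private static final` nit above on `GZIP_TRAILER` turned out to be subtle: unlike the fixed header, the trailer is rewritten per stream with that stream's CRC and output size, so a single static array shared across compressor instances in one JVM could be clobbered by another instance. A minimal pure-JDK sketch of that hazard (class and method names here are mine, not from the patch; `fillTrailer` is a hypothetical stand-in for the patch's trailer-filling logic):

```java
public class SharedTrailerDemo {
    // If GZIP_TRAILER were static, every compressor instance would share this one array.
    private static final byte[] SHARED_TRAILER = new byte[8];

    // Hypothetical stand-in for the trailer-filling logic: write the CRC-32
    // value then ISIZE into the trailer buffer, both little-endian.
    static void fillTrailer(byte[] trailer, long crc, long isize) {
        for (int i = 0; i < 4; i++) {
            trailer[i] = (byte) ((crc >> (8 * i)) & 0xff);
            trailer[4 + i] = (byte) ((isize >> (8 * i)) & 0xff);
        }
    }

    public static void main(String[] args) {
        // Two "compressors" finishing around the same time:
        fillTrailer(SHARED_TRAILER, 0x11111111L, 100);  // instance A fills the trailer
        byte[] aTrailer = SHARED_TRAILER;               // A holds a reference, not a copy
        fillTrailer(SHARED_TRAILER, 0x22222222L, 200);  // instance B overwrites it
        // A's trailer now carries B's CRC bytes:
        System.out.println(Integer.toHexString(aTrailer[0] & 0xff)); // prints "22", not "11"
    }
}
```

This is why the field ended up per-instance rather than static: the header is immutable and safe to share, but the trailer is mutable per-stream state.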
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633854 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 04/Aug/21 21:54 Start Date: 04/Aug/21 21:54 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893000270 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 53s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 14s | | trunk passed | | +1 :green_heart: | compile | 21m 18s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 18m 25s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 10s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 36s | | trunk passed | | +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 40s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 28s | | trunk passed | | +1 :green_heart: | shadedclient | 15m 49s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 54s | | the patch passed | | +1 :green_heart: | compile | 20m 27s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 20m 27s | | the patch passed | | +1 :green_heart: | compile | 18m 24s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 18m 24s | | the patch passed | | +1 :green_heart: | blanks | 0m 1s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 10s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/14/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 12 new + 332 unchanged - 0 fixed = 344 total (was 332) | | +1 :green_heart: | mvnsite | 1m 34s | | the patch passed | | +1 :green_heart: | javadoc | 1m 9s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 40s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 35s | | the patch passed | | +1 :green_heart: | shadedclient | 15m 48s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 17m 5s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 1m 1s | | The patch does not generate ASF License warnings. 
| | | | 178m 6s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/14/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3250 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux dc221d00cbe1 4.15.0-151-generic #157-Ubuntu SMP Fri Jul 9 23:07:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 6b823c2a3df28393c89a954ecc0e3ad34c49c3ed | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/14/testReport/ | | Max. process+thread count | 3153 (vs. ulimit of 5500) | | modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633843 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 04/Aug/21 21:33 Start Date: 04/Aug/21 21:33 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#issuecomment-892989941 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 40s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 47s | | trunk passed | | +1 :green_heart: | compile | 22m 29s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 18m 36s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 9s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 37s | | trunk passed | | +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 44s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 27s | | trunk passed | | +1 :green_heart: | shadedclient | 15m 53s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 55s | | the patch passed | | +1 :green_heart: | compile | 20m 26s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 20m 26s | | the patch passed | | +1 :green_heart: | compile | 18m 32s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 18m 32s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/13/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 14 new + 332 unchanged - 0 fixed = 346 total (was 332) | | +1 :green_heart: | mvnsite | 1m 33s | | the patch passed | | +1 :green_heart: | javadoc | 1m 7s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 42s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 2m 33s | | the patch passed | | +1 :green_heart: | shadedclient | 15m 50s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 17m 7s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. 
| | | | 180m 5s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/13/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3250 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 1a928499ddd8 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 8b752957cb7d7612dc1849244b6064cdd854e20b | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/13/testReport/ | | Max. process+thread count | 2226 (vs. ulimit of 5500) | | modules | C:
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633762 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 04/Aug/21 18:54 Start Date: 04/Aug/21 18:54 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r682876971 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private final int GZIP_HEADER_LEN = 10; + private final int GZIP_TRAILER_LEN = 8; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private byte[] userBuf = null; + private int userBufOff = 0; + private int userBufLen = 0; + + private int headerBytesWritten = 0; + private int trailerBytesWritten = 0; + + private int currentBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { init(conf); } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED); + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +int numAvailBytes = 0; + +// If we are not within uncompressed data yet, output the header. +if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + if (userBufLen <= 0) { +return numAvailBytes; + } + + int outputHeaderSize = writeHeader(b, off, len); + headerBytesWritten += outputHeaderSize; + + // Completes header output. 
+ if (headerOff == GZIP_HEADER_LEN) { +state = BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM; + } + + numAvailBytes += outputHeaderSize; + + if (outputHeaderSize == len) { +return numAvailBytes; + } + + off += outputHeaderSize; + len -= outputHeaderSize; +} + +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + // hand off user data (or what's left of it) to Deflater--but note that + // Deflater may not have consumed all of previous bufferload, in which case + // userBufLen will be zero + if (userBufLen > 0) { +deflater.setInput(userBuf, userBufOff, userBufLen); + +crc.update(userBuf, userBufOff, userBufLen); // CRC-32 is on uncompressed data + +currentBufLen = userBufLen; +userBufOff += userBufLen; +userBufLen = 0; + } + + + // now compress it into b[] + int deflated = deflater.deflate(b, off, len); + + numAvailBytes += deflated; + off +=
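The `writeHeader`/`headerOff` logic quoted above exists because `compress(b, off, len)` may be handed an output buffer smaller than the ten-byte header, so header emission has to resume across calls. A self-contained sketch of that resume pattern, assuming the same fixed header as the patch (class and method names are mine):

```java
public class HeaderResumeDemo {
    // Same fixed ten-byte gzip header as in the patch.
    private static final byte[] GZIP_HEADER = {
        0x1f, (byte) 0x8b, 0x08, 0, 0, 0, 0, 0, 0, 0};

    private int headerOff = 0; // how much of the header has been emitted so far

    /** Copy as much of the remaining header as fits into b[off..off+len). */
    int writeHeader(byte[] b, int off, int len) {
        int n = Math.min(GZIP_HEADER.length - headerOff, len);
        System.arraycopy(GZIP_HEADER, headerOff, b, off, n);
        headerOff += n;
        return n;
    }

    /** True once the full header has been written (the state transition above). */
    boolean headerDone() { return headerOff == GZIP_HEADER.length; }

    public static void main(String[] args) {
        HeaderResumeDemo c = new HeaderResumeDemo();
        byte[] out = new byte[10];
        int written = 0;
        // Drive it with a 3-byte output buffer, as compress() may be driven.
        byte[] chunk = new byte[3];
        while (!c.headerDone()) {
            int n = c.writeHeader(chunk, 0, chunk.length);
            System.arraycopy(chunk, 0, out, written, n);
            written += n;
        }
        System.out.println(written);                 // prints "10"
        System.out.println((out[0] & 0xff) == 0x1f); // prints "true"
    }
}
```

Only after `headerDone()` does the state machine move to the deflate stage, mirroring the `headerOff == GZIP_HEADER_LEN` check in the quoted code.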
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633761 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 04/Aug/21 18:53 Start Date: 04/Aug/21 18:53 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r682876207 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private final int GZIP_HEADER_LEN = 10; + private final int GZIP_TRAILER_LEN = 8; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private byte[] userBuf = null; + private int userBufOff = 0; + private int userBufLen = 0; + + private int headerBytesWritten = 0; + private int trailerBytesWritten = 0; + + private int currentBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { init(conf); } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED); + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +int numAvailBytes = 0; + +// If we are not within uncompressed data yet, output the header. +if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + if (userBufLen <= 0) { +return numAvailBytes; + } + + int outputHeaderSize = writeHeader(b, off, len); + headerBytesWritten += outputHeaderSize; + + // Completes header output. 
+ if (headerOff == GZIP_HEADER_LEN) { +state = BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM; + } + + numAvailBytes += outputHeaderSize; + + if (outputHeaderSize == len) { +return numAvailBytes; + } + + off += outputHeaderSize; + len -= outputHeaderSize; +} + +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + // hand off user data (or what's left of it) to Deflater--but note that + // Deflater may not have consumed all of previous bufferload, in which case + // userBufLen will be zero + if (userBufLen > 0) { Review comment: Oh, let me see. Yeah, that seems to work. The logic I used came from the decompressor: it has to parse the gzip header out of the user input before the inflater can consume anything, so it cannot hand the input to the inflater directly. The compressor has no such constraint, so we can set the input buffer on the deflater directly in `setInput`. -- This is an automated message from
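The point discussed above, that the compressor can hand each bufferload straight to the `Deflater` via `setInput`, is exactly the JDK's `needsInput()` protocol, which the Hadoop `Compressor` interface mirrors. A pure-JDK sketch of that feed loop (helper names are mine; this is not the patch's code):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class NeedsInputDemo {

    /** Feed input chunk by chunk, handing each bufferload straight to the Deflater. */
    static byte[] deflateChunks(byte[][] chunks) {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw deflate
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] buf = new byte[64];
        int next = 0;
        while (!def.finished()) {
            if (def.needsInput()) {
                if (next < chunks.length) {
                    def.setInput(chunks[next++]); // previous bufferload fully consumed
                } else {
                    def.finish();                 // no more input: flush the stream
                }
            }
            int n = def.deflate(buf, 0, buf.length);
            compressed.write(buf, 0, n);
        }
        def.end();
        compressed.write(0); // dummy byte the Inflater javadoc requires after a raw stream
        return compressed.toByteArray();
    }

    static String inflateToString(byte[] raw) {
        try {
            Inflater inf = new Inflater(true); // raw inflate, matching nowrap=true above
            inf.setInput(raw);
            byte[] out = new byte[256];
            int m = inf.inflate(out);
            inf.end();
            return new String(out, 0, m, StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[][] chunks = {
            "one ".getBytes(StandardCharsets.UTF_8),
            "two ".getBytes(StandardCharsets.UTF_8),
            "three".getBytes(StandardCharsets.UTF_8)};
        System.out.println(inflateToString(deflateChunks(chunks)));
    }
}
```

The decompressor cannot use this shortcut because it must peel the gzip header off the input itself before the `Inflater` sees any bytes.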
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633752=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633752 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 04/Aug/21 18:45 Start Date: 04/Aug/21 18:45 Worklog Time Spent: 10m Work Description: viirya commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r682870963 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private final int GZIP_HEADER_LEN = 10; + private final int GZIP_TRAILER_LEN = 8; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private byte[] userBuf = null; + private int userBufOff = 0; + private int userBufLen = 0; + + private int headerBytesWritten = 0; + private int trailerBytesWritten = 0; + + private int currentBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { init(conf); } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED); + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +int numAvailBytes = 0; + +// If we are not within uncompressed data yet, output the header. +if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + if (userBufLen <= 0) { +return numAvailBytes; + } + + int outputHeaderSize = writeHeader(b, off, len); + headerBytesWritten += outputHeaderSize; + + // Completes header output. 
+ if (headerOff == GZIP_HEADER_LEN) { +state = BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM; + } + + numAvailBytes += outputHeaderSize; + + if (outputHeaderSize == len) { +return numAvailBytes; + } + + off += outputHeaderSize; + len -= outputHeaderSize; +} + +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + // hand off user data (or what's left of it) to Deflater--but note that + // Deflater may not have consumed all of previous bufferload, in which case + // userBufLen will be zero + if (userBufLen > 0) { +deflater.setInput(userBuf, userBufOff, userBufLen); + +crc.update(userBuf, userBufOff, userBufLen); // CRC-32 is on uncompressed data + +currentBufLen = userBufLen; +userBufOff += userBufLen; +userBufLen = 0; + } + + + // now compress it into b[] + int deflated = deflater.deflate(b, off, len); + + numAvailBytes += deflated; + off +=
[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor
[ https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633745 ] ASF GitHub Bot logged work on HADOOP-17825: --- Author: ASF GitHub Bot Created on: 04/Aug/21 18:41 Start Date: 04/Aug/21 18:41 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #3250: URL: https://github.com/apache/hadoop/pull/3250#discussion_r682868747 ## File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.io.compress.zlib; + +import java.io.IOException; +import java.util.zip.Checksum; +import java.util.zip.Deflater; +import java.util.zip.GZIPOutputStream; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.io.compress.Compressor; +import org.apache.hadoop.io.compress.DoNotPool; +import org.apache.hadoop.util.DataChecksum; + +/** + * A {@link Compressor} based on the popular gzip compressed file format. 
+ * http://www.gzip.org/ + */ +@DoNotPool +public class BuiltInGzipCompressor implements Compressor { + + /** + * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for + * details. + */ + private static final byte[] GZIP_HEADER = new byte[]{ + 0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}; + + private final int GZIP_HEADER_LEN = 10; + private final int GZIP_TRAILER_LEN = 8; + + private Deflater deflater; + + private int headerOff = 0; + private int trailerOff = 0; + + private byte[] userBuf = null; + private int userBufOff = 0; + private int userBufLen = 0; + + private int headerBytesWritten = 0; + private int trailerBytesWritten = 0; + + private int currentBufLen = 0; + + private final Checksum crc = DataChecksum.newCrc32(); + + private BuiltInGzipDecompressor.GzipStateLabel state; + + public BuiltInGzipCompressor(Configuration conf) { init(conf); } + + @Override + public boolean finished() { +// Only if the trailer is also written, it is thought as finished. +return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED; + } + + @Override + public boolean needsInput() { +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + return deflater.needsInput(); +} + +return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED); + } + + @Override + public int compress(byte[] b, int off, int len) throws IOException { +int numAvailBytes = 0; + +// If we are not within uncompressed data yet, output the header. +if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + if (userBufLen <= 0) { +return numAvailBytes; + } + + int outputHeaderSize = writeHeader(b, off, len); + headerBytesWritten += outputHeaderSize; + + // Completes header output. 
+ if (headerOff == GZIP_HEADER_LEN) { +state = BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM; + } + + numAvailBytes += outputHeaderSize; + + if (outputHeaderSize == len) { +return numAvailBytes; + } + + off += outputHeaderSize; + len -= outputHeaderSize; +} + +if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) { + // hand off user data (or what's left of it) to Deflater--but note that + // Deflater may not have consumed all of previous bufferload, in which case + // userBufLen will be zero + if (userBufLen > 0) { +deflater.setInput(userBuf, userBufOff, userBufLen); + +crc.update(userBuf, userBufOff, userBufLen); // CRC-32 is on uncompressed data + +currentBufLen = userBufLen; +userBufOff += userBufLen; +userBufLen = 0; + } + + + // now compress it into b[] + int deflated = deflater.deflate(b, off, len); + + numAvailBytes += deflated; + off +=
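Putting the pieces from this review together: fixed header, raw deflate body, then an eight-byte little-endian trailer holding the CRC-32 of the uncompressed data and ISIZE. A minimal pure-JDK sketch of that framing, verified by round-tripping through `GZIPInputStream` (class and method names are mine; the patch itself streams incrementally rather than compressing in one shot):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

public class GzipFramingDemo {

    // Fixed ten-byte header from the patch: magic 1f 8b, CM = 8 (deflate),
    // no flags, zero mtime, XFL = 0, OS = 0.
    private static final byte[] GZIP_HEADER = {
        0x1f, (byte) 0x8b, 0x08, 0, 0, 0, 0, 0, 0, 0};

    static byte[] gzip(byte[] data) {
        // nowrap = true gives a raw deflate stream with no zlib wrapper,
        // so the gzip header and trailer must be supplied by hand.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(data);
        deflater.finish();

        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length); // CRC-32 is over the uncompressed bytes

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(GZIP_HEADER, 0, GZIP_HEADER.length);
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf, 0, buf.length));
        }
        deflater.end();

        // Eight-byte trailer: CRC-32 then ISIZE (input length mod 2^32),
        // both little-endian.
        long crcVal = crc.getValue();
        long isize = data.length & 0xffffffffL;
        for (int i = 0; i < 4; i++) out.write((int) ((crcVal >> (8 * i)) & 0xff));
        for (int i = 0; i < 4; i++) out.write((int) ((isize >> (8 * i)) & 0xff));
        return out.toByteArray();
    }

    static byte[] gunzip(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) out.write(buf, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = "hello hadoop gzip".getBytes(StandardCharsets.UTF_8);
        // GZIPInputStream checks both the CRC and ISIZE in the trailer,
        // so a successful round trip validates the hand-built framing.
        System.out.println(new String(gunzip(gzip(original)), StandardCharsets.UTF_8));
    }
}
```

A successful decode here exercises both invariants the reviewers were checking: the CRC is computed over uncompressed data, and the trailer is rewritten per stream rather than fixed.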