[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392748=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392748 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 18:08
Start Date: 25/Feb/20 18:08
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 392748)
Time Spent: 17h 10m (was: 17h)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Amogh Tiwari
> Assignee: Amogh Tiwari
> Priority: Minor
> Time Spent: 17h 10m
> Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO
> compression algorithm.
> This will include the following functionalities:
> # compress() : for compressing files into an LZO archive
> # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with
> LZO files.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392678=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392678 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 16:51
Start Date: 25/Feb/20 16:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r383992952

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/LzoCompression.java ##
@@ -22,58 +22,52 @@
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
-import org.apache.hadoop.io.compress.CompressionInputStream;
-import org.apache.hadoop.io.compress.CompressionOutputStream;
 public class LzoCompression {
   /**
-   * Create a {@link CompressionInputStream} that will read from the given {@link InputStream} using
-   * {@link LzoCodec}.
+   * Create a {@link InputStream} that will read from the given {@link InputStream} using {@link
+   * LzoCodec}.
    *
    * @param inputStream the stream to read compressed bytes from
    * @return a stream to read uncompressed bytes from
    * @throws IOException
    */
-  public static CompressionInputStream createLzoInputStream(InputStream inputStream)
-      throws IOException {
+  public static InputStream createLzoInputStream(InputStream inputStream) throws IOException {
     return new LzoCodec().createInputStream(inputStream);
   }
   /**
-   * Create a {@link CompressionInputStream} that will read from the given {@link InputStream} using
-   * {@link LzopCodec}.
+   * Create a {@link InputStream} that will read from the given {@link InputStream} using {@link

Review comment:
```suggestion
   * Create an {@link InputStream} that will read from the given {@link InputStream} using {@link
```

Issue Time Tracking
---
Worklog Id: (was: 392678)
Time Spent: 16h 40m (was: 16.5h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392680=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392680 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 16:51
Start Date: 25/Feb/20 16:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r383992783

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/LzoCompression.java ##
@@ -22,58 +22,52 @@
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
-import org.apache.hadoop.io.compress.CompressionInputStream;
-import org.apache.hadoop.io.compress.CompressionOutputStream;
 public class LzoCompression {
   /**
-   * Create a {@link CompressionInputStream} that will read from the given {@link InputStream} using
-   * {@link LzoCodec}.
+   * Create a {@link InputStream} that will read from the given {@link InputStream} using {@link

Review comment:
```suggestion
   * Create an {@link InputStream} that will read from the given {@link InputStream} using {@link
```

Issue Time Tracking
---
Worklog Id: (was: 392680)
Time Spent: 17h (was: 16h 50m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392679=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392679 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 16:51
Start Date: 25/Feb/20 16:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r383993750

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ##
@@ -83,7 +82,7 @@
 @RunWith(JUnit4.class)
 public class CompressedSourceTest {
-  private final double DELTA = 1e-6;
+  private final double delta = 1e-6;

Review comment:
nit: you should have declared this static and kept the capital letters instead of making it a member variable of CompressedSourceTest

Issue Time Tracking
---
Worklog Id: (was: 392679)
Time Spent: 16h 50m (was: 16h 40m)
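The review nit above can be sketched as follows. A field declared `static final` is a constant and conventionally keeps an upper-case name, while a non-static `final` field is per-instance state and is usually written in camelCase. `DeltaConstantSketch` and `closeEnough` are hypothetical names chosen for this illustration; they are not taken from `CompressedSourceTest`.

```java
/** Hypothetical class illustrating the naming convention from the review nit. */
public class DeltaConstantSketch {
  // static + final makes this a class-level constant, so UPPER_CASE is the usual name.
  private static final double DELTA = 1e-6;

  /** Returns true when the two values differ by at most DELTA. */
  public static boolean closeEnough(double expected, double actual) {
    return Math.abs(expected - actual) <= DELTA;
  }

  public static void main(String[] args) {
    // Differs by about 1e-7, which is inside the tolerance.
    System.out.println(closeEnough(0.5, 0.5 + 1e-7));
    // Differs by 0.01, which is outside the tolerance.
    System.out.println(closeEnough(0.5, 0.51));
  }
}
```

Declaring the tolerance once at class level also avoids allocating a copy of the field for every test instance that JUnit creates.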
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392677=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392677 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 16:51
Start Date: 25/Feb/20 16:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r383993034

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/LzoCompression.java ##
@@ -22,58 +22,52 @@
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.OutputStream;
-import org.apache.hadoop.io.compress.CompressionInputStream;
-import org.apache.hadoop.io.compress.CompressionOutputStream;
 public class LzoCompression {
   /**
-   * Create a {@link CompressionInputStream} that will read from the given {@link InputStream} using
-   * {@link LzoCodec}.
+   * Create a {@link InputStream} that will read from the given {@link InputStream} using {@link
+   * LzoCodec}.
    *
    * @param inputStream the stream to read compressed bytes from
    * @return a stream to read uncompressed bytes from
    * @throws IOException
    */
-  public static CompressionInputStream createLzoInputStream(InputStream inputStream)
-      throws IOException {
+  public static InputStream createLzoInputStream(InputStream inputStream) throws IOException {
     return new LzoCodec().createInputStream(inputStream);
   }
   /**
-   * Create a {@link CompressionInputStream} that will read from the given {@link InputStream} using
-   * {@link LzopCodec}.
+   * Create a {@link InputStream} that will read from the given {@link InputStream} using {@link
+   * LzopCodec}.
    *
    * @param inputStream the stream to read compressed bytes from
    * @return a stream to read uncompressed bytes from
    * @throws IOException
    */
-  public static CompressionInputStream createLzopInputStream(InputStream inputStream)
-      throws IOException {
+  public static InputStream createLzopInputStream(InputStream inputStream) throws IOException {
     return new LzopCodec().createInputStream(inputStream);
   }
   /**
-   * Create a {@link CompressionOutputStream} that will write to the given {@link OutputStream}.
+   * Create a {@link OutputStream} that will write to the given {@link OutputStream}.

Review comment:
```suggestion
   * Create an {@link OutputStream} that will write to the given {@link OutputStream}.
```

Issue Time Tracking
---
Worklog Id: (was: 392677)
Time Spent: 16.5h (was: 16h 20m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392676=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392676 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 16:47
Start Date: 25/Feb/20 16:47
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-590955514

retest this please

Issue Time Tracking
---
Worklog Id: (was: 392676)
Time Spent: 16h 20m (was: 16h 10m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392508 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 11:51
Start Date: 25/Feb/20 11:51
Worklog Time Spent: 10m
Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-590828387

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 392508)
Time Spent: 16h 10m (was: 16h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392507=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392507 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 11:49
Start Date: 25/Feb/20 11:49
Worklog Time Spent: 10m
Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-590828387

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 392507)
Time Spent: 16h (was: 15h 50m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392503=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392503 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 11:46
Start Date: 25/Feb/20 11:46
Worklog Time Spent: 10m
Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-590824053

retest this please

Issue Time Tracking
---
Worklog Id: (was: 392503)
Time Spent: 15h 50m (was: 15h 40m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392496 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 11:36
Start Date: 25/Feb/20 11:36
Worklog Time Spent: 10m
Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-590824053

retest this please

Issue Time Tracking
---
Worklog Id: (was: 392496)
Time Spent: 15h 40m (was: 15.5h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392372 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 08:29
Start Date: 25/Feb/20 08:29
Worklog Time Spent: 10m
Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-590744759

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 392372)
Time Spent: 15.5h (was: 15h 20m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392370 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 08:28
Start Date: 25/Feb/20 08:28
Worklog Time Spent: 10m
Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-590744759

Run Java PreCommit

Issue Time Tracking
---
Worklog Id: (was: 392370)
Time Spent: 15h 20m (was: 15h 10m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392369 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 08:27
Start Date: 25/Feb/20 08:27
Worklog Time Spent: 10m
Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-590744228

retest this please

Issue Time Tracking
---
Worklog Id: (was: 392369)
Time Spent: 15h 10m (was: 15h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392368=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392368 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 25/Feb/20 08:26
Start Date: 25/Feb/20 08:26
Worklog Time Spent: 10m
Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-590744228

retest this please

Issue Time Tracking
---
Worklog Id: (was: 392368)
Time Spent: 15h (was: 14h 50m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392107=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392107 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 24/Feb/20 21:33
Start Date: 24/Feb/20 21:33
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-590559659

That sounds great. Should have caught that earlier.

Issue Time Tracking
---
Worklog Id: (was: 392107)
Time Spent: 14h 40m (was: 14.5h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392108 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 24/Feb/20 21:33
Start Date: 24/Feb/20 21:33
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-590559659

That sounds great. I should have caught that earlier.

Issue Time Tracking
---
Worklog Id: (was: 392108)
Time Spent: 14h 50m (was: 14h 40m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392102=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392102 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 24/Feb/20 21:28 Start Date: 24/Feb/20 21:28 Worklog Time Spent: 10m Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-590557571 @lukecwik We observed that replacing the Compression I/O streams with java.io I/O streams in LzoCompression.java can resolve the issue. Should we go ahead and do that? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392102) Time Spent: 14.5h (was: 14h 20m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 14.5h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
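The idea raised in the comment above, exposing java.io stream types from the factory methods instead of a codec library's own stream classes, can be sketched roughly as follows. This is a hypothetical illustration and not the code from the pull request: the class name StreamFactorySketch and its methods are invented for this example, and the JDK's GZIP streams stand in for the LZO codec (which is not part of the JDK) so that the sketch runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: GZIP (from the JDK) stands in for the LZO codec. */
final class StreamFactorySketch {
  private StreamFactorySketch() {}

  // The declared return type is java.io.InputStream rather than the
  // codec-specific subclass, so the signature mentions only JDK types.
  static InputStream createDecompressingStream(InputStream in) throws IOException {
    return new GZIPInputStream(in);
  }

  // Same idea for the write path: callers see a plain java.io.OutputStream.
  static OutputStream createCompressingStream(OutputStream out) throws IOException {
    return new GZIPOutputStream(out);
  }

  // Compresses the text into memory, then decompresses it again.
  static String roundTrip(String text) {
    try {
      ByteArrayOutputStream compressed = new ByteArrayOutputStream();
      try (OutputStream out = createCompressingStream(compressed)) {
        out.write(text.getBytes(StandardCharsets.UTF_8));
      }
      ByteArrayOutputStream plain = new ByteArrayOutputStream();
      try (InputStream in =
          createDecompressingStream(new ByteArrayInputStream(compressed.toByteArray()))) {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
          plain.write(buffer, 0, n);
        }
      }
      return new String(plain.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling StreamFactorySketch.roundTrip with any string should return the same string. Whether narrowing the signatures in this way is enough to keep a codec library off the classpath of pipelines that never use it depends on how and when the codec class itself is loaded, so treat this only as an illustration of the signature change.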
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=392084=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-392084 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 24/Feb/20 20:53 Start Date: 24/Feb/20 20:53 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-590543238 WordCount doesn't depend on using LZO so it shouldn't be a dependency and the pipeline should execute successfully without it. The test may be picking up a legitimate case which users would hit as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 392084) Time Spent: 14h 20m (was: 14h 10m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 14h 20m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=391852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391852 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 24/Feb/20 17:42 Start Date: 24/Feb/20 17:42 Worklog Time Spent: 10m Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-590459393 @lukecwik Do we need to add a test dependency for facebook-presto and airlift in /beam/examples/java/build.gradle? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 391852) Time Spent: 14h (was: 13h 50m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 14h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=391853=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391853 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 24/Feb/20 17:43 Start Date: 24/Feb/20 17:43 Worklog Time Spent: 10m Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-590459393 @lukecwik Do we need to add a test dependency for facebook-presto and airlift in /beam/examples/java/build.gradle? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 391853) Time Spent: 14h 10m (was: 14h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 14h 10m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=391826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391826 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 24/Feb/20 16:43 Start Date: 24/Feb/20 16:43 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-590426460 Run JavaPortabilityApi PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 391826) Time Spent: 13h 40m (was: 13.5h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 13h 40m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=391825=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391825 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 24/Feb/20 16:43 Start Date: 24/Feb/20 16:43 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-590426399 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 391825) Time Spent: 13.5h (was: 13h 20m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 13.5h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=391827=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391827 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 24/Feb/20 16:43 Start Date: 24/Feb/20 16:43 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-590426524 Run Java_Examples_Dataflow PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 391827) Time Spent: 13h 50m (was: 13h 40m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 13h 50m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=391369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391369 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 23/Feb/20 13:43 Start Date: 23/Feb/20 13:43 Worklog Time Spent: 10m Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-590069911 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 391369) Time Spent: 13h 20m (was: 13h 10m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 13h 20m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=391368=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391368 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 23/Feb/20 13:39 Start Date: 23/Feb/20 13:39 Worklog Time Spent: 10m Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-590069911 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 391368) Time Spent: 13h 10m (was: 13h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 13h 10m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=391194=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391194 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 22/Feb/20 17:27 Start Date: 22/Feb/20 17:27 Worklog Time Spent: 10m Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-589978924 `Run Java_Examples_Dataflow PreCommit` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 391194) Time Spent: 13h (was: 12h 50m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 13h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=391193=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391193 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 22/Feb/20 17:27 Start Date: 22/Feb/20 17:27 Worklog Time Spent: 10m Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-589978924 `Run Java_Examples_Dataflow PreCommit` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 391193) Time Spent: 12h 50m (was: 12h 40m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 12h 50m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=391171=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391171 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 22/Feb/20 15:55 Start Date: 22/Feb/20 15:55 Worklog Time Spent: 10m Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-589968838 Run JavaPortabilityApi PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 391171) Time Spent: 12h 40m (was: 12.5h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 12h 40m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=391170=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-391170 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 22/Feb/20 15:52 Start Date: 22/Feb/20 15:52 Worklog Time Spent: 10m Work Description: shubham-srivastav commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-589968838 Run JavaPortabilityApi PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 391170) Time Spent: 12.5h (was: 12h 20m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 12.5h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=390970=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-390970 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 21/Feb/20 23:28 Start Date: 21/Feb/20 23:28 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-589879728 Run Java_Examples_Dataflow PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 390970) Time Spent: 12h 20m (was: 12h 10m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 12h 20m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=390969=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-390969 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 21/Feb/20 23:28 Start Date: 21/Feb/20 23:28 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-589879703 Run JavaPortabilityApi PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 390969) Time Spent: 12h 10m (was: 12h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 12h 10m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=390963=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-390963 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 21/Feb/20 23:16 Start Date: 21/Feb/20 23:16 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-589876952 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 390963) Time Spent: 12h (was: 11h 50m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 12h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=390962=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-390962 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 21/Feb/20 23:13 Start Date: 21/Feb/20 23:13 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-589876072 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 390962) Time Spent: 11h 50m (was: 11h 40m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 11h 50m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389912=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389912 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 20/Feb/20 13:15 Start Date: 20/Feb/20 13:15 Worklog Time Spent: 10m Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-589016725 > After committing the comments, you may need to run spotlessApply again. Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389912) Time Spent: 11h 40m (was: 11.5h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 11h 40m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389532=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389532 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 19/Feb/20 17:52
Start Date: 19/Feb/20 17:52
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-588354735

After committing the comments, you may need to run spotlessApply again.

Issue Time Tracking
---
Worklog Id: (was: 389532)
Time Spent: 11.5h (was: 11h 20m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389524 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381436101

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ##
@@ -152,6 +153,54 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I
     }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just

Review comment:
```suggestion
   * LZO compression using LZO codec. {@code .lzo_deflate} extension is specified for files which
```

Issue Time Tracking
---
Worklog Id: (was: 389524)
Time Spent: 11h (was: 10h 50m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389525 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381437508

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ##
@@ -152,6 +153,54 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I
     }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on {@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code presto-hadoop-apache2}

Review comment:
```suggestion
   * {@code .lzo_deflate} files without {@code io.airlift:aircompressor} and {@code com.facebook.presto.hadoop:hadoop-apache2}
```

Issue Time Tracking
---
Worklog Id: (was: 389525)
Time Spent: 11h (was: 10h 50m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389527 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381438625

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ##
@@ -152,6 +153,54 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I
     }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on {@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code presto-hadoop-apache2}
+   * loaded will result in {@code NoClassDefFoundError} at runtime.
+   */
+  LZO(".lzo_deflate", ".lzo_deflate") {
+    @Override
+    public ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException {
+      return Channels.newChannel(
+          LzoCompression.createLzoInputStream(Channels.newInputStream(channel)));
+    }
+
+    @Override
+    public WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException {
+      return Channels.newChannel(
+          LzoCompression.createLzoOutputStream(Channels.newOutputStream(channel)));
+    }
+  },
+
+  /**
+   * LZOP compression using LZOP Codec. .lzo extension is specified for the files with magic bytes
+   * and headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZOP compression by default,
+   * so it is the user's responsibility to declare an explicit dependency on {@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to read or write .lzo files

Review comment:
```suggestion
   * io.airlift:aircompressor} and {@code com.facebook.presto.hadoop:hadoop-apache2}. Attempts to read or write {@code .lzo} files
```

Issue Time Tracking
---
Worklog Id: (was: 389527)
Time Spent: 11h 10m (was: 11h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389531 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381439166

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ##
@@ -152,6 +153,54 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I
     }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on {@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code presto-hadoop-apache2}
+   * loaded will result in {@code NoClassDefFoundError} at runtime.
+   */
+  LZO(".lzo_deflate", ".lzo_deflate") {
+    @Override
+    public ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException {
+      return Channels.newChannel(
+          LzoCompression.createLzoInputStream(Channels.newInputStream(channel)));
+    }
+
+    @Override
+    public WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException {
+      return Channels.newChannel(
+          LzoCompression.createLzoOutputStream(Channels.newOutputStream(channel)));
+    }
+  },
+
+  /**
+   * LZOP compression using LZOP Codec. .lzo extension is specified for the files with magic bytes
+   * and headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZOP compression by default,
+   * so it is the user's responsibility to declare an explicit dependency on {@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to read or write .lzo files
+   * without {@code airlift/aircompressor} and {@code presto-hadoop-apache2} loaded will result in

Review comment:
```suggestion
   * without {@code io.airlift:aircompressor} and {@code com.facebook.presto.hadoop:hadoop-apache2} loaded will result in a
```

Issue Time Tracking
---
Worklog Id: (was: 389531)
Time Spent: 11h 20m (was: 11h 10m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389529 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381437988

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ##
@@ -152,6 +153,54 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I
     }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on {@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code presto-hadoop-apache2}
+   * loaded will result in {@code NoClassDefFoundError} at runtime.
+   */
+  LZO(".lzo_deflate", ".lzo_deflate") {
+    @Override
+    public ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException {
+      return Channels.newChannel(
+          LzoCompression.createLzoInputStream(Channels.newInputStream(channel)));
+    }
+
+    @Override
+    public WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException {
+      return Channels.newChannel(
+          LzoCompression.createLzoOutputStream(Channels.newOutputStream(channel)));
+    }
+  },
+
+  /**
+   * LZOP compression using LZOP Codec. .lzo extension is specified for the files with magic bytes

Review comment:
```suggestion
   * LZOP compression using LZOP codec. {@code .lzo} extension is specified for files with magic bytes
```

Issue Time Tracking
---
Worklog Id: (was: 389529)
Time Spent: 11h 10m (was: 11h)
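The LZO and LZOP enum constants in the diff above each wrap a channel in a codec stream and hand the result back as a channel. A minimal sketch of that wrapping pattern follows, using only JDK classes. GZIP stands in for the LZO codec so the example runs without aircompressor on the classpath; that substitution is an assumption of this sketch, and the names SketchCompression, GZIP_STAND_IN and roundTrip are hypothetical and not part of the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for the enum pattern in the diff; GZIP replaces the LZO codec. */
enum SketchCompression {
  GZIP_STAND_IN {
    @Override
    ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException {
      // channel -> InputStream -> decompressing stream -> channel
      return Channels.newChannel(new GZIPInputStream(Channels.newInputStream(channel)));
    }

    @Override
    WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException {
      // channel -> OutputStream -> compressing stream -> channel
      return Channels.newChannel(new GZIPOutputStream(Channels.newOutputStream(channel)));
    }
  };

  abstract ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException;

  abstract WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException;

  /** Compresses and then decompresses the text through the two channel wrappers. */
  static String roundTrip(String text) throws IOException {
    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    try (WritableByteChannel out =
        GZIP_STAND_IN.writeCompressed(Channels.newChannel(compressed))) {
      ByteBuffer source = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
      while (source.hasRemaining()) {
        out.write(source);
      }
    }

    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    try (ReadableByteChannel in =
        GZIP_STAND_IN.readDecompressed(
            Channels.newChannel(new ByteArrayInputStream(compressed.toByteArray())))) {
      ByteBuffer buffer = ByteBuffer.allocate(1024);
      while (in.read(buffer) != -1) {
        buffer.flip();
        plain.write(buffer.array(), 0, buffer.limit());
        buffer.clear();
      }
    }
    return new String(plain.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    System.out.println(roundTrip("hello"));
  }
}
```

Closing the outer channel closes the codec stream underneath it, which is what flushes the compressed trailer; that is why the writer is used in a try-with-resources block before the bytes are read back.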
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389526=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389526 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381440755

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ##
@@ -152,6 +153,54 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I
     }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on {@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code presto-hadoop-apache2}
+   * loaded will result in {@code NoClassDefFoundError} at runtime.
+   */
+  LZO(".lzo_deflate", ".lzo_deflate") {
+    @Override
+    public ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException {
+      return Channels.newChannel(
+          LzoCompression.createLzoInputStream(Channels.newInputStream(channel)));
+    }
+
+    @Override
+    public WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException {
+      return Channels.newChannel(
+          LzoCompression.createLzoOutputStream(Channels.newOutputStream(channel)));
+    }
+  },
+
+  /**
+   * LZOP compression using LZOP Codec. .lzo extension is specified for the files with magic bytes
+   * and headers.
+   *

Review comment:
```suggestion
   *
   * Warning: The LZOP codec being used does not support concatenated LZOP streams and will
   * silently ignore data after the end of the first LZOP stream.
   *
```

Issue Time Tracking
---
Worklog Id: (was: 389526)
Time Spent: 11h 10m (was: 11h)
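The warning suggested above describes a decoder that stops at the end of the first compressed stream and drops whatever follows without raising an error. The sketch below illustrates that failure mode by analogy only: it uses the JDK's zlib classes rather than the LZOP codec, because java.util.zip.InflaterInputStream is also a reader that ends at the first stream boundary. The class and method names are hypothetical, and nothing here exercises aircompressor itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical demo class; zlib stands in for LZOP purely as an analogy. */
final class FirstStreamOnlyDemo {

  /** Compresses the text into one complete, self-terminated zlib stream. */
  static byte[] compress(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return bytes.toByteArray();
  }

  /** Decodes with a reader that reports end-of-input after the first stream. */
  static String decompress(byte[] data) throws IOException {
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
      byte[] buffer = new byte[4096];
      int n;
      while ((n = in.read(buffer)) != -1) {
        result.write(buffer, 0, n);
      }
    }
    return new String(result.toByteArray(), StandardCharsets.UTF_8);
  }

  /** Byte-level concatenation of two compressed files, as `cat a b > both` would do. */
  static byte[] concat(byte[] a, byte[] b) {
    byte[] joined = new byte[a.length + b.length];
    System.arraycopy(a, 0, joined, 0, a.length);
    System.arraycopy(b, 0, joined, a.length, b.length);
    return joined;
  }

  public static void main(String[] args) throws IOException {
    byte[] joined = concat(compress("first\n"), compress("second\n"));
    // Only the content of the first stream comes back; no exception is thrown.
    System.out.print(decompress(joined));
  }
}
```

The point of the analogy is that the loss is silent: the read loop ends normally, so a pipeline reading such a file would have no signal that records were skipped. That is the reason for documenting the limitation on the enum rather than only in the tests.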
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389530 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381437634

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ##
@@ -152,6 +153,54 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I
     }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on {@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code presto-hadoop-apache2}
+   * loaded will result in {@code NoClassDefFoundError} at runtime.

Review comment:
```suggestion
   * loaded will result in a {@code NoClassDefFoundError} at runtime.
```

Issue Time Tracking
---
Worklog Id: (was: 389530)
Time Spent: 11h 20m (was: 11h 10m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389528=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389528 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381437087

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ##
@@ -152,6 +153,54 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I
     }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on {@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to read or write

Review comment:
```suggestion
   * io.airlift:aircompressor} and {@code com.facebook.presto.hadoop:hadoop-apache2}. Attempts to read or write
```

Issue Time Tracking
---
Worklog Id: (was: 389528)
Time Spent: 11h 10m (was: 11h)
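Since the Javadoc under review says a missing dependency only surfaces as a NoClassDefFoundError at runtime, a pipeline author may want to probe the classpath before any file is opened. The following is a minimal sketch of such a probe, not something the PR provides. The helper name LzoDependencyCheck is hypothetical, and the two probed class names are assumptions about what the io.airlift:aircompressor and com.facebook.presto.hadoop:hadoop-apache2 artifacts contain.

```java
/** Hypothetical helper; the probed class names are assumptions, not taken from the PR. */
final class LzoDependencyCheck {

  /** Returns true if the named class can be located, without running its static initializers. */
  static boolean isPresent(String className) {
    try {
      Class.forName(className, false, LzoDependencyCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Assumed representative classes, one per artifact named in the review comment.
    String[] required = {
      "io.airlift.compress.lzo.LzoCodec",
      "org.apache.hadoop.io.compress.CompressionCodec"
    };
    for (String name : required) {
      System.out.println(name + (isPresent(name) ? " found" : " MISSING"));
    }
  }
}
```

Running the check at pipeline construction time turns a late worker-side failure into an early, readable message. Whether that is worth the extra code depends on how the pipeline is deployed.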
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389517 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 19/Feb/20 17:41
Start Date: 19/Feb/20 17:41
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381435599

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ##
@@ -738,7 +1069,133 @@ public void testGzipProgress() throws IOException {
       assertThat(readerOrig, instanceOf(CompressedReader.class));
       CompressedReader reader = (CompressedReader) readerOrig;
       // before starting
-      assertEquals(0.0, reader.getFractionConsumed(), 1e-6);
+      assertEquals(0.0, reader.getFractionConsumed(), DELTA);
+      assertEquals(0, reader.getSplitPointsConsumed());
+      assertEquals(1, reader.getSplitPointsRemaining());
+
+      // confirm has three records
+      for (int i = 0; i < numRecords; ++i) {
+        if (i == 0) {
+          assertTrue(reader.start());
+        } else {
+          assertTrue(reader.advance());
+        }
+        assertEquals(0, reader.getSplitPointsConsumed());
+        assertEquals(1, reader.getSplitPointsRemaining());
+      }
+      assertFalse(reader.advance());
+
+      // after reading empty source

Review comment:
Best to add that comment over the LZOP enum since nobody reading the documentation is going to find the comment in the tests.

Issue Time Tracking
---
Worklog Id: (was: 389517)
Time Spent: 10h 50m (was: 10h 40m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389515 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 19/Feb/20 17:38
Start Date: 19/Feb/20 17:38
Worklog Time Spent: 10m
Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-588347547

retest this please

Issue Time Tracking
---
Worklog Id: (was: 389515)
Time Spent: 10h 40m (was: 10.5h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389114=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389114 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 18/Feb/20 21:37
Start Date: 18/Feb/20 21:37
Worklog Time Spent: 10m
Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-587893601

@lukecwik I've incorporated most of the suggested changes in the PR. Please let me know your thoughts on this.

Issue Time Tracking
---
Worklog Id: (was: 389114)
Time Spent: 10.5h (was: 10h 20m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389111=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389111 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 18/Feb/20 21:33
Start Date: 18/Feb/20 21:33
Worklog Time Spent: 10m
Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r380947952

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ##
@@ -738,7 +1069,133 @@ public void testGzipProgress() throws IOException {
       assertThat(readerOrig, instanceOf(CompressedReader.class));
       CompressedReader reader = (CompressedReader) readerOrig;
       // before starting
-      assertEquals(0.0, reader.getFractionConsumed(), 1e-6);
+      assertEquals(0.0, reader.getFractionConsumed(), DELTA);
+      assertEquals(0, reader.getSplitPointsConsumed());
+      assertEquals(1, reader.getSplitPointsRemaining());
+
+      // confirm has three records
+      for (int i = 0; i < numRecords; ++i) {
+        if (i == 0) {
+          assertTrue(reader.start());
+        } else {
+          assertTrue(reader.advance());
+        }
+        assertEquals(0, reader.getSplitPointsConsumed());
+        assertEquals(1, reader.getSplitPointsRemaining());
+      }
+      assertFalse(reader.advance());
+
+      // after reading empty source

Review comment:
For now, we have added a comment warning users that a concatenated LZO file does not get decompressed correctly. It is added above the testFalseReadConcatenatedLzop and testFalseReadMultiStreamLzop methods.

Issue Time Tracking
---
Worklog Id: (was: 389111)
Time Spent: 10h 20m (was: 10h 10m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389097=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389097 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:14 Start Date: 18/Feb/20 21:14 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938634 ## File path: sdks/java/core/build.gradle ## @@ -91,4 +95,6 @@ dependencies { shadowTest library.java.avro_tests shadowTest library.java.zstd_jni testRuntimeOnly library.java.slf4j_jdk14 + compileOnly 'io.airlift:aircompressor:0.16' Review comment: done Issue Time Tracking --- Worklog Id: (was: 389097) Time Spent: 10h (was: 9h 50m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389096=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389096 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:14 Start Date: 18/Feb/20 21:14 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938566 ## File path: sdks/java/core/build.gradle ## @@ -91,4 +95,6 @@ dependencies { shadowTest library.java.avro_tests shadowTest library.java.zstd_jni testRuntimeOnly library.java.slf4j_jdk14 + compileOnly 'io.airlift:aircompressor:0.16' + compileOnly 'com.facebook.presto.hadoop:hadoop-apache2:3.2.0-1' Review comment: done Issue Time Tracking --- Worklog Id: (was: 389096) Time Spent: 9h 50m (was: 9h 40m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389098=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389098 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:14 Start Date: 18/Feb/20 21:14 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938720 ## File path: sdks/java/core/build.gradle ## @@ -58,6 +58,10 @@ test { } } +configurations { +testCompile.extendsFrom compileOnly +} + Review comment: done Issue Time Tracking --- Worklog Id: (was: 389098) Time Spent: 10h 10m (was: 10h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389094=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389094 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:14 Start Date: 18/Feb/20 21:14 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938458 ## File path: sdks/java/core/build.gradle ## @@ -91,4 +95,6 @@ dependencies { shadowTest library.java.avro_tests shadowTest library.java.zstd_jni testRuntimeOnly library.java.slf4j_jdk14 + compileOnly 'io.airlift:aircompressor:0.16' Review comment: done Issue Time Tracking --- Worklog Id: (was: 389094) Time Spent: 9h 40m (was: 9.5h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389090=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389090 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:13 Start Date: 18/Feb/20 21:13 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938051 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/LzoCompressorInputStream.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.util; + +import io.airlift.compress.lzo.LzoCodec; +import java.io.IOException; +import java.io.InputStream; +import org.apache.commons.compress.compressors.CompressorInputStream; +import org.apache.commons.compress.utils.CountingInputStream; +import org.apache.commons.compress.utils.IOUtils; +import org.apache.commons.compress.utils.InputStreamStatistics; + +/** + * {@link CompressorInputStream} implementation to create LZO encoded stream. 
Library relies on <a href="https://github.com/airlift/aircompressor/">LZO</a> + * + * @since 1.18 + */ +public class LzoCompressorInputStream extends CompressorInputStream Review comment: replaced wrapper classes with static methods Issue Time Tracking --- Worklog Id: (was: 389090) Time Spent: 9h 10m (was: 9h)
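The change described in the review comment above (static factory methods that return plain java.io streams instead of dedicated wrapper classes) can be sketched roughly as follows. This is only an illustration of the pattern: the JDK's Deflater/Inflater streams stand in for the LZO codec so that the example runs without aircompressor, and the class and method names (StaticCodecFactories, createInputStream, createOutputStream, roundTrip) are hypothetical, not taken from the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical sketch: static factories instead of wrapper stream classes. */
final class StaticCodecFactories {

  /** Stand-in for an LZO factory: returns a plain InputStream that yields uncompressed bytes. */
  static InputStream createInputStream(InputStream compressed) {
    return new InflaterInputStream(compressed);
  }

  /** Stand-in for an LZO factory: returns a plain OutputStream that accepts uncompressed bytes. */
  static OutputStream createOutputStream(OutputStream raw) {
    return new DeflaterOutputStream(raw);
  }

  /** Compresses and then decompresses the data through channels built on the factories. */
  static byte[] roundTrip(byte[] data) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // Closing the channel closes the stream, which flushes the compressor.
    try (WritableByteChannel out = Channels.newChannel(createOutputStream(sink))) {
      out.write(ByteBuffer.wrap(data));
    }

    ByteArrayOutputStream result = new ByteArrayOutputStream();
    try (ReadableByteChannel in =
        Channels.newChannel(createInputStream(new ByteArrayInputStream(sink.toByteArray())))) {
      ByteBuffer buf = ByteBuffer.allocate(64);
      while (in.read(buf) != -1) {
        buf.flip();
        result.write(buf.array(), 0, buf.limit());
        buf.clear();
      }
    }
    return result.toByteArray();
  }
}
```

With this shape, callers depend only on InputStream and OutputStream, so no codec-specific stream type leaks into the public signatures.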
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389093=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389093 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:13 Start Date: 18/Feb/20 21:13 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938387 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ## @@ -152,6 +156,38 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I } }, + /** + * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just + * use the LZO algorithm without headers. Review comment: done Issue Time Tracking --- Worklog Id: (was: 389093) Time Spent: 9.5h (was: 9h 20m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389092=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389092 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:13 Start Date: 18/Feb/20 21:13 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938292 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ## @@ -152,6 +156,38 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I } }, + /** + * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just + * use the LZO algorithm without headers. + */ + LZO(".lzo_deflate", ".lzo_deflate") { +@Override +public ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException { + return Channels.newChannel(new LzoCompressorInputStream(Channels.newInputStream(channel))); +} + +@Override +public WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException { + return Channels.newChannel(new LzoCompressorOutputStream(Channels.newOutputStream(channel))); +} + }, + + /** + * LZOP compression using LZOP Codec. .lzo extension is specified for the files with magic bytes + * and headers. Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 389092) Time Spent: 9h 20m (was: 9h 10m)
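The Javadoc in the diff above distinguishes raw LZO data (.lzo_deflate, no headers) from LZOP files (.lzo, with magic bytes and headers). As a rough illustration of what "magic bytes" means here, the sketch below checks whether a buffer starts with the 9-byte signature written by the lzop tool. The helper name LzopMagic is hypothetical and is not part of the PR, which, as far as the diff shows, selects the codec by file extension rather than by sniffing content.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Beam: recognizes the lzop file signature. */
final class LzopMagic {

  // Signature at the start of files produced by the lzop tool:
  // 0x89 'L' 'Z' 'O' 0x00 '\r' '\n' 0x1a '\n'
  private static final byte[] LZOP_MAGIC = {
    (byte) 0x89, 0x4c, 0x5a, 0x4f, 0x00, 0x0d, 0x0a, 0x1a, 0x0a
  };

  /** Returns true if the given leading bytes of a file begin with the lzop signature. */
  static boolean startsWithLzopMagic(byte[] head) {
    if (head.length < LZOP_MAGIC.length) {
      return false;
    }
    return Arrays.equals(Arrays.copyOf(head, LZOP_MAGIC.length), LZOP_MAGIC);
  }
}
```

A headerless .lzo_deflate stream has no such prefix, which is why it cannot be identified from its content alone and the extension has to carry that information.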
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389085=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389085 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:12 Start Date: 18/Feb/20 21:12 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380937429 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -738,7 +1069,133 @@ public void testGzipProgress() throws IOException { assertThat(readerOrig, instanceOf(CompressedReader.class)); CompressedReader reader = (CompressedReader) readerOrig; // before starting - assertEquals(0.0, reader.getFractionConsumed(), 1e-6); + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + + // confirm has three records + for (int i = 0; i < numRecords; ++i) { +if (i == 0) { + assertTrue(reader.start()); +} else { + assertTrue(reader.advance()); +} +assertEquals(0, reader.getSplitPointsConsumed()); +assertEquals(1, reader.getSplitPointsRemaining()); + } + assertFalse(reader.advance()); + + // after reading empty source + assertEquals(1.0, reader.getFractionConsumed(), DELTA); + assertEquals(1, reader.getSplitPointsConsumed()); + assertEquals(0, reader.getSplitPointsRemaining()); +} + } + + @Test + public void testEmptyLzoProgress() throws IOException { +File tmpFile = tmpFolder.newFile("empty.lzo_deflate"); +String filename = tmpFile.toPath().toString(); +writeFile(tmpFile, new byte[0], CompressionMode.LZO); + +PipelineOptions options = PipelineOptionsFactory.create(); +CompressedSource source = +CompressedSource.from(new ByteSource(filename, 1)).withDecompression(CompressionMode.LZO); +try (BoundedReader readerOrig = source.createReader(options)) { + 
assertThat(readerOrig, instanceOf(CompressedReader.class)); + CompressedReader reader = (CompressedReader) readerOrig; + // before starting + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + // confirm empty + assertFalse(reader.start()); + // after reading empty source + assertEquals(1.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(0, reader.getSplitPointsRemaining()); +} + } + + @Test + public void testLzoProgress() throws IOException { +int numRecords = 3; +File tmpFile = tmpFolder.newFile("nonempty.lzo"); +String filename = tmpFile.toPath().toString(); +writeFile(tmpFile, new byte[numRecords], CompressionMode.LZO); + +PipelineOptions options = PipelineOptionsFactory.create(); +CompressedSource source = +CompressedSource.from(new ByteSource(filename, 1)).withDecompression(CompressionMode.LZO); +try (BoundedReader readerOrig = source.createReader(options)) { + assertThat(readerOrig, instanceOf(CompressedReader.class)); + CompressedReader reader = (CompressedReader) readerOrig; + // before starting + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + + // confirm has three records + for (int i = 0; i < numRecords; ++i) { +if (i == 0) { + assertTrue(reader.start()); +} else { + assertTrue(reader.advance()); +} +assertEquals(0, reader.getSplitPointsConsumed()); +assertEquals(1, reader.getSplitPointsRemaining()); + } + assertFalse(reader.advance()); + + // after reading empty source Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 389085) Time Spent: 9h (was: 8h 50m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389084=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389084 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:11 Start Date: 18/Feb/20 21:11 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380937373 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -755,7 +1212,7 @@ public void testGzipProgress() throws IOException { assertFalse(reader.advance()); // after reading empty source Review comment: done Issue Time Tracking --- Worklog Id: (was: 389084) Time Spent: 8h 50m (was: 8h 40m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=387187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387187 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 14/Feb/20 07:40 Start Date: 14/Feb/20 07:40 Worklog Time Spent: 10m Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-586136772 @lukecwik We are working on all of your suggestions and will update the PR in a few days. Thank you for your patience. Issue Time Tracking --- Worklog Id: (was: 387187) Time Spent: 8h 40m (was: 8.5h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=386873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386873 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 13/Feb/20 20:57 Start Date: 13/Feb/20 20:57 Worklog Time Spent: 10m Work Description: gsteelman commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-585967896 Hi @amoght, can you address the open comments? I have reached out to a couple more folks internally at Twitter to request more eyes on this. Hoping to get this wrapped up soon. Thank you for your work so far. Issue Time Tracking --- Worklog Id: (was: 386873) Time Spent: 8.5h (was: 8h 20m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=384662=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384662 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 10/Feb/20 19:21 Start Date: 10/Feb/20 19:21 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-584305044 I was pinging to ask whether there has been any recent work to address the comments from my last review. Issue Time Tracking --- Worklog Id: (was: 384662) Time Spent: 8h 20m (was: 8h 10m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=384129=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384129 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 09/Feb/20 18:24 Start Date: 09/Feb/20 18:24 Worklog Time Spent: 10m Work Description: nownikhil commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-583876691 LGTM - Twitter Core data libraries team Issue Time Tracking --- Worklog Id: (was: 384129) Time Spent: 8h 10m (was: 8h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=376620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-376620 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 23/Jan/20 22:21 Start Date: 23/Jan/20 22:21 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-577905297 Ping, any updates? Issue Time Tracking --- Worklog Id: (was: 376620) Time Spent: 8h (was: 7h 50m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=373224=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373224 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 16/Jan/20 19:26 Start Date: 16/Jan/20 19:26 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r367605664 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -738,7 +1069,133 @@ public void testGzipProgress() throws IOException { assertThat(readerOrig, instanceOf(CompressedReader.class)); CompressedReader reader = (CompressedReader) readerOrig; // before starting - assertEquals(0.0, reader.getFractionConsumed(), 1e-6); + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + + // confirm has three records + for (int i = 0; i < numRecords; ++i) { +if (i == 0) { + assertTrue(reader.start()); +} else { + assertTrue(reader.advance()); +} +assertEquals(0, reader.getSplitPointsConsumed()); +assertEquals(1, reader.getSplitPointsRemaining()); + } + assertFalse(reader.advance()); + + // after reading empty source Review comment: It would be better if the concatenated streams for LZOP worked or if concatenated streams were detected then an exception was thrown to the user. Having a check that ensures the number of bytes read from the stream/channel is equivalent to the channels length would be one way of supporting this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 373224) Time Spent: 7h 50m (was: 7h 40m)
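The check proposed in the review comment above (compare the number of bytes read from the channel with the channel's length) could be sketched roughly as follows. This is only an illustration using the JDK: `ConsumedLengthCheck`, `CountingStream` and `verifyFullyConsumed` are hypothetical names that do not appear in the pull request, and the sketch assumes the decoder does not read ahead past the end of the stream it decodes.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, not part of the pull request. */
class ConsumedLengthCheck {

  /** Wraps the raw (compressed) stream and counts the bytes read through it. */
  static final class CountingStream extends FilterInputStream {
    private long count;

    CountingStream(InputStream in) {
      super(in);
    }

    @Override
    public int read() throws IOException {
      int b = super.read();
      if (b >= 0) {
        count++;
      }
      return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
      int n = super.read(buf, off, len);
      if (n > 0) {
        count += n;
      }
      return n;
    }

    long getCount() {
      return count;
    }
  }

  /** Throws if the decoder stopped before the end of the channel. */
  static void verifyFullyConsumed(long bytesConsumed, long channelLength) throws IOException {
    if (bytesConsumed != channelLength) {
      throw new IOException(
          "Decoder consumed " + bytesConsumed + " of " + channelLength
              + " bytes; the input may contain concatenated streams, which are not supported");
    }
  }
}
```

The counting wrapper would sit between the channel and the decoder, and `verifyFullyConsumed` would be called once the decoder reports end of stream. Bytes passed over with `skip()` are not counted in this sketch.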
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=373223=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373223 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 16/Jan/20 19:25 Start Date: 16/Jan/20 19:25 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r367605664 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -738,7 +1069,133 @@ public void testGzipProgress() throws IOException { assertThat(readerOrig, instanceOf(CompressedReader.class)); CompressedReader reader = (CompressedReader) readerOrig; // before starting - assertEquals(0.0, reader.getFractionConsumed(), 1e-6); + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + + // confirm has three records + for (int i = 0; i < numRecords; ++i) { +if (i == 0) { + assertTrue(reader.start()); +} else { + assertTrue(reader.advance()); +} +assertEquals(0, reader.getSplitPointsConsumed()); +assertEquals(1, reader.getSplitPointsRemaining()); + } + assertFalse(reader.advance()); + + // after reading empty source Review comment: It would be better if the concatenated streams for LZOP worked or if concatenated streams were detected then an exception was thrown to the user. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 373223) Time Spent: 7h 40m (was: 7.5h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=373222=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373222 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 16/Jan/20 19:24 Start Date: 16/Jan/20 19:24 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r367605058 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/LzoCompressorInputStream.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.util; + +import io.airlift.compress.lzo.LzoCodec; +import java.io.IOException; +import java.io.InputStream; +import org.apache.commons.compress.compressors.CompressorInputStream; +import org.apache.commons.compress.utils.CountingInputStream; +import org.apache.commons.compress.utils.IOUtils; +import org.apache.commons.compress.utils.InputStreamStatistics; + +/** + * {@link CompressorInputStream} implementation to create LZO encoded stream. 
Library relies on <a href="https://github.com/airlift/aircompressor/">LZO</a> + * + * @since 1.18 + */ +public class LzoCompressorInputStream extends CompressorInputStream Review comment: A class called LzoCompression in util is fine. Issue Time Tracking --- Worklog Id: (was: 373222) Time Spent: 7.5h (was: 7h 20m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=373189=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373189 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 16/Jan/20 17:55 Start Date: 16/Jan/20 17:55 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r367564910 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -738,7 +1069,133 @@ public void testGzipProgress() throws IOException { assertThat(readerOrig, instanceOf(CompressedReader.class)); CompressedReader reader = (CompressedReader) readerOrig; // before starting - assertEquals(0.0, reader.getFractionConsumed(), 1e-6); + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + + // confirm has three records + for (int i = 0; i < numRecords; ++i) { +if (i == 0) { + assertTrue(reader.start()); +} else { + assertTrue(reader.advance()); +} +assertEquals(0, reader.getSplitPointsConsumed()); +assertEquals(1, reader.getSplitPointsRemaining()); + } + assertFalse(reader.advance()); + + // after reading empty source Review comment: line 325: public void testReadConcatenatedLzo() throws IOException: Is this for LZOP codec? Since unlike LZOP, LZO Codec supports file concatenation. Also we have testReadMultiStreamLzo and testFalseReadConcatenatedLzop This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 373189) Time Spent: 7h 20m (was: 7h 10m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=373178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373178 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 16/Jan/20 17:52 Start Date: 16/Jan/20 17:52 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r367563291 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/LzoCompressorInputStream.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.util; + +import io.airlift.compress.lzo.LzoCodec; +import java.io.IOException; +import java.io.InputStream; +import org.apache.commons.compress.compressors.CompressorInputStream; +import org.apache.commons.compress.utils.CountingInputStream; +import org.apache.commons.compress.utils.IOUtils; +import org.apache.commons.compress.utils.InputStreamStatistics; + +/** + * {@link CompressorInputStream} implementation to create LZO encoded stream. 
Library relies on <a href="https://github.com/airlift/aircompressor/">LZO</a> + * + * @since 1.18 + */ +public class LzoCompressorInputStream extends CompressorInputStream Review comment: Where do you suggest keeping the static methods? Currently we have a wrapper class in the util package. Issue Time Tracking --- Worklog Id: (was: 373178) Time Spent: 7h 10m (was: 7h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369349 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 09/Jan/20 20:53 Start Date: 09/Jan/20 20:53 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r364954499 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -755,7 +1212,7 @@ public void testGzipProgress() throws IOException { assertFalse(reader.advance()); // after reading empty source Review comment: ```suggestion // after reading source ``` Issue Time Tracking --- Worklog Id: (was: 369349) Time Spent: 6h 50m (was: 6h 40m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369350 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 09/Jan/20 20:53 Start Date: 09/Jan/20 20:53 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r364954346 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -738,7 +1069,133 @@ public void testGzipProgress() throws IOException { assertThat(readerOrig, instanceOf(CompressedReader.class)); CompressedReader reader = (CompressedReader) readerOrig; // before starting - assertEquals(0.0, reader.getFractionConsumed(), 1e-6); + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + + // confirm has three records + for (int i = 0; i < numRecords; ++i) { +if (i == 0) { + assertTrue(reader.start()); +} else { + assertTrue(reader.advance()); +} +assertEquals(0, reader.getSplitPointsConsumed()); +assertEquals(1, reader.getSplitPointsRemaining()); + } + assertFalse(reader.advance()); + + // after reading empty source Review comment: ```suggestion // after reading source ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 369350) Time Spent: 6h 50m (was: 6h 40m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369348 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 09/Jan/20 20:53 Start Date: 09/Jan/20 20:53 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r364949936 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -235,6 +315,30 @@ public void testReadConcatenatedGzip() throws IOException { assertEquals(Bytes.asList(expected), actual); } + /** + * Using Lzo Codec Test a concatenation of lzo files is correctly decompressed. + * + * A concatenation of lzo files as one file is a valid lzo file and should decompress to be the + * concatenation of those individual files. Review comment: The closing tag isn't needed in javadoc even if your editor is inserting it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 369348) Time Spent: 6h 40m (was: 6.5h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 6h 40m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. 
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369352 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 09/Jan/20 20:53 Start Date: 09/Jan/20 20:53 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r364954411 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -738,7 +1069,133 @@ public void testGzipProgress() throws IOException { assertThat(readerOrig, instanceOf(CompressedReader.class)); CompressedReader reader = (CompressedReader) readerOrig; // before starting - assertEquals(0.0, reader.getFractionConsumed(), 1e-6); + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + + // confirm has three records + for (int i = 0; i < numRecords; ++i) { +if (i == 0) { + assertTrue(reader.start()); +} else { + assertTrue(reader.advance()); +} +assertEquals(0, reader.getSplitPointsConsumed()); +assertEquals(1, reader.getSplitPointsRemaining()); + } + assertFalse(reader.advance()); + + // after reading empty source + assertEquals(1.0, reader.getFractionConsumed(), DELTA); + assertEquals(1, reader.getSplitPointsConsumed()); + assertEquals(0, reader.getSplitPointsRemaining()); +} + } + + @Test + public void testEmptyLzoProgress() throws IOException { +File tmpFile = tmpFolder.newFile("empty.lzo_deflate"); +String filename = tmpFile.toPath().toString(); +writeFile(tmpFile, new byte[0], CompressionMode.LZO); + +PipelineOptions options = PipelineOptionsFactory.create(); +CompressedSource source = +CompressedSource.from(new ByteSource(filename, 1)).withDecompression(CompressionMode.LZO); +try (BoundedReader readerOrig = source.createReader(options)) { + 
assertThat(readerOrig, instanceOf(CompressedReader.class)); + CompressedReader reader = (CompressedReader) readerOrig; + // before starting + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + // confirm empty + assertFalse(reader.start()); + // after reading empty source + assertEquals(1.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(0, reader.getSplitPointsRemaining()); +} + } + + @Test + public void testLzoProgress() throws IOException { +int numRecords = 3; +File tmpFile = tmpFolder.newFile("nonempty.lzo"); +String filename = tmpFile.toPath().toString(); +writeFile(tmpFile, new byte[numRecords], CompressionMode.LZO); + +PipelineOptions options = PipelineOptionsFactory.create(); +CompressedSource source = +CompressedSource.from(new ByteSource(filename, 1)).withDecompression(CompressionMode.LZO); +try (BoundedReader readerOrig = source.createReader(options)) { + assertThat(readerOrig, instanceOf(CompressedReader.class)); + CompressedReader reader = (CompressedReader) readerOrig; + // before starting + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + + // confirm has three records + for (int i = 0; i < numRecords; ++i) { +if (i == 0) { + assertTrue(reader.start()); +} else { + assertTrue(reader.advance()); +} +assertEquals(0, reader.getSplitPointsConsumed()); +assertEquals(1, reader.getSplitPointsRemaining()); + } + assertFalse(reader.advance()); + + // after reading empty source Review comment: ```suggestion // after reading source ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 369352) Time Spent: 7h (was: 6h 50m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature >
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369351=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369351 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 09/Jan/20 20:53 Start Date: 09/Jan/20 20:53 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r364953576 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -235,6 +315,30 @@ public void testReadConcatenatedGzip() throws IOException { assertEquals(Bytes.asList(expected), actual); } + /** + * Using Lzo Codec Test a concatenation of lzo files is correctly decompressed. + * + * A concatenation of lzo files as one file is a valid lzo file and should decompress to be the + * concatenation of those individual files. + */ + @Test + public void testReadConcatenatedLzo() throws IOException { Review comment: Can we either add support for multistream or throw an exception if the stream isn't finished? It would be dangerous for users to have part of their data silently dropped in this scenario. We should also add to the comment that concatenated streams aren't supported. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 369351) Time Spent: 6h 50m (was: 6h 40m)
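The second option raised in the review comment above (throw an exception if the stream isn't finished) might look something like the sketch below, so that trailing data is reported instead of being silently dropped. `TrailingDataCheck` and `failOnTrailingData` are hypothetical names, not code from the pull request, and the sketch assumes the decoder leaves any bytes after the first stream unread in the underlying input.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

/** Hypothetical helper, not part of the pull request. */
class TrailingDataCheck {

  /**
   * Call after the decoder has reported end of stream. Peeks one byte of the
   * raw input; if any data is left, a further (concatenated) stream follows
   * and the caller is told about it rather than losing that data.
   */
  static void failOnTrailingData(PushbackInputStream raw) throws IOException {
    int next = raw.read();
    if (next != -1) {
      // Put the byte back so the stream stays usable for diagnostics.
      raw.unread(next);
      throw new IOException(
          "Unexpected data after the end of the compressed stream; "
              + "concatenated streams are not supported");
    }
  }
}
```

Using a `PushbackInputStream` keeps the peek non-destructive, which would also leave room for multistream support to be added later without changing the callers.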
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369327 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 09/Jan/20 20:05 Start Date: 09/Jan/20 20:05 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r364922097 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ## @@ -152,6 +156,38 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I } }, + /** + * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just + * use the LZO algorithm without headers. Review comment: Please add to this comment telling people what dependencies they need to pull in similar to the comment to zstd above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 369327) Time Spent: 6h 20m (was: 6h 10m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 6h 20m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. 
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369326 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 09/Jan/20 20:05
Start Date: 09/Jan/20 20:05
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r364922208

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ##
@@ -152,6 +156,38 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I
     }
   },
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just
+   * use the LZO algorithm without headers.
+   */
+  LZO(".lzo_deflate", ".lzo_deflate") {
+    @Override
+    public ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException {
+      return Channels.newChannel(new LzoCompressorInputStream(Channels.newInputStream(channel)));
+    }
+
+    @Override
+    public WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException {
+      return Channels.newChannel(new LzoCompressorOutputStream(Channels.newOutputStream(channel)));
+    }
+  },
+
+  /**
+   * LZOP compression using LZOP Codec. .lzo extension is specified for the files with magic bytes
+   * and headers.

Review comment:
Please add to this comment telling people what dependencies they need to pull in similar to the comment to zstd above.

Issue Time Tracking
---
Worklog Id: (was: 369326)
Time Spent: 6h 20m (was: 6h 10m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369324=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369324 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 09/Jan/20 20:05
Start Date: 09/Jan/20 20:05
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r364921304

## File path: sdks/java/core/build.gradle ##
@@ -91,4 +95,6 @@ dependencies {
   shadowTest library.java.avro_tests
   shadowTest library.java.zstd_jni
   testRuntimeOnly library.java.slf4j_jdk14
+  compileOnly 'io.airlift:aircompressor:0.16'

Review comment:
```suggestion
  provided 'io.airlift:aircompressor:0.16'
```

Issue Time Tracking
---
Worklog Id: (was: 369324)
Time Spent: 6h (was: 5h 50m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369321=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369321 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 09/Jan/20 20:05
Start Date: 09/Jan/20 20:05
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r364920670

## File path: sdks/java/core/build.gradle ##
@@ -58,6 +58,10 @@ test {
   }
 }
+configurations {
+  testCompile.extendsFrom compileOnly
+}

Review comment:
```suggestion
```

Issue Time Tracking
---
Worklog Id: (was: 369321)
Time Spent: 5h 40m (was: 5.5h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369320=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369320 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 09/Jan/20 20:05
Start Date: 09/Jan/20 20:05
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r364920514

## File path: sdks/java/core/build.gradle ##
@@ -58,6 +58,10 @@ test {
   }
 }
+configurations {

Review comment:
```suggestion
```

Issue Time Tracking
---
Worklog Id: (was: 369320)
Time Spent: 5.5h (was: 5h 20m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369319=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369319 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 09/Jan/20 20:05
Start Date: 09/Jan/20 20:05
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r364920618

## File path: sdks/java/core/build.gradle ##
@@ -58,6 +58,10 @@ test {
   }
 }
+configurations {
+  testCompile.extendsFrom compileOnly

Review comment:
```suggestion
```

Issue Time Tracking
---
Worklog Id: (was: 369319)
Time Spent: 5.5h (was: 5h 20m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369323=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369323 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 09/Jan/20 20:05
Start Date: 09/Jan/20 20:05
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r364921379

## File path: sdks/java/core/build.gradle ##
@@ -91,4 +95,6 @@ dependencies {
   shadowTest library.java.avro_tests
   shadowTest library.java.zstd_jni
   testRuntimeOnly library.java.slf4j_jdk14
+  compileOnly 'io.airlift:aircompressor:0.16'
+  compileOnly 'com.facebook.presto.hadoop:hadoop-apache2:3.2.0-1'

Review comment:
```suggestion
  provided 'com.facebook.presto.hadoop:hadoop-apache2:3.2.0-1'
```

Issue Time Tracking
---
Worklog Id: (was: 369323)
Time Spent: 5h 50m
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369328=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369328 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 09/Jan/20 20:05
Start Date: 09/Jan/20 20:05
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r364935674

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/LzoCompressorInputStream.java ##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import io.airlift.compress.lzo.LzoCodec;
+import java.io.IOException;
+import java.io.InputStream;
+import org.apache.commons.compress.compressors.CompressorInputStream;
+import org.apache.commons.compress.utils.CountingInputStream;
+import org.apache.commons.compress.utils.IOUtils;
+import org.apache.commons.compress.utils.InputStreamStatistics;
+
+/**
+ * {@link CompressorInputStream} implementation to create LZO encoded stream. Library relies on <a href="https://github.com/airlift/aircompressor/">LZO</a>
+ *
+ * @since 1.18
+ */
+public class LzoCompressorInputStream extends CompressorInputStream

Review comment:
Instead of creating a wrapper class that delegates to lzoIS, create static methods that return the LZO and LZOP input and output streams, and invoke the appropriate static method from the enum's readDecompressed/writeCompressed.

Putting the code into a static method will prevent LzoCodec/LzopCodec from being loaded until the static method is called.

Issue Time Tracking
---
Worklog Id: (was: 369328)
Time Spent: 6h 20m (was: 6h 10m)
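The lazy-loading idea in the review comment above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code from the pull request: `FakeCodec` and `StreamFactory` are hypothetical stand-ins for a codec class from an optional dependency and for the suggested static factory. The sketch observes class initialization, which the Java language defers until first use; the exact moment a class is physically loaded can vary between JVMs.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

final class LazyCodecDemo {
  /** Set to true once the stand-in codec class has been initialized. */
  static final AtomicBoolean CODEC_INITIALIZED = new AtomicBoolean(false);

  /** Hypothetical stand-in for a codec class that lives in an optional dependency. */
  static final class FakeCodec {
    static {
      CODEC_INITIALIZED.set(true);
    }

    InputStream createInputStream(InputStream in) {
      // A real codec would decompress here; this stand-in passes bytes through.
      return new FilterInputStream(in) {};
    }
  }

  /** Hypothetical factory: the only code path that touches FakeCodec. */
  static final class StreamFactory {
    static InputStream createInputStream(InputStream in) {
      return new FakeCodec().createInputStream(in);
    }
  }

  /** Reads the first byte through the factory, wrapping checked exceptions. */
  static int firstByteThroughFactory(byte[] data) {
    try (InputStream in = StreamFactory.createInputStream(new ByteArrayInputStream(data))) {
      return in.read();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("initialized before call: " + CODEC_INITIALIZED.get());
    System.out.println("first byte: " + firstByteThroughFactory(new byte[] {42}));
    System.out.println("initialized after call: " + CODEC_INITIALIZED.get());
  }
}
```

Callers that never go through `StreamFactory` never trigger the codec's static initializer, which is the property the reviewer is after for users who do not have the optional codec library on their classpath.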
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369329 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 09/Jan/20 20:05
Start Date: 09/Jan/20 20:05
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r364924751

## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java ##
@@ -152,6 +156,38 @@ public WritableByteChannel writeCompressed(WritableByteChannel channel) throws I
     }
   },
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for the files which just
+   * use the LZO algorithm without headers.
+   */
+  LZO(".lzo_deflate", ".lzo_deflate") {
+    @Override
+    public ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException {
+      return Channels.newChannel(new LzoCompressorInputStream(Channels.newInputStream(channel)));
+    }
+
+    @Override
+    public WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException {
+      return Channels.newChannel(new LzoCompressorOutputStream(Channels.newOutputStream(channel)));
+    }
+  },
+
+  /**
+   * LZOP compression using LZOP Codec. .lzo extension is specified for the files with magic bytes
+   * and headers.
+   */
+  LZOP(".lzo", ".lzo") {
+    @Override
+    public ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException {
+      return Channels.newChannel(new LzopCompressorInputStream(Channels.newInputStream(channel)));

Review comment:
Why do you need LzoCompressorInputStream class at all?

Issue Time Tracking
---
Worklog Id: (was: 369329)
Time Spent: 6.5h (was: 6h 20m)
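The enum methods in the diff above adapt between NIO channels and `java.io` streams with `Channels.newInputStream`, `Channels.newOutputStream`, and `Channels.newChannel`. A codec's own stream can be wrapped the same way without a dedicated wrapper class. The sketch below shows that pattern with the JDK's GZIP streams standing in for the LZO codec streams, since the LZO library is not assumed to be available here; `ChannelWrappingDemo` and `roundTrip` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ChannelWrappingDemo {
  /** Wraps the codec's input stream directly; GZIP is a stand-in for LZO. */
  static ReadableByteChannel readDecompressed(ReadableByteChannel channel) throws IOException {
    return Channels.newChannel(new GZIPInputStream(Channels.newInputStream(channel)));
  }

  /** Wraps the codec's output stream directly; GZIP is a stand-in for LZO. */
  static WritableByteChannel writeCompressed(WritableByteChannel channel) throws IOException {
    return Channels.newChannel(new GZIPOutputStream(Channels.newOutputStream(channel)));
  }

  /** Compresses and then decompresses the text through the two channel wrappers. */
  static String roundTrip(String text) {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      try (WritableByteChannel out = writeCompressed(Channels.newChannel(sink))) {
        out.write(ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8)));
      }
      ByteArrayInputStream compressed = new ByteArrayInputStream(sink.toByteArray());
      try (ReadableByteChannel in = readDecompressed(Channels.newChannel(compressed))) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        ByteBuffer buffer = ByteBuffer.allocate(64);
        while (in.read(buffer) != -1) {
          buffer.flip();
          result.write(buffer.array(), 0, buffer.limit());
          buffer.clear();
        }
        return new String(result.toByteArray(), StandardCharsets.UTF_8);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("hello lzo"));
  }
}
```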
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369322=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369322 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 09/Jan/20 20:05
Start Date: 09/Jan/20 20:05
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r364920736

## File path: sdks/java/core/build.gradle ##
@@ -58,6 +58,10 @@ test {
   }
 }
+configurations {
+  testCompile.extendsFrom compileOnly
+}
+

Review comment:
```suggestion
```

Issue Time Tracking
---
Worklog Id: (was: 369322)
Time Spent: 5h 50m (was: 5h 40m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=369325=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369325 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 09/Jan/20 20:05
Start Date: 09/Jan/20 20:05
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r364921654

## File path: sdks/java/core/build.gradle ##
@@ -91,4 +95,6 @@ dependencies {
   shadowTest library.java.avro_tests
   shadowTest library.java.zstd_jni
   testRuntimeOnly library.java.slf4j_jdk14
+  compileOnly 'io.airlift:aircompressor:0.16'

Review comment:
Please group the provided dependencies that have been added here with the ones above.

Issue Time Tracking
---
Worklog Id: (was: 369325)
Time Spent: 6h 10m (was: 6h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=368934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-368934 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 09/Jan/20 10:44
Start Date: 09/Jan/20 10:44
Worklog Time Spent: 10m
Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-572504197

@lukecwik I've updated the PR based on the discussion that we had. Please let me know your thoughts and suggestions.

Issue Time Tracking
---
Worklog Id: (was: 368934)
Time Spent: 5h 20m (was: 5h 10m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=360598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360598 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 16/Dec/19 23:59
Start Date: 16/Dec/19 23:59
Worklog Time Spent: 10m
Work Description: gsteelman commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r358529085

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ##
@@ -761,6 +1043,132 @@ public void testGzipProgress() throws IOException {
     }
   }
+  @Test
+  public void testEmptyLzoProgress() throws IOException {
+    File tmpFile = tmpFolder.newFile("empty.lzo_deflate");
+    String filename = tmpFile.toPath().toString();
+    writeFile(tmpFile, new byte[0], CompressionMode.LZO);
+
+    PipelineOptions options = PipelineOptionsFactory.create();
+    CompressedSource source =
+        CompressedSource.from(new ByteSource(filename, 1)).withDecompression(CompressionMode.LZO);
+    try (BoundedReader readerOrig = source.createReader(options)) {
+      assertThat(readerOrig, instanceOf(CompressedReader.class));
+      CompressedReader reader = (CompressedReader) readerOrig;
+      // before starting
+      assertEquals(0.0, reader.getFractionConsumed(), 1e-6);

Review comment:
I think we can add the constant for `CompressedSourceTest.java` at least.

Issue Time Tracking
---
Worklog Id: (was: 360598)
Time Spent: 5h 10m (was: 5h)
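The review comment above is attached to the line that passes the literal `1e-6` to `assertEquals`, so it appears to refer to naming that tolerance once. A minimal sketch of the idea, assuming that reading is right: `ProgressTolerance`, `DELTA`, and `withinDelta` are hypothetical names, and the helper only mirrors the tolerance comparison that JUnit's three-argument `assertEquals` for doubles performs, so that the sketch runs without JUnit.

```java
final class ProgressTolerance {
  /** Hypothetical shared tolerance, replacing repeated 1e-6 literals in the tests. */
  static final double DELTA = 1e-6;

  /** Returns true when the two values differ by no more than DELTA. */
  static boolean withinDelta(double expected, double actual) {
    return Math.abs(expected - actual) <= DELTA;
  }

  public static void main(String[] args) {
    System.out.println(withinDelta(0.0, 0.0));
    System.out.println(withinDelta(1.0, 1.0 + 1e-9));
    System.out.println(withinDelta(0.0, 0.5));
  }
}
```

In a JUnit test the same constant would simply be passed as the third argument, for example `assertEquals(0.0, reader.getFractionConsumed(), DELTA)`.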
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=360597=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360597 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 16/Dec/19 23:58
Start Date: 16/Dec/19 23:58
Worklog Time Spent: 10m
Work Description: gsteelman commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r358528923

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ##
@@ -235,6 +315,30 @@ public void testReadConcatenatedGzip() throws IOException {
     assertEquals(Bytes.asList(expected), actual);
   }
+  /**
+   * Using Lzo Codec Test a concatenation of lzo files is correctly decompressed.
+   *
+   * A concatenation of lzo files as one file is a valid lzo file and should decompress to be the
+   * concatenation of those individual files.
+   */
+  @Test
+  public void testReadConcatenatedLzo() throws IOException {

Review comment:
Perhaps it would be a good idea to add a test with an expected failure then?

Issue Time Tracking
---
Worklog Id: (was: 360597)
Time Spent: 5h (was: 4h 50m)
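The Javadoc in the diff above states that concatenated compressed files should decompress to the concatenation of their contents, and the reviewer asks for a companion test that expects a failure. Both checks can be sketched with the JDK's GZIP streams, which are used here purely as a stand-in because the LZO codec is not assumed to be on the classpath; whether the LZO codec behaves the same way is exactly what the real tests would need to establish. All class and method names below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ConcatenatedStreamDemo {
  /** Compresses the text into one standalone GZIP member. */
  static byte[] compress(String text) {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
        out.write(text.getBytes(StandardCharsets.UTF_8));
      }
      return sink.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Decompresses all bytes; throws IOException when the input is not valid GZIP data. */
  static String decompress(byte[] data) throws IOException {
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
      ByteArrayOutputStream result = new ByteArrayOutputStream();
      byte[] buffer = new byte[256];
      int n;
      while ((n = in.read(buffer)) != -1) {
        result.write(buffer, 0, n);
      }
      return new String(result.toByteArray(), StandardCharsets.UTF_8);
    }
  }

  /** Joins two byte arrays end to end. */
  static byte[] concat(byte[] first, byte[] second) {
    byte[] joined = new byte[first.length + second.length];
    System.arraycopy(first, 0, joined, 0, first.length);
    System.arraycopy(second, 0, joined, first.length, second.length);
    return joined;
  }

  /** Positive case: two compressed members, concatenated, read back as one stream. */
  static String decompressConcatenated(String first, String second) {
    try {
      return decompress(concat(compress(first), compress(second)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Expected-failure case: returns true when uncompressed input is rejected. */
  static boolean failsOnCorruptInput() {
    try {
      decompress("this is not compressed data".getBytes(StandardCharsets.UTF_8));
      return false;
    } catch (IOException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(decompressConcatenated("hello ", "world"));
    System.out.println(failsOnCorruptInput());
  }
}
```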
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=360154=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-360154 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 16/Dec/19 10:08
Start Date: 16/Dec/19 10:08
Worklog Time Spent: 10m
Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-565992871

> @amoght I don't have enough context to make the call on that, as I am very new to Beam. I have reached out to some others at Twitter to also review this change, as they will have more context.

Thanks Gary :) appreciate your help!

Issue Time Tracking
---
Worklog Id: (was: 360154)
Time Spent: 4h 50m (was: 4h 40m)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=359647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359647 ]

ASF GitHub Bot logged work on BEAM-8564:

Author: ASF GitHub Bot
Created on: 13/Dec/19 20:56
Start Date: 13/Dec/19 20:56
Worklog Time Spent: 10m
Work Description: gsteelman commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-565605851

@amoght I don't have enough context to make the call on that, as I am very new to Beam. I have reached out to some others at Twitter to also review this change, as they will have more context.

Issue Time Tracking
---
Worklog Id: (was: 359647)
Time Spent: 4h 40m (was: 4.5h)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=359607=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359607 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 13/Dec/19 19:30 Start Date: 13/Dec/19 19:30 Worklog Time Spent: 10m Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-565577445 @gsteelman we use the airlift/aircompressor library only for its compression and decompression mechanism. It is the Input/Output stream implementation there that introduces the transitive dependency, and that can be removed and replaced with the Apache Hadoop common library, which also reduces the size significantly. So there are two possible options: 1) We use only the compression and decompression mechanism from airlift/aircompressor and design the Input/Output streams for Beam ourselves. These would need to be updated whenever those classes change on airlift/aircompressor's end, but since we would depend only on the compression and decompression mechanism, such updates should be small and rare, so this should not be a big issue. 2) We introduce LZO as an optional package for Beam. This gives users the option to leave it out if Beam's size is a constraint or if LZO is not required. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 359607) Time Spent: 4.5h (was: 4h 20m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 4.5h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
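Option 1 in the comment above (borrow only the compress/decompress routines and own the stream classes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the JDK's deflate streams stand in for the LZO routines, and the class names, the `roundTrip` helper, and the `[rawLength][compressedLength][payload]` block framing are all made up here and are not the LZO or LZOP file format.

```java
import java.io.*;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class BlockStreamSketch {
  /** Stand-in for the borrowed compressor (JDK deflate here, NOT LZO). */
  static byte[] compress(byte[] raw, int length) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (DeflaterOutputStream deflate = new DeflaterOutputStream(out)) {
      deflate.write(raw, 0, length);
    }
    return out.toByteArray();
  }

  /** Stand-in for the borrowed decompressor; throws if the block is short or corrupt. */
  static byte[] decompress(byte[] compressed, int rawLength) throws IOException {
    byte[] raw = new byte[rawLength];
    InputStream inflate = new InflaterInputStream(new ByteArrayInputStream(compressed));
    try (DataInputStream in = new DataInputStream(inflate)) {
      in.readFully(raw);
    }
    return raw;
  }

  /** Project-owned writer (hypothetical framing). blockSize is assumed to be positive. */
  static final class BlockOutputStream extends OutputStream {
    private final DataOutputStream out;
    private final byte[] block;
    private int used;

    BlockOutputStream(OutputStream out, int blockSize) {
      this.out = new DataOutputStream(out);
      this.block = new byte[blockSize];
    }

    @Override
    public void write(int b) throws IOException {
      if (used == block.length) writeBlock(); // buffer full: emit it first
      block[used++] = (byte) b;
    }

    private void writeBlock() throws IOException {
      if (used == 0) return; // never emit empty blocks
      byte[] compressed = compress(block, used);
      out.writeInt(used);
      out.writeInt(compressed.length);
      out.write(compressed);
      used = 0;
    }

    @Override
    public void close() throws IOException {
      writeBlock(); // emit the final partial block
      out.close();
    }
  }

  /** Project-owned reader for the framing above, decompressing one block at a time. */
  static final class BlockInputStream extends InputStream {
    private final DataInputStream in;
    private byte[] block = new byte[0];
    private int pos;

    BlockInputStream(InputStream in) {
      this.in = new DataInputStream(in);
    }

    @Override
    public int read() throws IOException {
      while (pos == block.length) { // current block exhausted: load the next one
        int rawLength;
        try {
          rawLength = in.readInt();
        } catch (EOFException endOfStream) {
          return -1; // no further block header: end of data
        }
        byte[] compressed = new byte[in.readInt()];
        in.readFully(compressed);
        block = decompress(compressed, rawLength);
        pos = 0;
      }
      return block[pos++] & 0xff;
    }
  }

  /** Writes data through the writer, then reads it back through the reader. */
  public static byte[] roundTrip(byte[] data, int blockSize) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream out = new BlockOutputStream(sink, blockSize)) {
      out.write(data);
    }
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    try (InputStream in = new BlockInputStream(new ByteArrayInputStream(sink.toByteArray()))) {
      for (int b = in.read(); b != -1; b = in.read()) result.write(b);
    }
    return result.toByteArray();
  }
}
```

In this shape only `compress` and `decompress` would touch the external library, so a change on the library's side should affect two small methods rather than the stream classes.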
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=359023=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359023 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 12/Dec/19 23:48 Start Date: 12/Dec/19 23:48 Worklog Time Spent: 10m Work Description: gsteelman commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-565238477 > While studying the code, we found that the airlift/aircompressor library only requires some classes which are also present in the Apache Hadoop common package (~3.9MB). Therefore, we are now thinking of making changes in the airlift/aircompressor package, replacing > com.facebook.presto.hadoop with org.apache.hadoop.common and removing the other compression mechanisms present in the airlift/aircompressor package (like zstd, gzip, etc.) while keeping only the required LZO package. > But if we go ahead with this approach, we will have to manually update this library whenever any changes are made to airlift/aircompressor's LZO package. > @lukecwik @gsteelman please provide your thoughts on this. Is it possible to instead add the dependencies on the `apache.hadoop.common` package directly in these changes, and not add a dependency on airlift/aircompressor in this change? I would prefer to stick with strict dependencies when possible, rather than relying on transitive dependencies to bring in the classes we need. Relying on the transitive dependencies brought in by airlift/aircompressor has its own set of issues, including having to update our libraries whenever changes are made to airlift. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 359023) Time Spent: 4h 20m (was: 4h 10m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 4h 20m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=358540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-358540 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 12/Dec/19 10:28 Start Date: 12/Dec/19 10:28 Worklog Time Spent: 10m Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-564256222 While studying the code, we found that the airlift/aircompressor library only requires some classes which are also present in the Apache Hadoop common package (~3.9MB). Therefore, we are now thinking of making changes in the airlift/aircompressor package, replacing com.facebook.presto.hadoop with org.apache.hadoop.common and removing the other compression mechanisms present in the airlift/aircompressor package (like zstd, gzip, etc.) while keeping only the required LZO package. But if we go ahead with this approach, we will have to manually update this library whenever any changes are made to airlift/aircompressor's LZO package. @lukecwik @gsteelman please provide your thoughts on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 358540) Time Spent: 4h 10m (was: 4h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 4h 10m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. 
> This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=357405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-357405 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 10/Dec/19 21:01 Start Date: 10/Dec/19 21:01 Worklog Time Spent: 10m Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-564256222 While studying the code, we found that the airlift/aircompressor library only requires some classes which are also present in the Apache Hadoop common package (~3.9MB). Therefore, we are now thinking of making changes in the airlift/aircompressor package, replacing com.facebook.presto.hadoop with org.apache.hadoop.common and removing the other compression mechanisms (like zstd, gzip, etc.) while keeping only the required LZO package. But if we go ahead with this approach, we will have to manually update this library whenever any changes are made to airlift/aircompressor's LZO package. @lukecwik @gsteelman please provide your thoughts on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 357405) Time Spent: 4h (was: 3h 50m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 4h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. 
> This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=354532=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354532 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 05/Dec/19 18:03 Start Date: 05/Dec/19 18:03 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r354464601 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -161,6 +189,28 @@ public void testGzipSplittable() throws Exception { assertFalse(source.isSplittable()); } + /** Test splittability of files in LZO mode -- none should be splittable. */ + @Test + public void testLzoSplittable() throws Exception { Review comment: Thanks for pointing this out, this has been added. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 354532) Time Spent: 3h 50m (was: 3h 40m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 3h 50m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. 
> This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=354515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354515 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 05/Dec/19 17:46 Start Date: 05/Dec/19 17:46 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r354456382 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -761,6 +1043,132 @@ public void testGzipProgress() throws IOException { } } + @Test + public void testEmptyLzoProgress() throws IOException { +File tmpFile = tmpFolder.newFile("empty.lzo_deflate"); +String filename = tmpFile.toPath().toString(); +writeFile(tmpFile, new byte[0], CompressionMode.LZO); + +PipelineOptions options = PipelineOptionsFactory.create(); +CompressedSource source = +CompressedSource.from(new ByteSource(filename, 1)).withDecompression(CompressionMode.LZO); +try (BoundedReader readerOrig = source.createReader(options)) { + assertThat(readerOrig, instanceOf(CompressedReader.class)); + CompressedReader reader = (CompressedReader) readerOrig; + // before starting + assertEquals(0.0, reader.getFractionConsumed(), 1e-6); Review comment: It can be done. But that would require altering all the tests that use this constant value. Will that be fine if we do that? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 354515) Time Spent: 3.5h (was: 3h 20m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 3.5h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=354516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354516 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 05/Dec/19 17:46 Start Date: 05/Dec/19 17:46 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r354451206 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -235,6 +315,30 @@ public void testReadConcatenatedGzip() throws IOException { assertEquals(Bytes.asList(expected), actual); } + /** + * Using Lzo Codec Test a concatenation of lzo files is correctly decompressed. + * + * A concatenation of lzo files as one file is a valid lzo file and should decompress to be the + * concatenation of those individual files. Review comment: This is happening when we run the spotlessApply task. When the tag is closed, the spotlessCheck fails. Not sure of the reason behind that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 354516) Time Spent: 3h 40m (was: 3.5h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 3h 40m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. 
> This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=354499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354499 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 05/Dec/19 17:40 Start Date: 05/Dec/19 17:40 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r354453765 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -267,6 +371,69 @@ public void testReadMultiStreamBzip2() throws IOException { verifyReadContents(output, tmpFile, mode); } + /** + * Test a lzo file containing multiple streams is correctly decompressed. + * + * A lzo file may contain multiple streams and should decompress as the concatenation of those + * streams. Review comment: This is happening due to spotlessApply. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 354499) Time Spent: 3h 20m (was: 3h 10m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 3h 20m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. 
> This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=354498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354498 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 05/Dec/19 17:39 Start Date: 05/Dec/19 17:39 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r354453765 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -267,6 +371,69 @@ public void testReadMultiStreamBzip2() throws IOException { verifyReadContents(output, tmpFile, mode); } + /** + * Test a lzo file containing multiple streams is correctly decompressed. + * + * A lzo file may contain multiple streams and should decompress as the concatenation of those + * streams. Review comment: This is happening during spotlessApply. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 354498) Time Spent: 3h 10m (was: 3h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 3h 10m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. 
> This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=354497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354497 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 05/Dec/19 17:38 Start Date: 05/Dec/19 17:38 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r354453135 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -235,6 +315,30 @@ public void testReadConcatenatedGzip() throws IOException { assertEquals(Bytes.asList(expected), actual); } + /** + * Using Lzo Codec Test a concatenation of lzo files is correctly decompressed. + * + * A concatenation of lzo files as one file is a valid lzo file and should decompress to be the + * concatenation of those individual files. + */ + @Test + public void testReadConcatenatedLzo() throws IOException { Review comment: The current behaviour of the LZOP codec is that, when it is given concatenated files, it returns the contents of the first file only, because of the presence of headers. This causes the test to fail, which is why we have not added the equivalent test for LZOP. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 354497) Time Spent: 3h (was: 2h 50m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 3h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
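The property that testReadConcatenatedLzo checks can be illustrated with a format available in the JDK. The following is a minimal sketch that uses gzip purely as a stand-in, so it says nothing about how LzoCodec or LzopCodec actually behave: two payloads are compressed separately, the compressed bytes are joined end to end, and a single decompressing stream is expected to return both payloads back to back. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ConcatenatedGzipSketch {

  /** Compresses one payload into a complete, self-contained gzip member. */
  public static byte[] gzip(byte[] raw) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(raw);
    }
    return out.toByteArray();
  }

  /** Joins two byte arrays end to end, as concatenating two files on disk would. */
  public static byte[] concat(byte[] a, byte[] b) {
    byte[] joined = new byte[a.length + b.length];
    System.arraycopy(a, 0, joined, 0, a.length);
    System.arraycopy(b, 0, joined, a.length, b.length);
    return joined;
  }

  /** Compresses each payload on its own, joins the results, and decompresses them as one stream. */
  public static byte[] decompressConcatenated(byte[] first, byte[] second) throws IOException {
    byte[] joined = concat(gzip(first), gzip(second));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(joined))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] a = "contents of the first file\n".getBytes(StandardCharsets.UTF_8);
    byte[] b = "contents of the second file\n".getBytes(StandardCharsets.UTF_8);
    System.out.print(new String(decompressConcatenated(a, b), StandardCharsets.UTF_8));
  }
}
```

A codec that stops after the first member or header, as described for LZOP in the comment above, would return only the first payload from the same input, which is exactly what such a test would catch.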
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=354492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354492 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 05/Dec/19 17:34 Start Date: 05/Dec/19 17:34 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r354451206 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -235,6 +315,30 @@ public void testReadConcatenatedGzip() throws IOException { assertEquals(Bytes.asList(expected), actual); } + /** + * Using Lzo Codec Test a concatenation of lzo files is correctly decompressed. + * + * A concatenation of lzo files as one file is a valid lzo file and should decompress to be the + * concatenation of those individual files. Review comment: This is happening when we run the spotlessApply task. When the tag is closed, the spotlessCheck fails. Not sure of the reason behind that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 354492) Time Spent: 2h 50m (was: 2h 40m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 2h 50m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. 
> This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=354462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354462 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 05/Dec/19 16:55 Start Date: 05/Dec/19 16:55 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r354431147 ## File path: sdks/java/core/build.gradle ## @@ -90,4 +90,6 @@ dependencies { shadowTest library.java.avro_tests shadowTest library.java.zstd_jni testRuntimeOnly library.java.slf4j_jdk14 + compile 'io.airlift:aircompressor:0.16' + compile 'com.facebook.presto.hadoop:hadoop-apache2:3.2.0-1' Review comment: This is included because the LzoCodec class that is used to create the input streams uses some classes from the org.apache.hadoop package, which is part of com.facebook.presto.hadoop. Since aircompressor is designed to also support optional Hadoop configurations, Hadoop comes into the picture (in our case, the Hadoop config is null). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 354462) Time Spent: 2h 40m (was: 2.5h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. 
> This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=354447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354447 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 05/Dec/19 16:40 Start Date: 05/Dec/19 16:40 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r354422677 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/LzoCompressorInputStream.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.util; + +import io.airlift.compress.lzo.LzoCodec; +import java.io.IOException; +import java.io.InputStream; +import org.apache.commons.compress.compressors.CompressorInputStream; +import org.apache.commons.compress.utils.CountingInputStream; +import org.apache.commons.compress.utils.IOUtils; +import org.apache.commons.compress.utils.InputStreamStatistics; + +/** + * {@link CompressorInputStream} implementation to create LZO encoded stream. 
 Library relies on <a href="https://github.com/airlift/aircompressor/">LZO</a> + * + * @since 1.18 + */ +public class LzoCompressorInputStream extends CompressorInputStream +implements InputStreamStatistics { + + private final CountingInputStream countingStream; + private final InputStream lzoIS; + + /** + * Wraps the given stream into a aircompressor's HadoopLzoInputStream using the LzoCodec. + * + * @param inStream the stream to write to + * @throws IOException if aircompressor does + */ + public LzoCompressorInputStream(final InputStream inStream) throws IOException { +this.lzoIS = +new LzoCodec().createInputStream(countingStream = new CountingInputStream(inStream)); + } + + @Override + public int available() throws IOException { +return lzoIS.available(); + } + + @Override + public void close() throws IOException { +lzoIS.close(); + } + + @Override + public int read(final byte[] b) throws IOException { +return read(b, 0, b.length); + } + + @Override + public long skip(final long n) throws IOException { +return IOUtils.skip(lzoIS, n); + } + + @Override + public void mark(final int readlimit) { +lzoIS.mark(readlimit); + } + + @Override + public boolean markSupported() { +return lzoIS.markSupported(); + } + + @Override + public int read() throws IOException { +final int ret = lzoIS.read(); +count(ret == -1 ? 0 : 1); +return ret; + } + + @Override + public int read(final byte[] buf, final int off, final int len) throws IOException { +if (len == 0) { + return 0; +} +final int ret = lzoIS.read(buf, off, len); Review comment: No, this case is already handled. The check is there so that when the requested length is 0 we return immediately, without calling the underlying read method at all. It simply avoids unnecessary method call overhead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 354447) Time Spent: 2.5h (was: 2h 20m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO >