[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user sekruse commented on the pull request: https://github.com/apache/flink/pull/762#issuecomment-110013765 Okay, will do that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/762
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/762#issuecomment-109912571 Thank you for your contribution.
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/762#issuecomment-109907044 Thanks for the documentation. Could you open a JIRA to account for the necessary changes in terms of extensibility?
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user sekruse commented on the pull request: https://github.com/apache/flink/pull/762#issuecomment-109385326 Okay, I did not explain the internals further, only how to use the deflate and GZip support. I think that, to make compression extensible or customizable (which would be worthwhile in my opinion), we should make small changes to the code with respect to its usability. That, however, does not match the scope of the associated JIRA.
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/762#issuecomment-108845535 You can modify the documentation in the `docs/apis/programming_guide.md` file.
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/762#issuecomment-108845395 I'm talking about the user documentation. You could mention support for gzip and add an example here: http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sources
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user sekruse commented on the pull request: https://github.com/apache/flink/pull/762#issuecomment-108844255 Sure, I can do that. Are you talking about user documentation or rather Javadocs? And if the former, where should I preferably put that documentation?
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/762#issuecomment-108812916 :+1: This has been requested multiple times now. I would merge your pull request. Can you add some documentation?
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user sekruse commented on the pull request: https://github.com/apache/flink/pull/762#issuecomment-108443527 I replaced the Validate calls in that part with Preconditions.
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31562955

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
          * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
          */
         protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
    -        // Wrap stream in a extracting (decompressing) stream if file ends with .deflate.
    -        if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
    -            return new InflaterInputStreamFSInputWrapper(stream);
    +        // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
    +        InflaterInputStreamFactory inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
    +        if (inflaterInputStreamFactory != null) {
    +            return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
    --- End diff --

    Ah, okay, I see. I didn't read the code closely enough.
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user sekruse commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31562256

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
          * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
          */
         protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
    -        // Wrap stream in a extracting (decompressing) stream if file ends with .deflate.
    -        if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
    -            return new InflaterInputStreamFSInputWrapper(stream);
    +        // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
    +        InflaterInputStreamFactory inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
    +        if (inflaterInputStreamFactory != null) {
    +            return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
    --- End diff --

    It might also be the case that the stream was not compressed at all. It would of course be nice to react appropriately to a missing codec, but how would we know whether the current input split belongs to an uncompressed file or to a compressed file with an unknown codec?
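One partial answer to the question above, offered only as a sketch and not as something the pull request does: for gzip specifically, the stream content can be inspected, because every gzip member starts with the two magic bytes `0x1f 0x8b` (RFC 1952). The class `GzipSniffer` and the method `looksLikeGzip` below are hypothetical names, and the approach does not generalize to codecs without a distinctive header, so an unknown codec would still go undetected.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper for illustration; it is not part of Flink or of this pull request.
final class GzipSniffer {

    /**
     * Returns true if the next two bytes are the gzip magic number (0x1f 0x8b).
     * The bytes are pushed back, so the caller can still read the stream from its start.
     * The PushbackInputStream must have been created with a buffer of at least 2 bytes.
     */
    static boolean looksLikeGzip(PushbackInputStream in) throws IOException {
        byte[] header = new byte[2];
        int filled = 0;
        // A single read() may return fewer bytes than requested, so loop until full or EOF.
        while (filled < header.length) {
            int n = in.read(header, filled, header.length - filled);
            if (n < 0) {
                break;
            }
            filled += n;
        }
        if (filled > 0) {
            in.unread(header, 0, filled);
        }
        return filled == 2
                && (header[0] & 0xff) == 0x1f
                && (header[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = {'a', ',', 'b'};
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(plain), 2);
        System.out.println(looksLikeGzip(in));
    }
}
```

A check like this could at most back a warning when a file without a registered extension turns out to contain gzip data; it says nothing about other compression formats.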
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31560285

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
          * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
          */
         protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
    -        // Wrap stream in a extracting (decompressing) stream if file ends with .deflate.
    -        if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
    -            return new InflaterInputStreamFSInputWrapper(stream);
    +        // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
    +        InflaterInputStreamFactory inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
    +        if (inflaterInputStreamFactory != null) {
    +            return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
    --- End diff --

    So if there is no inflater input stream available, it will just fall back to the compressed data stream? Wouldn't it be better to at least log something or fail?
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/762#discussion_r31559688

    --- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
    @@ -21,10 +21,16 @@
     import java.io.IOException;
     import java.util.ArrayList;
     import java.util.Arrays;
    +import java.util.HashMap;
     import java.util.HashSet;
     import java.util.List;
    +import java.util.Map;
     import java.util.Set;

    +import org.apache.commons.lang3.Validate;
    --- End diff --

    I'm really sorry that you ran into this, but the community recently decided to use Guava's `Preconditions` checks (e.g. `checkArgument()`) instead of commons lang. Can you replace that?
[GitHub] flink pull request: [FLINK-1981] add support for GZIP files
GitHub user sekruse opened a pull request:

    https://github.com/apache/flink/pull/762

    [FLINK-1981] add support for GZIP files

    * register decompression algorithms with file extensions for extensibility
    * fit deflate decompression into this scheme
    * add support for GZIP files
    * test support for deflate and GZIP files with the CsvInputFormat

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sekruse/flink FLINK-1981

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/762.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #762

commit 6acae7faa4e27837ce3c9272d4310ec6c46895ab
Author: Sebastian Kruse
Date:   2015-06-02T16:58:35Z

    [FLINK-1981] add support for GZIP files

    * register decompression algorithms with file extensions for extensibility
    * fit deflate decompression into this scheme
    * add support for GZIP files
    * test support for deflate and GZIP files with the CsvInputFormat
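The idea of registering decompression algorithms by file extension can be illustrated with a small, self-contained sketch. It is not the code from the pull request: `DecompressionRegistry`, `StreamFactory`, `register`, `lookup`, and `decorate` are hypothetical names, and only the JDK's `java.util.zip` classes are used. The fallback mirrors the behavior discussed in the review, where a file without a known extension is read as-is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical registry for illustration; the names are not Flink's actual API.
final class DecompressionRegistry {

    /** Wraps a raw input stream in a decompressing stream. */
    interface StreamFactory {
        InputStream create(InputStream in) throws IOException;
    }

    private static final Map<String, StreamFactory> FACTORIES = new HashMap<>();

    static {
        // Built-in codecs; further ones can be added through register().
        register("deflate", InflaterInputStream::new);
        register("gz", GZIPInputStream::new);
        register("gzip", GZIPInputStream::new);
    }

    /** Associates a file extension (without the dot) with a factory. */
    static synchronized void register(String extension, StreamFactory factory) {
        FACTORIES.put(extension.toLowerCase(Locale.ROOT), factory);
    }

    /** Returns the factory for the file's extension, or null if none is registered. */
    static synchronized StreamFactory lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return FACTORIES.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Decorates the stream if the extension is known; otherwise returns it unchanged. */
    static InputStream decorate(String fileName, InputStream raw) throws IOException {
        StreamFactory factory = lookup(fileName);
        return factory == null ? raw : factory.create(raw);
    }

    /** Reads a stream to its end. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Compress a small CSV line in memory, then read it back through the registry.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write("a,b,c\n".getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = decorate("data.csv.gz", new ByteArrayInputStream(buffer.toByteArray()))) {
            System.out.print(new String(readAll(in), StandardCharsets.UTF_8));
        }
    }
}
```

Keeping the mapping in one place means a new codec needs only one `register` call, which is the extensibility the first bullet point aims at.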