[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-08 Thread sekruse
Github user sekruse commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-110013765
  
Okay, will do that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/762




[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-08 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-109912571
  
Thank you for your contribution.




[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-08 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-109907044
  
Thanks for the documentation. Could you open a JIRA to account for the 
necessary changes in terms of extensibility?




[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-05 Thread sekruse
Github user sekruse commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-109385326
  
Okay, I did not explain the internals any further, only how to use the 
deflate and GZip support. I think that, to make compression extensible or 
customizable (which would be worthwhile in my opinion), we should make small 
changes to the code with respect to its usability. That, however, is beyond 
the scope of the associated JIRA.




[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-04 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108845535
  
You can modify the documentation in the `docs/apis/programming_guide.md` 
file.




[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-04 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108845395
  
I'm talking about the user documentation. You could mention support for 
gzip and add an example here: 
http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sources




[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-04 Thread sekruse
Github user sekruse commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108844255
  
Sure, I can do that. Are you talking about user documentation or rather 
Javadocs? And if the former, where should I preferably put that 
documentation?




[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-04 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108812916
  
:+1: This has been requested multiple times now. I would merge your pull 
request. Can you add some documentation?




[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-03 Thread sekruse
Github user sekruse commented on the pull request:

https://github.com/apache/flink/pull/762#issuecomment-108443527
  
I replaced the Validate usage with Preconditions.




[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-02 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/762#discussion_r31562955
  
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
      * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
      */
     protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
-        // Wrap stream in a extracting (decompressing) stream if file ends with .deflate.
-        if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
-            return new InflaterInputStreamFSInputWrapper(stream);
+        // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
+        InflaterInputStreamFactory inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
+        if (inflaterInputStreamFactory != null) {
+            return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
--- End diff --

Ah, okay, I see. I didn't read the code closely enough.





[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-02 Thread sekruse
Github user sekruse commented on a diff in the pull request:

https://github.com/apache/flink/pull/762#discussion_r31562256
  
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
      * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
      */
     protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
-        // Wrap stream in a extracting (decompressing) stream if file ends with .deflate.
-        if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
-            return new InflaterInputStreamFSInputWrapper(stream);
+        // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
+        InflaterInputStreamFactory inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
+        if (inflaterInputStreamFactory != null) {
+            return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
--- End diff --

It might also be the case that the stream was not compressed at all. It 
would of course be nice to react appropriately to a missing codec, but how 
would we know if the current input split belongs to an uncompressed file or a 
compressed file with an unknown codec?
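
One conceivable way to tell these cases apart, offered only as a sketch and not as something this pull request does, is to keep a second list of extensions that are known to denote compression but have no decompressor registered. The class `CompressionCheck`, the enum `Kind`, and both extension lists below are hypothetical names and values chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch; the extension lists are illustrative assumptions, not Flink's. */
class CompressionCheck {

    enum Kind { UNCOMPRESSED, SUPPORTED, UNSUPPORTED }

    // Extensions assumed to have a registered decompressor (deflate and GZIP).
    private static final Set<String> SUPPORTED_EXTENSIONS =
            new HashSet<>(Arrays.asList("deflate", "gz", "gzip"));

    // Extensions assumed to indicate compression for which no decompressor is available.
    private static final Set<String> UNSUPPORTED_EXTENSIONS =
            new HashSet<>(Arrays.asList("bz2", "xz", "zip"));

    /** Classifies a file purely by its extension; the file content is never inspected. */
    static Kind classify(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (SUPPORTED_EXTENSIONS.contains(ext)) {
            return Kind.SUPPORTED;
        }
        if (UNSUPPORTED_EXTENSIONS.contains(ext)) {
            return Kind.UNSUPPORTED;
        }
        // Anything else is treated as plain data, which may still be wrong for exotic codecs.
        return Kind.UNCOMPRESSED;
    }
}
```

With such a check, a caller could log a warning or fail for `Kind.UNSUPPORTED` and pass the stream through unchanged for `Kind.UNCOMPRESSED`. The limitation remains that a compressed file with an extension on neither list is still read as raw data.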




[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-02 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/762#discussion_r31560285
  
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -628,9 +692,10 @@ public void open(FileInputSplit fileSplit) throws IOException {
      * @see org.apache.flink.api.common.io.InputStreamFSInputWrapper
      */
     protected FSDataInputStream decorateInputStream(FSDataInputStream inputStream, FileInputSplit fileSplit) throws Throwable {
-        // Wrap stream in a extracting (decompressing) stream if file ends with .deflate.
-        if (fileSplit.getPath().getName().endsWith(DEFLATE_SUFFIX)) {
-            return new InflaterInputStreamFSInputWrapper(stream);
+        // Wrap stream in a extracting (decompressing) stream if file ends with a known compression file extension.
+        InflaterInputStreamFactory inflaterInputStreamFactory = getInflaterInputStreamFactory(fileSplit.getPath());
+        if (inflaterInputStreamFactory != null) {
+            return new InputStreamFSInputWrapper(inflaterInputStreamFactory.create(stream));
--- End diff --

So if there is no inflater input stream available, it will just fall back 
to the compressed data stream?
Wouldn't it be better to at least log something or fail?




[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-02 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/762#discussion_r31559688
  
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/FileInputFormat.java ---
@@ -21,10 +21,16 @@
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
+import java.util.Map;
 import java.util.Set;
 
+import org.apache.commons.lang3.Validate;
--- End diff --

I'm really sorry that you ran into this, but the community recently decided 
to use Guava's Preconditions (the check*() methods) instead of Commons 
Lang's Validate.
Can you replace that?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1981] add support for GZIP files

2015-06-02 Thread sekruse
GitHub user sekruse opened a pull request:

https://github.com/apache/flink/pull/762

[FLINK-1981] add support for GZIP files

* register decompression algorithms with file extensions for extensibility
* fit deflate decompression into this scheme
* add support for GZIP files
* test support for deflate and GZIP files with the CsvInputFormat

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sekruse/flink FLINK-1981

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/762.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #762


commit 6acae7faa4e27837ce3c9272d4310ec6c46895ab
Author: Sebastian Kruse 
Date:   2015-06-02T16:58:35Z

[FLINK-1981] add support for GZIP files

* register decompression algorithms with file extensions for extensibility
* fit deflate decompression into this scheme
* add support for GZIP files
* test support for deflate and GZIP files with the CsvInputFormat
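
The registration scheme described in the bullet points above can be sketched roughly as follows. This is a minimal, JDK-only illustration and not the code from the pull request: the class `DecompressionRegistry`, its methods, and the registered extensions are assumptions made for this example. Only the name `InflaterInputStreamFactory` and its `create` method appear in the reviewed diff, and the signature used here is a guess.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical sketch of an extension-to-decompressor registry; not Flink's actual classes. */
class DecompressionRegistry {

    /** Wraps a raw stream in a decompressing stream (name taken from the diff, signature assumed). */
    interface InflaterInputStreamFactory {
        InputStream create(InputStream in) throws IOException;
    }

    // Maps lower-case file extensions (without the dot) to their factories.
    private static final Map<String, InflaterInputStreamFactory> FACTORIES = new HashMap<>();

    static {
        // Deflate fits into the same scheme as GZIP; the extension names are assumptions.
        register("deflate", InflaterInputStream::new);
        register("gz", GZIPInputStream::new);
        register("gzip", GZIPInputStream::new);
    }

    /** Registers a factory for an extension, which makes further algorithms pluggable. */
    static synchronized void register(String extension, InflaterInputStreamFactory factory) {
        FACTORIES.put(extension.toLowerCase(Locale.ROOT), factory);
    }

    /** Returns the factory for the file name's extension, or null if none is registered. */
    static synchronized InflaterInputStreamFactory lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return FACTORIES.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Wraps the stream if the extension is known; otherwise returns it unchanged. */
    static InputStream decorate(String fileName, InputStream raw) throws IOException {
        InflaterInputStreamFactory factory = lookup(fileName);
        return factory == null ? raw : factory.create(raw);
    }
}
```

In this sketch, calling `DecompressionRegistry.decorate("data.csv.gz", rawStream)` yields a stream of decompressed bytes, while a file named `data.csv` is passed through untouched. Keeping the mapping in one place means a further algorithm only needs one more `register` call.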



