[GitHub] flink pull request #2060: [FLINK-3921] StringParser encoding

2016-12-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2060


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2060: [FLINK-3921] StringParser encoding

2016-09-09 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2060#discussion_r78203358
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/FieldParser.java ---
@@ -75,8 +76,30 @@
/** Invalid Boolean value **/
BOOLEAN_INVALID
}
-   
+
+   private Charset charset = Charset.forName("US-ASCII");
+
private ParseErrorState errorState = ParseErrorState.NONE;
+
+   /**
+* Parses the value of a field from the byte array.
+* The start position within the byte array and the array's valid 
length is given.
+* The content of the value is delimited by a field delimiter.
+*
+* @param bytes The byte array that holds the value.
+* @param startPos The index where the field starts
+* @param limit The limit unto which the byte contents is valid for the 
parser. The limit is the
+*  position one after the last valid byte.
+* @param delim The field delimiter character
+* @param reuse An optional reusable field to hold the value
+* @param charset The charset to parse with
+*
+* @return The index of the next delimiter, if the field was parsed 
correctly. A value less than 0 otherwise.
+*/
+   public int parseField(byte[] bytes, int startPos, int limit, byte[] 
delim, T reuse, Charset charset){
+   this.charset = charset;
--- End diff --

Is this method needed? `GenericCsvInputFormat.open` can `setCharset` on 
each newly instantiated `FieldParser`, and in the case where a user decided to 
change charset on an open file `GenericCsvInputFormat.setCharset` could go 
through and `setCharset` on the list of `FieldParser`s.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2060: [FLINK-3921] StringParser encoding

2016-09-09 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2060#discussion_r78199868
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/FieldParser.java ---
@@ -75,8 +76,30 @@
/** Invalid Boolean value **/
BOOLEAN_INVALID
}
-   
+
+   private Charset charset = Charset.forName("US-ASCII");
--- End diff --

Should this also default to `UFT-8`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2060: [FLINK-3921] StringParser encoding

2016-09-09 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2060#discussion_r78199594
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java
 ---
@@ -314,6 +320,25 @@ protected void setFieldsGeneric(boolean[] 
includedMask, Class[] fieldTypes) {
this.fieldIncluded = includedMask;
}
 
+   /**
+* Gets the Charset for the parser.Default is set to UTF-8
+*
+* @return The charset for the parser.
+*/
+   public Charset getCharset() {
+   return this.charset;
+   }
+
+   /**
+* Sets the charset of the parser. Called by subclasses of the parser 
to set the type of charset
+* when doing a parse.
+*
+* @param charset The charset  to set.
+*/
+   protected void setCharset(Charset charset){
--- End diff --

Add space between `)` and `{` (and I count two more instances).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2060: [FLINK-3921] StringParser encoding

2016-09-09 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2060#discussion_r78199365
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java
 ---
@@ -314,6 +320,25 @@ protected void setFieldsGeneric(boolean[] 
includedMask, Class[] fieldTypes) {
this.fieldIncluded = includedMask;
}
 
+   /**
+* Gets the Charset for the parser.Default is set to UTF-8
+*
+* @return The charset for the parser.
+*/
+   public Charset getCharset() {
+   return this.charset;
+   }
+
+   /**
+* Sets the charset of the parser. Called by subclasses of the parser 
to set the type of charset
+* when doing a parse.
+*
+* @param charset The charset  to set.
+*/
+   protected void setCharset(Charset charset){
+   this.charset = charset != null ? charset : 
Charset.forName("UTF-8");
--- End diff --

Use `Preconditions.checkNotNull`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2060: [FLINK-3921] StringParser encoding

2016-09-09 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2060#discussion_r78199262
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java
 ---
@@ -314,6 +320,25 @@ protected void setFieldsGeneric(boolean[] 
includedMask, Class[] fieldTypes) {
this.fieldIncluded = includedMask;
}
 
+   /**
+* Gets the Charset for the parser.Default is set to UTF-8
+*
+* @return The charset for the parser.
+*/
+   public Charset getCharset() {
+   return this.charset;
+   }
+
+   /**
+* Sets the charset of the parser. Called by subclasses of the parser 
to set the type of charset
+* when doing a parse.
+*
+* @param charset The charset  to set.
--- End diff --

Double space.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2060: [FLINK-3921] StringParser encoding

2016-09-09 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2060#discussion_r78199216
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java
 ---
@@ -314,6 +320,25 @@ protected void setFieldsGeneric(boolean[] 
includedMask, Class[] fieldTypes) {
this.fieldIncluded = includedMask;
}
 
+   /**
+* Gets the Charset for the parser.Default is set to UTF-8
--- End diff --

Should we spell out `charset` as "character set" in the documentation? 
Also, space after period and missing period at end.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2060: [FLINK-3921] StringParser encoding

2016-07-18 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2060#discussion_r71137136
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/api/common/io/GenericCsvInputFormat.java
 ---
@@ -106,6 +106,11 @@ protected GenericCsvInputFormat() {
protected GenericCsvInputFormat(Path filePath) {
super(filePath);
}
+
+   protected GenericCsvInputFormat(Path filePath, Charset charset) {
+   super(filePath);
+   this.charset = charset != null ? charset : 
Charset.forName("UTF-8");
--- End diff --

Would this be better as `this.charset = 
Preconditions.checkNotNull(charset);` since the user should use 
`GenericCsvInputFormat(Path)` when using the default charset?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2060: [FLINK-3921] StringParser encoding

2016-06-01 Thread rekhajoshm
GitHub user rekhajoshm opened a pull request:

https://github.com/apache/flink/pull/2060

[FLINK-3921] StringParser encoding

Corrected StringParser encoding

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rekhajoshm/flink FLINK-3921

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2060.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2060


commit 1e1ce9efefccf5a5585be802af4200e9d1ae7a98
Author: Rekha Joshi 
Date:   2016-06-01T20:03:50Z

Merge pull request #1 from apache/master

Apache Flink master pull

commit 675b6a44e76ae71901bc6d4eaea1d09b6f789ff6
Author: Joshi 
Date:   2016-06-01T20:26:47Z

[FLINK-3921] StringParser encoding




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---