[ https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348616#comment-14348616 ]

ASF GitHub Bot commented on FLINK-1512:
---------------------------------------

Github user teabot commented on the pull request:

    https://github.com/apache/flink/pull/426#issuecomment-77347164
  
    I like that this request adds what I consider to be a very useful 
feature. Data is often stored in delimited text formats, and one of the nice 
features of Flink is the ability to work with POJOs. Therefore, some mechanism 
to map from delimited files to POJOs is useful.
    
    However, I have found that such mappings are rarely restricted to the 
simple cases. Might I therefore suggest that some other factors be considered 
as part of this request, so as not to limit the usefulness of this API change 
and to provide some extensibility:
    
    * The positions of fields within the delimited file, and how they map to 
POJO properties, could indeed be modelled with a `String[]` as suggested. 
However, this will be arduous to maintain and prone to human error for types 
with a large number of fields. An alternative would be a `@Position` 
annotation on the POJO fields to indicate the CSV column index.
    ```java
      public class MyPojo {
        @Position(0)
        public int id;
        
        @Position(1)
        public String name;
        ...
      }
    ```
    * Flink + POJOs bring the benefit of richer types to data processing 
pipelines. But it seems to me that this implementation restricts the range of 
types that can be used within the target POJO, negating this benefit somewhat. 
If I want to use a POJO with a `DateTime` field, then I must still write my 
own mapping function to do so. Therefore, it would be useful to provide some 
hook into the CSV-to-POJO mapping process that allows user-defined type 
converters to be specified. Users could then easily map to field types of 
their choosing. Again, this could be modelled as an annotation:
    ```java
      public class MyPojo {
        // DateTimeConverter implemented by the user
        @Converter(type=DateTimeConverter.class,
                   properties="format=yyyy/MM/dd:HH;timezone=Europe/London")
        public DateTime date;
        ...
      }
    ```
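    As a rough sketch of how the `@Position` idea could work (all names here 
are hypothetical, not existing Flink API), the reader could derive the column 
order from the annotations via reflection:
    ```java
    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;
    import java.lang.reflect.Field;
    import java.util.Arrays;
    import java.util.Comparator;

    public class PositionDemo {
        // Hypothetical annotation; not part of Flink.
        @Retention(RetentionPolicy.RUNTIME)
        @Target(ElementType.FIELD)
        public @interface Position {
            int value();
        }

        public static class MyPojo {
            @Position(0) public int id;
            @Position(1) public String name;
        }

        // Derive the column-ordered field names from the annotations.
        public static String[] fieldOrder(Class<?> pojoType) {
            return Arrays.stream(pojoType.getFields())
                .filter(f -> f.isAnnotationPresent(Position.class))
                .sorted(Comparator.comparingInt(
                        f -> f.getAnnotation(Position.class).value()))
                .map(Field::getName)
                .toArray(String[]::new);
        }

        public static void main(String[] args) {
            System.out.println(String.join(",", fieldOrder(MyPojo.class)));
        }
    }
    ```
    The reader implementation would then use this ordering instead of a 
hand-maintained `String[]`.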
    You might consider that these additional features are better served by a 
separate type-mapping component or API (and I'd probably agree). But in that 
case, is it wise to also add a simpler, less flexible form to the core Flink 
API?
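    To illustrate the converter hook, here is a purely hypothetical sketch: 
`CsvFieldConverter` is an invented interface, `java.time` stands in for 
Joda-Time's `DateTime` to keep the example self-contained, and the format 
pattern from the example above is extended with minutes:
    ```java
    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;

    public class ConverterDemo {
        // Hypothetical user-facing contract for CSV field converters.
        public interface CsvFieldConverter<T> {
            T convert(String rawField);
        }

        // A user-defined converter along the lines of the @Converter example,
        // configured with a format pattern taken from the annotation.
        public static class DateTimeConverter
                implements CsvFieldConverter<LocalDateTime> {
            private final DateTimeFormatter formatter;

            public DateTimeConverter(String pattern) {
                this.formatter = DateTimeFormatter.ofPattern(pattern);
            }

            @Override
            public LocalDateTime convert(String rawField) {
                return LocalDateTime.parse(rawField, formatter);
            }
        }

        public static void main(String[] args) {
            CsvFieldConverter<LocalDateTime> c =
                    new DateTimeConverter("yyyy/MM/dd:HH:mm");
            System.out.println(c.convert("2015/03/04:09:30"));
        }
    }
    ```
    The CSV reader would instantiate the converter declared on each annotated 
field and call it for the corresponding column.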
    
    Thank you for your time.


> Add CsvReader for reading into POJOs.
> -------------------------------------
>
>                 Key: FLINK-1512
>                 URL: https://issues.apache.org/jira/browse/FLINK-1512
>             Project: Flink
>          Issue Type: New Feature
>          Components: Java API, Scala API
>            Reporter: Robert Metzger
>            Assignee: Chiwan Park
>            Priority: Minor
>              Labels: starter
>
> Currently, the {{CsvReader}} supports only TupleXX types. 
> It would be nice if users were also able to read into POJOs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
