[ 
https://issues.apache.org/jira/browse/BEAM-11091?focusedWorklogId=503773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-503773
 ]

ASF GitHub Bot logged work on BEAM-11091:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Oct/20 15:44
            Start Date: 22/Oct/20 15:44
    Worklog Time Spent: 10m 
      Work Description: lukecwik commented on a change in pull request #13166:
URL: https://github.com/apache/beam/pull/13166#discussion_r510266391



##########
File path: sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
##########
@@ -437,6 +445,12 @@
           .build();
     }
 
+    /** Transforms the keys read from the source using the given key translation function. */
+    public Read<K, V> withKeyTranslation(SimpleFunction<?, K> function, Coder<K> coder) {

Review comment:
       Since we don't need the type information anymore, we should use 
SerializableFunction instead of SimpleFunction.
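
A minimal sketch of why a plain serializable functional interface is enough once the coder is passed explicitly: the function only has to be callable and serializable, so a lambda works and no subclass is needed to carry type information. `KeyTranslationSketch`, `SerFn` and `roundTrip` are hypothetical stand-ins written for this illustration, not Beam classes, and plain Java serialization is only an assumption about how such a function might be shipped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class KeyTranslationSketch {
  /** Local stand-in for a serializable functional interface; not Beam's own type. */
  @FunctionalInterface
  interface SerFn<InT, OutT> extends Serializable {
    OutT apply(InT input);
  }

  /** Round-trips a function through Java serialization and returns the deserialized copy. */
  @SuppressWarnings("unchecked")
  static <InT, OutT> SerFn<InT, OutT> roundTrip(SerFn<InT, OutT> fn) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(fn);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (SerFn<InT, OutT>) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("function could not be serialized", e);
    }
  }

  public static void main(String[] args) {
    // A lambda is sufficient: the output type is described by the coder, not the function.
    SerFn<Long, String> toText = key -> "key-" + key;
    System.out.println(roundTrip(toText).apply(7L));
  }
}
```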

##########
File path: sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
##########
@@ -437,6 +445,12 @@
           .build();
     }
 
+    /** Transforms the keys read from the source using the given key translation function. */
+    public Read<K, V> withKeyTranslation(SimpleFunction<?, K> function, Coder<K> coder) {

Review comment:
       You need to clear the keyCoder/valueCoder in the non-coder-based 
variants; otherwise we won't honor the TypeDescriptor when the user changes the 
translation function.
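
The concern can be sketched with a simplified model. `ReadSketch` is a hypothetical stand-in, not `HadoopFormatIO.Read`: a `Class<?>` plays the role of the type descriptor, a `String` plays the role of the coder, and the `"registry:"` prefix simulates a `CoderRegistry` lookup. The point it illustrates is that the coder-less overload resets any explicit coder, so a later call with a new type is not shadowed by a stale coder.

```java
import java.util.function.Function;

/** Hypothetical, simplified stand-in for HadoopFormatIO.Read; not the Beam API. */
final class ReadSketch {
  private final Function<Object, Object> keyTranslation;
  private final Class<?> keyType; // stand-in for the key type descriptor
  private final String keyCoder;  // stand-in for an explicit coder; null means "not set"

  private ReadSketch(Function<Object, Object> keyTranslation, Class<?> keyType, String keyCoder) {
    this.keyTranslation = keyTranslation;
    this.keyType = keyType;
    this.keyCoder = keyCoder;
  }

  static ReadSketch create() {
    return new ReadSketch(null, null, null);
  }

  /** Coder-less variant: records the type and clears any coder set by an earlier call. */
  ReadSketch withKeyTranslation(Function<Object, Object> function, Class<?> type) {
    return new ReadSketch(function, type, null);
  }

  /** Coder-based variant: the explicit coder takes precedence over type-based lookup. */
  ReadSketch withKeyTranslation(Function<Object, Object> function, String coder) {
    return new ReadSketch(function, null, coder);
  }

  /** Uses the explicit coder if present, otherwise simulates a registry lookup by type. */
  String effectiveKeyCoder() {
    if (keyCoder != null) {
      return keyCoder;
    }
    if (keyType == null) {
      throw new IllegalStateException("no key type or coder configured");
    }
    return "registry:" + keyType.getSimpleName();
  }

  public static void main(String[] args) {
    ReadSketch withCoder =
        ReadSketch.create().withKeyTranslation(k -> k.toString(), "CustomTextCoder");
    // Changing the translation without a coder must fall back to the type-based lookup.
    ReadSketch retyped =
        withCoder.withKeyTranslation(k -> Integer.valueOf(k.hashCode()), Integer.class);
    System.out.println(withCoder.effectiveKeyCoder());
    System.out.println(retyped.effectiveKeyCoder());
  }
}
```

If the coder-less overload kept the old `keyCoder` instead of passing `null`, the second call in this model would still report `CustomTextCoder`, which is the behavior the comment warns about.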




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 503773)
    Time Spent: 20m  (was: 10m)

> HadoopFormatIO should allow to specify coders
> ---------------------------------------------
>
>                 Key: BEAM-11091
>                 URL: https://issues.apache.org/jira/browse/BEAM-11091
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-hadoop-format
>    Affects Versions: 2.24.0
>            Reporter: Jozef Vilcek
>            Assignee: Jozef Vilcek
>            Priority: P3
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> HadoopFormatIO only allows passing in type descriptors for the key and value, 
> which are then resolved to a Coder via the CoderRegistry.
> We found this restrictive after 
> https://issues.apache.org/jira/browse/BEAM-9569
> which makes it impossible to emit Row out of the source.
> An easy solution would be to allow setting the Coder directly and to bypass 
> the resolution against the registry in that case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
