[
https://issues.apache.org/jira/browse/BEAM-6479?focusedWorklogId=258902&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-258902
]
ASF GitHub Bot logged work on BEAM-6479:
----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Jun/19 17:06
Start Date: 12/Jun/19 17:06
Worklog Time Spent: 10m
Work Description: sini commented on issue #8418: [BEAM-6479] Deprecate
AvroIO.RecordFormatter
URL: https://github.com/apache/beam/pull/8418#issuecomment-501063020
A question, since I use this: in my FileIO dynamic write we have a mixed
PCollection of hundreds of different types of binary Avro records that we
re-serialize using this sink. Would materializing an intermediate PCollection
of GenericRecords, as the proposed alternative requires, not increase the size
and cost of shuffle operations whenever the collection has to be flushed to
disk, since the schema is duplicated for every record? Or was this already a
problem with the RecordFormatter transform step and I was just oblivious to it?
Edit: After a day of further reflection, I suppose the answer is that I should
optimize this in the coder? If I use a KvCoder, I don't see a way to make the
value coder depend on the key coder. I suppose I would need to add some sort
of registry, plus an identifier in the serialized record value for external
lookup of schemas?
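The registry idea in the edit above could look roughly like the following. This is only a sketch under stated assumptions, using the plain JDK rather than Beam or Avro: the names SchemaRegistrySketch, Registry, encode and decode are hypothetical, the in-memory map stands in for whatever external schema store would really be used, and CRC32 is a placeholder for a proper schema fingerprint. Each serialized value carries an 8-byte schema id in front of the already-encoded Avro payload, so the schema itself is never repeated per record.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.CRC32;

/** Hypothetical sketch: ship a schema id with each record instead of the schema. */
public class SchemaRegistrySketch {

    /** In-memory stand-in for an external schema registry (assumption). */
    public static final class Registry {
        private final Map<Long, String> schemasById = new ConcurrentHashMap<>();

        /** Stores the schema JSON and returns the id derived from it. */
        public long register(String schemaJson) {
            // CRC32 is only a placeholder; a real registry would want a
            // stronger, collision-resistant schema fingerprint.
            CRC32 crc = new CRC32();
            crc.update(schemaJson.getBytes(StandardCharsets.UTF_8));
            long id = crc.getValue();
            schemasById.put(id, schemaJson);
            return id;
        }

        /** Resolves an id back to its schema JSON, failing on unknown ids. */
        public String lookup(long id) {
            String schema = schemasById.get(id);
            if (schema == null) {
                throw new IllegalArgumentException("Unknown schema id: " + id);
            }
            return schema;
        }
    }

    /** Result of decoding: the resolved schema plus the untouched payload. */
    public static final class Decoded {
        public final String schemaJson;
        public final byte[] payload;

        Decoded(String schemaJson, byte[] payload) {
            this.schemaJson = schemaJson;
            this.payload = payload;
        }
    }

    /** Envelope layout: 8-byte schema id followed by the binary record. */
    public static byte[] encode(long schemaId, byte[] payload) {
        return ByteBuffer.allocate(Long.BYTES + payload.length)
                .putLong(schemaId)
                .put(payload)
                .array();
    }

    /** Reads the id, looks the schema up externally, and returns both. */
    public static Decoded decode(Registry registry, byte[] envelope) {
        ByteBuffer buffer = ByteBuffer.wrap(envelope);
        long schemaId = buffer.getLong();
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        return new Decoded(registry.lookup(schemaId), payload);
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        long id = registry.register("{\"type\":\"string\"}");
        byte[] envelope = encode(id, new byte[] {6, 'f', 'o', 'o'});
        Decoded decoded = decode(registry, envelope);
        System.out.println("schema: " + decoded.schemaJson);
        System.out.println("payload bytes: " + decoded.payload.length);
    }
}
```

With this layout the per-record overhead is a fixed 8 bytes regardless of schema size, at the cost of requiring every worker to reach the same registry when decoding.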
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 258902)
Time Spent: 40m (was: 0.5h)
> Deprecate AvroIO.RecordFormatter
> --------------------------------
>
> Key: BEAM-6479
> URL: https://issues.apache.org/jira/browse/BEAM-6479
> Project: Beam
> Issue Type: Task
> Components: io-java-avro
> Affects Versions: 2.9.0
> Reporter: Romain Manni-Bucau
> Assignee: Ismaël Mejía
> Priority: Major
> Fix For: 2.13.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> AvroIO.RecordFormatter is a user-friendly way to transform user elements
> into Avro GenericRecords before writing to a Sink. The same result can be
> achieved easily with a ParDo that performs the conversion, followed by a Sink
> that knows how to write a PCollection of IndexedRecords, like the one
> proposed in BEAM-6480.