[ https://issues.apache.org/jira/browse/BEAM-6479?focusedWorklogId=258216&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-258216 ]
ASF GitHub Bot logged work on BEAM-6479:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Jun/19 23:36
            Start Date: 11/Jun/19 23:36
    Worklog Time Spent: 10m
      Work Description: sini commented on issue #8418: [BEAM-6479] Deprecate AvroIO.RecordFormatter
URL: https://github.com/apache/beam/pull/8418#issuecomment-501063020

   Question, as I use this... In my FileIO dynamic write we have a mixed
   PCollection of hundreds of different types of binary Avro records that we
   re-serialize using this sink. Would materializing an intermediate
   PCollection of GenericRecords, as the proposed alternative suggests,
   increase the size and cost of shuffle operations if the collection needs to
   be flushed to disk, given that the schema is duplicated for every record?
   Or was this already a problem with the RecordFormatter transform step, and
   I was just oblivious to it?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 258216)
    Time Spent: 0.5h  (was: 20m)

> Deprecate AvroIO.RecordFormatter
> --------------------------------
>
>                 Key: BEAM-6479
>                 URL: https://issues.apache.org/jira/browse/BEAM-6479
>             Project: Beam
>          Issue Type: Task
>          Components: io-java-avro
>    Affects Versions: 2.9.0
>            Reporter: Romain Manni-Bucau
>            Assignee: Ismaël Mejía
>            Priority: Major
>             Fix For: 2.13.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> AvroIO.RecordFormatter is a user-friendly way to transform user elements
> into Avro GenericRecords before writing to a Sink. The same result can be
> achieved easily with a ParDo that performs the conversion, combined with a
> Sink that knows how to write a PCollection of IndexedRecords.
> Like the one proposed in BEAM-6480.