[ https://issues.apache.org/jira/browse/BEAM-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099182#comment-16099182 ]
ASF GitHub Bot commented on BEAM-2677:
--------------------------------------

GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/3632

    [BEAM-2677] AvroIO.parseGenericRecords - schemaless AvroIO.read

    To be done properly, this PR needs https://github.com/apache/beam/pull/3549.

    R: @mairbek CC: @reuvenlax (please take a look at the API but hold off a full review until that PR is submitted)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam avroio-dynamic

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3632

----
commit 8bcafd94a38d9ccb962acd809536ff4bfcf70036
Author: Eugene Kirpichov <kirpic...@google.com>
Date:   2017-07-24T22:07:15Z

    [BEAM-2677] AvroIO.parseGenericRecords - schemaless AvroIO.read

----

> AvroIO.read without specifying a schema
> ---------------------------------------
>
>                 Key: BEAM-2677
>                 URL: https://issues.apache.org/jira/browse/BEAM-2677
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>
> Sometimes it is inconvenient to require the user of AvroIO.read/readAll to
> specify a Schema for the Avro files they are reading, especially if different
> files may have different schemas.
> It is possible to read GenericRecord objects from an Avro file, however it is
> not possible to provide a Coder for GenericRecord without knowing the schema:
> a GenericRecord knows its schema so we can encode it into a byte array, but
> we can not decode it from a byte array without knowing the schema (and
> encoding the full schema together with every record would be impractical).
> Instead, a reasonable approach is to treat schemaless GenericRecord as
> unencodable and use the same approach as JdbcIO - a user-specified parse
> callback.
> Suggested API:
> AvroIO.parseGenericRecords(SerializableFunction<GenericRecord, T> parseFn).from(filepattern).
> CC: [~mkhadikov] [~reuvenlax]

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
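
The parse-callback idea in the issue description can be sketched without Beam or Avro on the classpath. The following is only an illustration under stated assumptions: `Record`, `record` and `parseRecords` are hypothetical stand-ins invented for this sketch, not Beam's or Avro's API. The point it tries to show is that the user-supplied function runs on each record as it is read, so the read step only ever emits values of the user's own type T and never has to encode a schemaless record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class ParseCallbackSketch {

    // Hypothetical stand-in for a GenericRecord: it carries its own schema
    // name, so it can be inspected, but no single coder fits every record.
    static final class Record {
        final String schemaName;
        final Map<String, Object> fields;

        Record(String schemaName, Map<String, Object> fields) {
            this.schemaName = schemaName;
            this.fields = fields;
        }

        Object get(String fieldName) {
            return fields.get(fieldName);
        }
    }

    // Builds a Record from alternating field names and values.
    static Record record(String schemaName, Object... namesAndValues) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            fields.put((String) namesAndValues[i], namesAndValues[i + 1]);
        }
        return new Record(schemaName, fields);
    }

    // Hypothetical reader: applies parseFn to each record as it is read, so
    // only values of the user's type T ever leave the read step.
    static <T> List<T> parseRecords(List<Record> records, Function<Record, T> parseFn) {
        List<T> parsed = new ArrayList<>();
        for (Record r : records) {
            parsed.add(parseFn.apply(r));
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Two records with different schemas, as if read from different files.
        List<Record> records = Arrays.asList(
            record("UserV1", "name", "ann"),
            record("UserV2", "name", "bob", "age", 31));

        List<String> names = parseRecords(records, r -> String.valueOf(r.get("name")));
        System.out.println(names); // prints [ann, bob]
    }
}
```

In this sketch the callback is the only code that needs to understand the differing schemas; everything after the read step deals with plain strings, for which an encoding is trivially available.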