[
https://issues.apache.org/jira/browse/TAJO-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961875#comment-13961875
]
David Chen commented on TAJO-711:
---------------------------------
I have an initial implementation of this done and have posted an RB. A few
changes and some validation work remain before this is fully ready:
* Test the use of {{avro.schema.url}}. Currently, the tests only test
{{avro.schema.literal}}.
* Converting between Avro and Tajo is slightly tricky because data sets
usually treat the Avro schema as the "true" schema. I would like to do more
validation to find additional corner cases. For this ticket, I'll do a
best-effort validation with flat schemas, though Avro support might not be
truly battle-tested until TAJO-710 is done because most of our data sets here
at LinkedIn have nested schemas.
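
To make the first item concrete, here is a minimal sketch of how the schema
text might be picked from the table properties. The class and method names are
hypothetical and not taken from the patch, giving {{avro.schema.literal}}
precedence over {{avro.schema.url}} is an assumption, and plain {{java.net}}
URL handling stands in for whatever file system access the real code uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper, not part of the TAJO-711 patch.
final class AvroSchemaProps {
    static final String LITERAL = "avro.schema.literal";
    static final String URL = "avro.schema.url";

    /**
     * Returns the Avro schema JSON text described by the table properties.
     * Assumption: an inline literal wins over a URL when both are set.
     */
    static String resolveSchemaJson(Map<String, String> props) throws IOException {
        String literal = props.get(LITERAL);
        if (literal != null && !literal.trim().isEmpty()) {
            return literal;
        }
        String url = props.get(URL);
        if (url == null || url.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "Neither " + LITERAL + " nor " + URL + " is set");
        }
        // Works for schemes the JDK knows (file:, http:); other schemes such
        // as hdfs: would need a different reader in a real implementation.
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

A test for the {{avro.schema.url}} path could write a schema to a temporary
file and pass its {{file:}} URI through the same method.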
Something else we would want to look at is schema evolution across partitions.
I haven't looked too closely at TAJO-283 yet, but are we storing table
properties into the partitions? For example, say that partitions i...j are
created with Avro schema A, set by either the {{avro.schema.url}} or
{{avro.schema.literal}} property. Now, partitions j+1...k are created with an
evolved Avro schema A'. Does the current implementation of partitions in Tajo
support storing such properties within the partitions? In any event, if this
might be an issue, we can create a separate ticket for this work.
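
To illustrate the question, here is a rough sketch of what per-partition
properties with a table-level fallback could look like. Everything here is
hypothetical: the class, its methods, and the integer partition ids are
placeholders and do not reflect how TAJO-283 models partitions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: partition-level properties override table-level ones.
final class PartitionSchemaLookup {
    private final Map<String, String> tableProps;
    private final Map<Integer, Map<String, String>> partitionProps = new HashMap<>();

    PartitionSchemaLookup(Map<String, String> tableProps) {
        // Defensive copy so later changes by the caller do not leak in.
        this.tableProps = new HashMap<>(tableProps);
    }

    /** Records a property for a single partition. */
    void setPartitionProperty(int partition, String key, String value) {
        partitionProps.computeIfAbsent(partition, k -> new HashMap<>()).put(key, value);
    }

    /** Returns the partition's own value if present, else the table's value. */
    String getProperty(int partition, String key) {
        Map<String, String> overrides = partitionProps.get(partition);
        if (overrides != null && overrides.containsKey(key)) {
            return overrides.get(key);
        }
        return tableProps.get(key);
    }
}
```

With something like this, partitions i...j would fall back to schema A from
the table properties, while partitions j+1...k would each carry the evolved
schema A' under {{avro.schema.literal}} or {{avro.schema.url}}.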
> Add Avro storage support
> ------------------------
>
> Key: TAJO-711
> URL: https://issues.apache.org/jira/browse/TAJO-711
> Project: Tajo
> Issue Type: New Feature
> Reporter: David Chen
> Assignee: David Chen
> Attachments: TAJO-711.patch
>
>
> Add {{FileScanner}} and {{FileAppender}} for reading from and writing to Avro.