[
https://issues.apache.org/jira/browse/DRILL-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550633#comment-13550633
]
Jacques Nadeau edited comment on DRILL-19 at 1/11/13 12:50 AM:
---------------------------------------------------------------
In the heterogeneous situation, you should just capture the array as type
heterogeneous and then encode the schema information with each element in the
array.
Random thought, what do you think about making your schema code output a proto
idl? For the heterogeneous array option, I'd use a type of repeated bytes
with the assumption that each bytes value will be the schema followed by the
data.
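
To make the "schema followed by the data" idea concrete, here is a minimal Python sketch. Nothing in it comes from the Drill code base. The 4-byte length prefix, the type tags returned by `type_name`, and the use of JSON text for the data portion are all assumptions chosen only for illustration.

```python
import json
import struct


def type_name(value):
    """Return a simple, hypothetical schema tag for a JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "map"
    raise TypeError(f"unsupported JSON value: {value!r}")


def encode_element(value):
    """Encode one array element as bytes: schema first, then the data."""
    schema = type_name(value).encode("utf-8")
    data = json.dumps(value).encode("utf-8")
    # Assumed framing: a 4-byte big-endian length tells the reader
    # where the schema ends and the data begins.
    return struct.pack(">I", len(schema)) + schema + data


def decode_element(blob):
    """Split one bytes value back into (schema tag, value)."""
    (schema_len,) = struct.unpack(">I", blob[:4])
    schema = blob[4:4 + schema_len].decode("utf-8")
    value = json.loads(blob[4 + schema_len:].decode("utf-8"))
    return schema, value


# A heterogeneous array becomes a list of bytes values ("repeated bytes").
encoded = [encode_element(v) for v in [1, "two", {"three": 3.0}]]
decoded = [decode_element(b) for b in encoded]
```

Each element carries its own schema, so a reader does not need one type that fits the whole array.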
Yes. Not necessarily all the way to a .proto definition, but map to those
concepts. Basically, proto is a schema definition language. You're working on
writing a schema extraction tool. The output should preferably be expressed as
a schema definition language. It seems like proto is a reasonable one to use.
That way you can spend less effort recreating it.
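
As a rough illustration of mapping discovered schema onto proto concepts, the Python sketch below turns one flat JSON record into proto2-style message text. The helper names, the message name `Record`, the choice of `optional`, and the tag numbering by sorted field name are all assumptions. It also assumes the JSON keys are already valid proto identifiers.

```python
def proto_scalar_type(value):
    """Map a JSON scalar to a proto scalar type name (assumed mapping)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"not a scalar this sketch handles: {value!r}")


def to_proto_message(name, record):
    """Render a flat dict of scalars as a proto2-style message definition."""
    lines = [f"message {name} {{"]
    # Sort field names so tag numbers are stable across runs.
    for tag, (field, value) in enumerate(sorted(record.items()), start=1):
        lines.append(f"  optional {proto_scalar_type(value)} {field} = {tag};")
    lines.append("}")
    return "\n".join(lines)


text = to_proto_message("Record", {"id": 7, "name": "drill", "score": 0.5})
print(text)
```

Nested maps and arrays are left out on purpose. They are where the naming question discussed in this thread comes up.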
---
I was thinking about what my proto definition looks like when I have a list
with maps, and so on. I was thinking that I would generate a message definition
at the parent level for each map found in lists; however, I'm not sure what
class name I can use to guarantee no name clash.
---
I'd suggest for naming that we just carry an incrementing integer and then name
each message m##### such as m00001 and upwards.
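
A small Python sketch of that naming scheme follows. The class and function names are hypothetical, and the depth-first walk order is an assumption. The only part taken from the suggestion is the single incrementing integer rendered as m00001, m00002, and so on.

```python
import itertools


class MessageNamer:
    """Hands out m00001, m00002, ... from a single incrementing counter."""

    def __init__(self, width=5):
        self._ids = itertools.count(1)
        self._width = width

    def next_name(self):
        # Zero-pad to `width` digits; larger numbers simply get longer,
        # so names stay unique even past m99999.
        return f"m{next(self._ids):0{self._width}d}"


def name_maps_in_lists(value, namer, found=None):
    """Depth-first walk returning [(generated_name, map)] for every map
    that sits directly inside a list."""
    if found is None:
        found = []
    if isinstance(value, list):
        for item in value:
            if isinstance(item, dict):
                found.append((namer.next_name(), item))
            name_maps_in_lists(item, namer, found)
    elif isinstance(value, dict):
        for child in value.values():
            name_maps_in_lists(child, namer, found)
    return found


doc = {"users": [{"name": "a", "tags": [{"k": "x"}]}, {"name": "b"}]}
named = name_maps_in_lists(doc, MessageNamer())
names = [name for name, _ in named]
```

Because every name comes from one counter, two maps can never receive the same message name, whatever their field names are.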
was (Author: jnadeau):
In the heterogeneous situation, you should just capture the array as type
heterogeneous and then encode the schema information with each element in the
array.
Random thought, what do you think about making your schema code output a proto
idl? For the heterogeneous array option, I'd use a type of repeated bytes
with the assumption that each bytes value will be the schema followed by the
data.
> Build a JSON scanner that does schema discovery
> -----------------------------------------------
>
> Key: DRILL-19
> URL: https://issues.apache.org/jira/browse/DRILL-19
> Project: Apache Drill
> Issue Type: New Feature
> Reporter: Jacques Nadeau
> Assignee: Timothy Chen
>