[ https://issues.apache.org/jira/browse/PARQUET-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303668#comment-14303668 ]

Ryan Blue commented on PARQUET-113:
-----------------------------------
The solution to #1 should be that SparkSQL implements the proposed
compatibility rules from this ticket. We're working on moving all of the object
models to use those rules and Hive was the first one to do it. You can write
the older structures to get something working now, but I don't recommend it as
a long-term solution.

For maps, I'm not sure how protobuf encodes them. I didn't think protobuf had a
special map type; I thought it relied on the caller using a repeated key-value
group (as you have here). There are currently no rules for encoding a structure
in protobuf so that it shows up as a map in the other object models. To do
that, we would need to either add a special case for the name (not ideal) or
find a way to signal to parquet-protobuf that you want an annotated map
structure. I prefer the second option, but I could be convinced to do the first
if there really is no way around it. This should be done in a follow-up ticket
so that the other rules aren't blocked.
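
To make the difference concrete, here is a rough sketch of the two shapes and of how a reader could interpret the repeated group as a map. The field name {{attributes}}, the value type, and the helper {{pairs_to_map}} are made up for illustration, and the annotated schema is my reading of the proposed rules, not something parquet-protobuf produces today. Rejecting duplicate keys is also just an assumption of this sketch.

```python
# Illustration only: hypothetical names, not parquet-protobuf code.

# What a bare repeated key-value group looks like in a Parquet schema.
# Other object models read this as a list of records, not as a map.
REPEATED_GROUP = """
repeated group attributes {
  required binary key (UTF8);
  optional int32 value;
}
"""

# The annotated shape (assumed from the proposed rules): an outer group
# carrying the MAP annotation that wraps a repeated key_value group.
ANNOTATED_MAP = """
optional group attributes (MAP) {
  repeated group key_value {
    required binary key (UTF8);
    optional int32 value;
  }
}
"""


def pairs_to_map(pairs):
    """Collapse repeated key/value records into a dict.

    Each record is a dict with a required "key" and an optional "value".
    Duplicate keys raise ValueError (an assumption made for this sketch).
    """
    result = {}
    for pair in pairs:
        key = pair["key"]
        if key in result:
            raise ValueError(f"duplicate map key: {key!r}")
        # A missing value mirrors the optional value field in the schema.
        result[key] = pair.get("value")
    return result
```

For example, {{pairs_to_map([{"key": "a", "value": 1}, {"key": "b"}])}} gives a dict with {{"a"}} mapped to 1 and {{"b"}} mapped to None.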

For Spark usage outside of SparkSQL, I think people mostly use the parquet-mr
object models via the {{newAPIHadoopRDD}} method. That's not ideal with Avro
records, but hopefully we can fix that soon.

> Clarify parquet-format specification for LIST and MAP structures.
> -----------------------------------------------------------------
>
> Key: PARQUET-113
> URL: https://issues.apache.org/jira/browse/PARQUET-113
> Project: Parquet
> Issue Type: Bug
> Components: parquet-format, parquet-mr
> Reporter: Ryan Blue
> Assignee: Ryan Blue
>
> There are incompatibilities in the way that some parquet object models
> translate nested structures annotated by LIST and MAP / MAP_KEY_VALUE. We
> need to define clearly what the structures should look like and how to
> interpret existing structures, including what must be supported to read
> current parquet-avro, parquet-thrift, etc. files.