Keys are not concatenated to produce different field names for nested
objects. In the original example, {"a": {"a": 1}}, each "a" should be encoded
in the value using the same field ID from the metadata dictionary.
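
To make that concrete, here is a rough Python sketch of the idea. It is not
the binary Variant encoding, only a model of how keys map to IDs; the helper
names collect_keys and with_field_ids are made up for illustration, and IDs
are assumed to be assigned in first-seen order.

```python
def collect_keys(value, dictionary=None):
    """Gather each distinct object key once, regardless of nesting depth."""
    if dictionary is None:
        dictionary = {}
    if isinstance(value, dict):
        for key, child in value.items():
            # The bare key is stored, never a path such as "a.a".
            dictionary.setdefault(key, len(dictionary))
            collect_keys(child, dictionary)
    elif isinstance(value, list):
        for child in value:
            collect_keys(child, dictionary)
    return dictionary


def with_field_ids(value, dictionary):
    """Replace every object key with its ID from the shared dictionary."""
    if isinstance(value, dict):
        return {dictionary[k]: with_field_ids(v, dictionary) for k, v in value.items()}
    if isinstance(value, list):
        return [with_field_ids(v, dictionary) for v in value]
    return value


doc = {"a": {"a": 1}}
ids = collect_keys(doc)
print(ids)                       # {'a': 0}
print(with_field_ids(doc, ids))  # {0: {0: 1}}
```

Both the outer and the inner object refer to ID 0, so the dictionary holds a
single "a".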

Strings in the dictionary may be duplicated only if the dictionary is not
marked as sorted (sorted_strings is 0). From the spec
<https://github.com/apache/parquet-format/blob/a084f844f8475e5ff190fa367815e5ef00dbe08f/VariantEncoding.md#:~:text=If%20sorted_strings%20is%20set%20to%201%2C%20strings%20in%20the%20dictionary%20must%20be%20unique%20and%20sorted%20in%20lexicographic%20order.%20If%20the%20value%20is%20set%20to%200%2C%20readers%20may%20not%20make%20any%20assumptions%20about%20string%20order%20or%20uniqueness.>
:

If sorted_strings is set to 1, strings in the dictionary must be unique and
sorted in lexicographic order. If the value is set to 0, readers may not
make any assumptions about string order or uniqueness.
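
A writer could check that rule before setting the flag. The sketch below is
only an illustration: may_set_sorted_strings is a hypothetical helper, and
comparing the UTF-8 bytes of each string is my assumption about what
"lexicographic order" means here.

```python
def may_set_sorted_strings(dictionary):
    """Return True if the strings are strictly increasing.

    Strictly increasing implies both sorted and unique, which is what
    sorted_strings = 1 requires. Byte-wise comparison of the UTF-8
    encoding is an assumption, not something taken from the spec text.
    """
    encoded = [s.encode("utf-8") for s in dictionary]
    return all(a < b for a, b in zip(encoded, encoded[1:]))


print(may_set_sorted_strings(["a", "b"]))  # True
print(may_set_sorted_strings(["a", "a"]))  # False (duplicate)
print(may_set_sorted_strings(["b", "a"]))  # False (out of order)
```

If the check fails, the writer can still emit the dictionary with
sorted_strings set to 0, and readers then assume nothing about order or
uniqueness.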


On Tue, May 13, 2025 at 2:59 AM Andrew Lamb <[email protected]> wrote:

> I think you can potentially use the example binary data here[1] to answer
> these questions, specifically [2] and [3].
>
> I don't think the keys are concatenated with parent key names.
>
> Andrew
>
> [1]: https://github.com/apache/parquet-testing/tree/master/variant
> [2]:
>
> https://github.com/apache/parquet-testing/blob/master/variant/object_nested.metadata
> [3]:
>
> https://github.com/apache/parquet-testing/blob/master/variant/object_nested.value
>
>
> https://github.com/apache/parquet-testing/issues/75
>
> On Tue, May 13, 2025 at 4:37 AM Gang Wu <[email protected]> wrote:
>
> > Quick question: how should keys in nested objects be serialized? Do we
> > need to concatenate the parent key, like a JSON path?
> >
> > On Tue, May 13, 2025 at 3:19 PM wish maple <[email protected]>
> wrote:
> >
> > > Just to make sure whether it's OK or should be forbidden, since it
> > > affects how readers/writers handle this.
> > >
> > > Best,
> > > Xuwei Fu
> > >
> > > > Aihua Xu <[email protected]> wrote on Tue, May 13, 2025 at 14:32:
> > >
> > > > > It should be just a single ‘a’, to reduce storage by reusing the same
> > > > > key. Any reason that we want to keep both ‘a’ entries there?
> > > >
> > > >
> > > >
> > > > > On May 12, 2025, at 7:43 PM, wish maple <[email protected]>
> > > wrote:
> > > > >
> > > > > > Thanks! So, in the nested object scenario, would the metadata be
> > > > > > field 0: "a", field 1: "a" or just field 0: "a"?
> > > > > > Are both ways OK for readers/writers, or do we need to restrict the
> > > > > > metadata implementation?
> > > > >
> > > > > Best,
> > > > > Xuwei Fu
> > > > >
> > > > > > Ryan Blue <[email protected]> wrote on Tue, May 13, 2025 at 04:05:
> > > > >
> > > > > >> Keys may appear in nested objects, but cannot appear in the same
> > > > > >> object. So the first example, {"a": {"a": 1}} is allowed. The second
> > > > > >> example, {"a": 1, "a": 2} is not allowed.
> > > > >>
> > > > >> Ryan
> > > > >>
> > > > >>> On Sun, May 11, 2025 at 11:47 PM wish maple <
> > [email protected]>
> > > > >>> wrote:
> > > > >>>
> > > > >>> In the Parquet variant spec, metadata part says that
> > > > >>>
> > > > > >>>> Object: An unordered collection of string/Variant pairs (i.e.
> > > > > >>>> key/value pairs). An object may not contain duplicate keys. [1]
> > > > >>>
> > > > > >>> Considering a nested JSON object like {"a": {"a": 1}}, would the
> > > > > >>> metadata be field 0: "a", field 1: "a" or just field 0: "a", or are
> > > > > >>> both OK for readers/writers?
> > > > >>>
> > > > > >>> And besides, would duplicate keys be allowed in the same object?
> > > > > >>> Like {"a": 1, "a": 2}?
> > > > >>>
> > > > >>> Best, Xuwei Fu
> > > > >>>
> > > > >>> [1]
> > > > >>>
> > > >
> > https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
> > > > >>>
> > > > >>
> > > >
> > >
> >
>
