At least in SQL, char(n) is a fixed-length string, but "fixed" means a fixed
number of characters. Since strings are typically UTF-8, that is still a
variable number of bytes.
So I don't see how a string column could be stored as FLBA, even if it has a
fixed number of characters (except in less common cases such as an 8-bit,
single-byte encoding, e.g. a specific ASCII character set).
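
To make the point concrete, here is a minimal sketch in Python. The sample values and the helper name utf8_lengths are made up for illustration and are not taken from any Parquet implementation. It shows four strings that would all fit a hypothetical char(4) column, yet need different numbers of bytes in UTF-8:

```python
# Hypothetical CHAR(4) values: every string has exactly four code points.
# Escapes are used so the exact code points are unambiguous.
values = [
    "abcd",                      # ASCII only
    "caf\u00e9",                 # one 2-byte character (precomposed e-acute)
    "\u65e5\u672c\u8a9e\u5b57",  # four CJK characters, 3 bytes each
    "\U0001F600" * 4,            # four emoji, 4 bytes each
]


def utf8_lengths(strings):
    """Return the UTF-8 encoded byte length of each string."""
    return [len(s.encode("utf-8")) for s in strings]


byte_lengths = utf8_lengths(values)
for s, n in zip(values, byte_lengths):
    print(f"{len(s)} characters -> {n} bytes")
```

A FLBA column needs one byte width for every value, so a writer would have to pad these values to a common width (16 bytes here) or restrict the column to a single-byte character set.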
Just my two cents,
   Ofir


________________________________
From: Gang Wu <ust...@gmail.com>
Sent: Tuesday, June 18, 2024 5:20 PM
To: dev@parquet.apache.org <dev@parquet.apache.org>
Subject: [External] Re: [DISCUSS] Can FIXED_LEN_BYTE_ARRAY be annotated with 
STRING?

I have the same feeling and that's why I've asked in the mentioned PR.
It seems FLBA is just a special case of BYTE_ARRAY.

On Tue, Jun 18, 2024 at 10:16 PM Alkis Evlogimenos
<alkis.evlogime...@databricks.com.invalid> wrote:

> I don't see why it shouldn't be supported. FLBA and String are orthogonal
> features. The first optimizes encoding by not storing lengths and the
> latter says the binary is valid UTF-8.
>
> On Tue, Jun 18, 2024 at 8:35 AM Gang Wu <ust...@gmail.com> wrote:
>
> > FYI, both parquet-cpp [1] and parquet-java [2] do not allow FLBA to be
> > annotated with STRING.
> >
> > [1]
> > https://github.com/apache/arrow/blob/eec6f17c8879b469dc3370dad4a7f68f11705a6b/cpp/src/parquet/types.cc#L829-L842
> > [2]
> > https://github.com/apache/parquet-java/blob/fbe13d89ae4193be12c164d4bb5342c5eba3963f/parquet-column/src/main/java/org/apache/parquet/schema/Types.java#L443-L447
> >
> > Best,
> > Gang
> >
> > On Tue, Jun 18, 2024 at 11:53 AM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > >
> > > > My instinct says "No", but others may have a different
> > > > interpretation.
> > >
> > >
> > > This is also my instinct. I think we should check the validation in
> > > parquet-java and parquet-cpp to see if they are in agreement on the
> > > matter and then make a decision from there. It doesn't seem too onerous
> > > to support FLBA as a String though, if necessary?
> > >
> > > Cheers,
> > > Micah
> > >
> > > On Mon, Jun 17, 2024 at 12:15 PM Ed Seidl <etse...@live.com> wrote:
> > >
> > > > Hi all,
> > > > While discussing PARQUET-2485 a question was raised about the STRING
> > > > annotation [1]. The current wording in the specification is "|STRING|
> > > > may only be used to annotate the binary primitive type"; PARQUET-2485
> > > > would change that to "|STRING| may only be used to annotate the
> > > > |BYTE_ARRAY| primitive type". The question is, can
> > > > FIXED_LEN_BYTE_ARRAY also be annotated with STRING? My instinct says
> > > > "No", but others may have a different interpretation.
> > > >
> > > > Are there any strong opinions in the community? Are there any
> > > > implementations that allow fixed length strings?
> > > >
> > > > Thanks,
> > > > Ed
> > > >
> > > > [1]
> > > > https://github.com/apache/parquet-format/pull/251#discussion_r1635669939
> > > >
> > >
> >
>
