Re: [DISCUSS] INT96 stats

Andrew Lamb Wed, 25 Jun 2025 11:41:39 -0700

We had a good discussion about this at the sync today.  Here is my summary:

* Pedantically, according to the current spec[1] there is no defined
ordering for Int96 types, and thus arrow-rs cannot be writing "incorrect"
values (as there is no definition of correct)
* Practically speaking, arrow-rs is writing something different from Photon
(Databricks' proprietary Spark engine)
* What Photon is doing arguably makes more sense (it uses the ordering of
the only logical type that uses Int96, i.e. timestamps)
* GH-7686: [Parquet] Fix int96 min/max stats #7687[2] brings arrow-rs into
line with Photon, which makes sense to me
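
To illustrate why the choice of ordering matters, here is a minimal Python
sketch (not taken from arrow-rs, Photon, or parquet-java). It assumes the
commonly used legacy INT96 timestamp layout: 8 bytes of nanoseconds within
the day followed by 4 bytes of Julian day number, both little-endian. The
helper names encode_int96 and logical_key are hypothetical, and plain
byte-wise comparison stands in for "some ordering other than the timestamp
one"; it is not meant to reproduce what any particular writer did.

```python
import struct

def encode_int96(julian_day, nanos_of_day):
    # Assumed legacy layout: 8 bytes nanos-of-day, then 4 bytes Julian day,
    # both little-endian, 12 bytes in total.
    return struct.pack("<qi", nanos_of_day, julian_day)

def logical_key(raw):
    # Decode and order by (day, nanos), i.e. chronological timestamp order.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day, nanos_of_day)

values = [
    encode_int96(2_440_588, 5),  # 1970-01-01, 5 ns after midnight
    encode_int96(2_440_589, 1),  # 1970-01-02, 1 ns after midnight
]

# Lexicographic comparison of the raw bytes looks at the low nanos byte first.
bytewise_min = min(values)
# Comparison on the decoded timestamp looks at the day first.
logical_min = min(values, key=logical_key)

print(logical_key(bytewise_min))
print(logical_key(logical_min))
```

With these two values the byte-wise minimum is the later timestamp, so a
reader that prunes row groups using timestamp order could not trust a
min/max computed that way.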


Rahul has also filed a ticket in parquet-format to discuss formalizing the
ordering of Int96 statistics[3]

In the interim, I filed a PR[4] in the parquet-format repo to at least try
and clarify the intent of the changes to arrow-rs and parquet-java

Thanks,
Andrew


[1]:
https://github.com/apache/parquet-format/blob/cf943c197f4fad826b14ba0c40eb0ffdab585285/src/main/thrift/parquet.thrift#L1079
[2]: https://github.com/apache/arrow-rs/pull/7687
[3]: https://github.com/apache/parquet-format/issues/502
[4]: https://github.com/apache/parquet-format/pull/504


On Wed, Jun 25, 2025 at 10:52 AM Rahul Sharma
<[email protected]> wrote:

> I have prepared a doc
> <
> https://docs.google.com/document/d/1Ox0qHYBgs_3-pNqn9V8zVQm_W6qP0lsbd2XwQnQVz1Y/edit?tab=t.0
> >
> to summarize and have all the relevant links in one place.
>
> On Wed, Jun 25, 2025 at 1:32 PM Alkis Evlogimenos
> <[email protected]> wrote:
>
> > Spark needs to start writing INT64 nanos first to be able to replace
> INT96
> > which is in nanos if data is at nano granularity. This is why I linked
> that
> > ticket which is a prerequisite to switching to INT64 in many cases.
> >
> > I understand the concerns around changing a deprecated aspect of the
> > parquet spec. The reason we decided to bring this forward is because:
> > 1. there are a lot of parquet files with the right INT96 stats out there
> > (Photon has been writing them for years)
> > 2. all engines ignore the INT96 stats so Photon writing them didn't break
> > anyone
> > 3. Spark is (slowly) moving away from INT96
> > 4. our change is very narrow, backwards compatible and can improve
> current
> > workloads while (3) is ongoing
> >
> > Let's discuss more at the sync tonight.
> >
> > > If we are going to standardize an ordering for INT96, rather than
> parsing
> > "created_by" fields, wouldn't it make more sense to add a new ColumnOrder
> > value (like what's proposed for PARQUET-2249 [1])? Then we don't need to
> > maintain a list of known good writers.
> >
> > We do not have to add another ColumnOrder value since INT96 is a
> *physical*
> > type and can only take timestamps in the specified format. This was
> > arguably a design wart as it should have been a FIXED_LEN_BYTE_ARRAY(12)
> > with logical type INT96_TIMESTAMP, for which a different ColumnOrder
> would
> > make sense. In this case we are lucky this is a physical type without
> > logical type attached because otherwise, we couldn't have made this
> change
> > in a backwards compatible way as easily.
> >
> > On Sat, Jun 21, 2025 at 12:57 AM Ed Seidl <[email protected]> wrote:
> >
> > > If we are going to standardize an ordering for INT96, rather than
> parsing
> > > "created_by" fields, wouldn't it make more sense to add a new
> ColumnOrder
> > > value (like what's proposed for PARQUET-2249 [1])? Then we don't need
> to
> > > maintain a list of known good writers.
> > >
> > > Ed
> > >
> > > [1] https://github.com/apache/parquet-format/pull/221
> > >
> > > On 2025/06/19 10:15:13 Andrew Lamb wrote:
> > > > > While INT96 is now deprecated, it's still the default timestamp
> type
> > in
> > > > > Spark, resulting in a significant amount of existing data written
> in
> > > this
> > > > > format.
> > > >
> > > > I agree with Gang and Antoine that the better solution is to change
> > Spark
> > > > to write non deprecated parquet data types.
> > > >
> > > > It seems there is an issue in the Spark JIRA to do this[1] but the
> only
> > > > feedback on the associated PR [2] is that it is a breaking change.
> > > >
> > > > If Spark is going to keep writing INT96 timestamps indefinitely, I
> > > suggest
> > > > we un-deprecate the INT96 timestamps to reflect the ecosystem reality
> > > that
> > > > they will be here for a while rather than pretending they are really
> > > > deprecated.
> > > >
> > > > Andrew
> > > >
> > > > [1]: https://issues.apache.org/jira/browse/SPARK-51359
> > > > [2]:
> > https://github.com/apache/spark/pull/50215#issuecomment-2715147840
> > > >
> > > > p.s. as an aside, is anyone from Databricks pushing Spark to change
> > > > timestamp type? Or will the focus be to improve INT96 timestamps
> > > instead?
> > > >
> > > >
> > > > On Wed, Jun 18, 2025 at 10:50 PM Gang Wu <[email protected]> wrote:
> > > >
> > > > > It does not seem to add much value to improve a deprecated feature
> > > > > especially
> > > > > when there are abundant Parquet implementations in the wild. IIRC,
> > > > > parquet-java
> > > > > is planning to release 1.16.0 for new data types like variant and
> > > geometry.
> > > > > It is
> > > > > also the last version to support Java 8. All deprecated APIs might
> > get
> > > > > removed
> > > > > from 2.0.0 so I'm not sure if older Spark versions are able to
> > > leverage the
> > > > > int96
> > > > > stats. The right way to go is to push forward the adoption of
> > timestamp
> > > > > logical
> > > > > types.
> > > > >
> > > > > Best,
> > > > > Gang
> > > > >
> > > > > On Thu, Jun 19, 2025 at 12:31 AM Micah Kornfield <
> > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi Alkis,
> > > > > > Is this the right thread link?  It seems to be a discussion on
> > > Timestamp
> > > > > > Nano support (which IIUC won't use int96, but I'm not sure this
> > > covers
> > > > > > changing the behavior for existing timestamps, which I think are
> at
> > > > > either
> > > > > > millisecond or microsecond granularity)?
> > > > > >
> > > > > > > there will be customers that want to interface with legacy systems
> > > > > > > with INT96. This is why we decided to do both.
> > > > > >
> > > > > >
> > > > > > It might help to elaborate on the time-frame here.  Since it
> > appears
> > > > > > reference implementations of parquet are not currently writing
> > > > > statistics,
> > > > > > > if we merge these changes, when will they be picked up in Spark?
> > > Would the
> > > > > > > plan be to backport the parquet-java changes to older versions of Spark
> > > (otherwise
> > > > > > the legacy systems wouldn't really make use or emit stats
> anyways)?
> > > What
> > > > > > is the delta between Spark picking up these changes and
> > > transitioning off
> > > > > > of Int96 by default?   Is the expectation that even once the
> > default
> > > is
> > > > > > changed in spark to not use int96, there will be a large number
> of
> > > users
> > > > > > that will override the default to write int96?
> > > > > >
> > > > > > Thanks,
> > > > > > Micah
> > > > > >
> > > > > > On Wed, Jun 18, 2025 at 1:35 AM Alkis Evlogimenos
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > We are also driving that in parallel:
> > > > > > >
> https://lists.apache.org/thread/y2vzrjl1499j5dvbpg3m81jxdhf4b6of
> > .
> > > > > > >
> > > > > > > Even when Spark defaults to INT64 there will be old versions of
> > > Spark
> > > > > > > running, there will be customers that want to interface with
> > legacy
> > > > > > systems
> > > > > > > > with INT96. This is why we decided to do both.
> > > > > > >
> > > > > > > On Wed, Jun 18, 2025 at 9:53 AM Antoine Pitrou <
> > [email protected]
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > >
> > > > > > > > Can we get Spark to stop emitting INT96? They are not being
> an
> > > > > > > > extremely good community player here.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > >
> > > > > > > > Antoine.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, 13 Jun 2025 15:17:51 +0200
> > > > > > > > Alkis Evlogimenos
> > > > > > > > <[email protected]>
> > > > > > > > wrote:
> > > > > > > > > Hi folks,
> > > > > > > > >
> > > > > > > > > While INT96 is now deprecated, it's still the default
> > timestamp
> > > > > type
> > > > > > in
> > > > > > > > > Spark, resulting in a significant amount of existing data
> > > written
> > > > > in
> > > > > > > this
> > > > > > > > > format.
> > > > > > > > >
> > > > > > > > > Historically, parquet-mr/java has not emitted or read
> > > statistics
> > > > > for
> > > > > > > > INT96.
> > > > > > > > > This was likely due to the fact that standard byte
> comparison
> > > on
> > > > > the
> > > > > > > > INT96
> > > > > > > > > representation doesn't align with logical comparisons,
> > > potentially
> > > > > > > > leading
> > > > > > > > > to incorrect min/max values. This is unfortunate because
> > > timestamp
> > > > > > > > filters
> > > > > > > > > are extremely common and lack of stats limits optimization
> > > > > > > opportunities.
> > > > > > > > >
> > > > > > > > > Since its inception Photon <
> > > > > > https://www.databricks.com/product/photon>
> > > > > > > > emitted
> > > > > > > > > and utilized INT96 statistics by employing a logical
> > > comparator,
> > > > > > > ensuring
> > > > > > > > > their correctness. We have now implemented
> > > > > > > > > <https://github.com/apache/parquet-java/pull/3243> the
> same
> > > > > support
> > > > > > > > within
> > > > > > > > > parquet-java.
> > > > > > > > >
> > > > > > > > > We'd like to get the community's thoughts on this addition.
> > We
> > > > > > > anticipate
> > > > > > > > > that most users may not be directly affected due to the
> > > declining
> > > > > use
> > > > > > > of
> > > > > > > > > INT96. However, we are interested in identifying any
> > potential
> > > > > > > drawbacks
> > > > > > > > or
> > > > > > > > > unforeseen issues with this approach.
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
