Thanks for your response, Felipe.

I didn't mean to suggest that different Arrow implementations using old
versions of these files is a problem. If the files were in a shared
repository, an implementation could of course reference them through a
submodule that points to an old commit, and that would be fine.

When developers want to add support for a new format feature, only then
would they need to update the submodule to the required version. They might
only need a subset of the changes in the shared repository, but that's OK.
As you've said, these formats must be kept backwards compatible. Any
changes related to features not yet supported by an Arrow implementation
might add new generated code, but that code should be backwards compatible
and require no changes to existing consumer code.

My reason for suggesting this was mainly to simplify the process of
updating the files when needed, rather than to enforce that they get
updated.

That said, I'm not sure the extra infrastructure complexity would be worth
it, and if people prefer the flexibility of having separate copies then I
think the status quo is fine.

Thanks,
Adam

On Wed, 20 Aug 2025 at 16:55, Felipe Oliveira Carvalho <[email protected]>
wrote:

> Unlike programming language type definitions, .proto (and FlatBuffers)
> files are artifacts that you ship to consumers of the API defined by them.
> You only append to these definitions. If a field is removed, its field
> index must never be used again, and so on.
>
> So one should not stress so much about syncing these files. It’s actually a
> good thing that a version in a specific arrow-LANG repository does not
> reflect all the latest developments of the spec but only what that specific
> implementation currently understands.
>
> Forcing synchronization creates more problems than it solves. It’s a
> counter-intuitive conclusion because few things in programming are
> designed with backwards compatibility in mind the way these protocol
> definition languages are.
>
> (For many years, I shipped mobile apps containing .proto definitions used
> for communication with servers and local persistence. Every week there’s a
> copy of the app on millions of phones with code generated from “outdated”
> .proto files. It works really well if you’re always keeping backwards
> compatibility in mind.)
>
> —
> Felipe
>
> On Wed, 20 Aug 2025 at 01:40 Adam Reeve <[email protected]> wrote:
>
> > Hi everyone,
> >
> > As part of creating the new arrow-dotnet repository, the contents of the
> > format directory from the main arrow repository had to be copied [1]. This
> > directory contains language-agnostic FlatBuffers and Protobuf definitions
> > for the Arrow IPC and Flight formats that can be used to generate code.
> > Both the arrow-rs [2] and arrow-java [3] repositories also contain copies
> > of these files that have to be updated manually when there are format
> > changes.
> >
> > It appears that other implementations check in generated code rather than
> > generating code at build time, so they don't need to store the original
> > definitions (at least arrow-go [4] and arrow-swift [5] do this; I haven't
> > looked closely at all implementations).
> >
> > I wonder whether it would simplify processes if there were a shared
> > arrow-format repository to store these files, which could be included as a
> > git submodule in other repositories, similar to how the arrow-testing and
> > parquet-testing repositories are used. This would make it easy to see
> > whether the format files are up to date and would prevent potential
> > divergence between implementations.
> >
> > On the other hand, these format files aren't updated frequently, and git
> > submodules add extra developer friction. For example, they aren't checked
> > out by default when cloning, and changes that cross repository boundaries
> > require extra coordination.
> >
> > What do people think of this idea? Would it be worth setting up a new
> > arrow-format repository?
> >
> > Thanks,
> > Adam
> >
> > [1]: https://github.com/apache/arrow-dotnet/pull/17
> > [2]: https://github.com/apache/arrow-rs/tree/main/format
> > [3]: https://github.com/apache/arrow-java/tree/main/arrow-format
> > [4]: https://github.com/apache/arrow-go/blob/a661aa4711c27a065907512c69bf2e9d3454b936/arrow/internal/flatbuf/Schema.go#L17
> > [5]: https://github.com/apache/arrow-swift/blob/99275981ac54ab25a9f308f6182acf571385bda6/Arrow/Sources/Arrow/Schema_generated.swift#L18
> >
>
