Any update on this? If you can send me a link to the IP clearance process and the guidelines and development practices for Apache repositories, I can notify the other stakeholders in the EEF and start the transfer process.
-- bp

On Wed, 20 Aug, 2025, 1:51 pm Benjamin Philip, <[email protected]> wrote:

> On Wed, 20 Aug 2025 at 04:08, Jacob Wujciak <[email protected]> wrote:
>
>> > Secondly, this will be the first time I will be maintaining an Apache
>> > project, and I am not very familiar with the internal processes you
>> > use. I feel I might move faster with a repo under my own user
>>
>> This does sound like it might be another use case for the 'arrow-contrib'
>> org:
>> Apache DataFusion has a community-run, non-Apache org called
>> 'datafusion-contrib' [1], where unofficial extensions and
>> DataFusion-related crates are developed. Once a project is mature/used
>> enough it can be donated to the ASF DataFusion TLP (so that is not a
>> necessity). This was for example done for DataFusion for Ray [2]. Though
>> apparently it will now be archived due to a lack of maintenance [3].
>> (So maybe not the best example xD)
>>
>> The idea of creating a similar org for Arrow has been brought up a
>> number of times in the community meeting. This would not come with the
>> 'red tape' of an ASF project and would allow faster initial
>> development for the Erlang implementation.
>>
>
> That sounds like a good option. However, I don't want to rule out
> developing this as an ASF project from the start. I figure that this will
> eventually become a regular ASF project, so I might as well get accustomed
> to it now. Is there a document with all the "red tape" an ASF project
> entails?
>
> If we were to do this, would the Erlang implementation be considered
> "official" and linked from the docs? I would like to improve awareness of
> the project, and I'd prefer it be mentioned in the official docs even as an
> alpha release. I think that is important in addition to promoting it on
> Elixir/Erlang-specific channels.
>
> I also forgot to mention this in my previous email, but would any Arrow
> maintainer be able to review PRs to this project, maybe multiple times a
> week? I remember having many Arrow-specific doubts while working on this,
> and I think it would be wise to have someone re-check my work to ensure I
> haven't misinterpreted anything in the specifications and generally keep an
> eye on things from the Apache side. I also have 2 other reviewers from the
> Erlang Ecosystem Foundation reviewing my Erlang code, so that part is
> already taken care of.
>
>> Regarding the IP clearance process (that as you say will need to
>> happen at some point of moving the implementation into
>> apache/arrow-erlang), IIRC as long as the code has always been
>> licensed under ASL 2.0 the process is more of a formality and
>> shouldn't be too hard to do.
>>
>
> The code is indeed licensed under ASL 2.0, so I think we can go with the
> IP clearance process then. Are there any other legal matters that need to
> be addressed?
>
> On Tue, 19 Aug 2025 at 14:09, Antoine Pitrou <[email protected]> wrote:
>
>> There isn't an official criterion for declaring an implementation
>> "complete" (and we don't really use that term, either).
>>
>> What is important is to address the most common needs that your users
>> may have (such as OpenTelemetry data payloads).
>
>
> That makes sense.
>
>
>> I would personally suggest:
>>
>> - support the most common data types (all primitive types + at least
>> list and struct + dictionary + basic support for extension types)
>> - support either the C Data Interface or the IPC format (preferably both)
>>
>> In the IPC format, you don't need to support everything (tensors are
>> rarely used, for example; endianness conversion is only useful if you
>> plan to exchange data with big-endian systems...).
>>
>>
> As of right now, we support about half of all primitive types and most of
> the lists (under nested types), but none of the special or extension types.
> We also have some rudimentary support for IPC (since that's needed for
> OTel). I plan to add support for everything under the Columnar Format
> anyway, so it's just a matter of time. Are Flight and friends handled by
> the Arrow team? How often and where is Flight used?
>
>> Hi Benjamin,
>>
>> On 14/08/2025 at 20:17, Benjamin Philip wrote:
>> >
>> >> serialization/deserialization features but arrow-rs provides
>> >> more features such as computation features.
>> >
>> > This reminds me. What features will I have to support out of
>> > (de)serialization
>> > for an implementation to be considered complete?
>>
>> You're probably aware of https://arrow.apache.org/docs/dev/status.html ,
>> otherwise it will give you an idea of the variety of features that *can*
>> be implemented.
>>
>
> This list only covers support for serialization and deserialization of
> various data types, whether that be the Columnar Format, the IPC Format or
> Flight. I realize that the words "out of" weren't very clear, but what I
> meant was: what should I support *apart from* serde? For example, Sutou
> mentioned computation. I don't see a list of supported computations
> anywhere; what computations must I provide? I'm guessing serde (i.e. R/W of
> Arrow arrays) and computations (i.e. transformations of Arrow arrays) are
> it, but are there any other high-level features I should support?
>
> -- bp
>
