Hi,

I'm the original author of the Debian packages for Apache Arrow.
I'm positive about having an Apache Arrow package in the official
Debian repository.
> I do have a working package based on the JFrog packaging groundwork [0]
> but had to make various changes mostly to avoid downloading dependencies
> from the Internet (which is not allowed during the Debian build
> process). So, mostly setting -DARROW_DEPENDENCY_SOURCE=SYSTEM and tuning
> enabled/disabled features based on what we have and what we don't.
> Result is at [1].

Could you create a "diff -ru" output between [0] and [1]?

> The only exception here are ORC and S3 support, which are
> missing because the ORC library [2] and the AWS C++ SDK
> [3] are not packaged yet.

Do you have a plan to package them? If they exist in the
official Debian repository, we can use them.

> 1.) Would somebody from the upstream team be interested in collaborating
> to keep Arrow maintained in Debian? I would be able to review updates
> and sponsor uploads.

I'm interested in it. How about the following workflow?

  1. You open a pull request for each of your improvements to
     https://github.com/apache/arrow/ .

  2. We mention you on GitHub when we open a pull request
     that is related to the Debian packages, such as
     https://github.com/apache/arrow/pull/10514 .

  3. You upload our Debian package to the official Debian
     repository when we release a new version. New releases
     are announced on this mailing list.

> 2.) One quite scary thing left is documenting all copyright and license
> occurrences in the codebase. It looks like there is a fair bit of
> embedded code coming from various sources and with varying levels of
> modification. The debian/copyright file in the JFrog packaging only
> contains a number of TODOs so I guess this is still up to me to finish
> before I can think of doing an upload.

I think so too.

> Is the LICENSE.txt in the Arrow source root directory complete and lists
> _all_ third-party licenses and copyright holders in the release tarball?

No.
Most of them are covered, but some of them exist only in source code,
such as
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/mman.h .
I put many TODOs in debian/copyright because of this...


Thanks,
--
kou

In <a6720063-aa49-e7b1-8124-b1ec2d4a6...@debian.org>
  "Debian packaging for Arrow" on Fri, 11 Jun 2021 11:26:30 +0200,
  Sascha Steinbiss <sa...@debian.org> wrote:

> Hi Arrow community!
>
> I am a Debian Developer looking to package Arrow officially in Debian as
> a dependency for a specific tool I want to get into Debian as well.
>
> I do have a working package based on the JFrog packaging groundwork [0]
> but had to make various changes mostly to avoid downloading dependencies
> from the Internet (which is not allowed during the Debian build
> process). So, mostly setting -DARROW_DEPENDENCY_SOURCE=SYSTEM and tuning
> enabled/disabled features based on what we have and what we don't.
> Result is at [1].
>
> It looks like I can build all packages built by the JFrog packaging with
> no problems (at least for amd64). Build log attached. The only exception
> here are ORC and S3 support, which are missing because the ORC library
> [2] and the AWS C++ SDK [3] are not packaged yet. But apart from that it
> looks like everything works.
>
> Just so you know, nothing has been officially uploaded yet. The package
> is still in preparation and only used internally within my organization
> so far.
>
> Being quite far in the packaging process, I have some questions:
>
> 1.) Would somebody from the upstream team be interested in collaborating
> to keep Arrow maintained in Debian? I would be able to review updates
> and sponsor uploads.
>
> 2.) One quite scary thing left is documenting all copyright and license
> occurrences in the codebase. It looks like there is a fair bit of
> embedded code coming from various sources and with varying levels of
> modification.
> The debian/copyright file in the JFrog packaging only
> contains a number of TODOs so I guess this is still up to me to finish
> before I can think of doing an upload.
> Is the LICENSE.txt in the Arrow source root directory complete and lists
> _all_ third-party licenses and copyright holders in the release tarball?
> If so, I could use it as a template and just reformat it as required by
> Debian? That would be nice to know, otherwise that would mean a lot of
> digging and probably still missing something. Missed license or
> copyright holder mentions are the most common reason why new packages
> are rejected during the initial, mandatory manual review for new
> packages, BTW, so I'd like to avoid unnecessary review iterations ;)
>
> Thanks!
>
> Best regards
> Sascha
>
> [0]
> https://apache.jfrog.io/artifactory/arrow/debian/pool/bullseye/main/a/apache-arrow/apache-arrow_4.0.0-1.debian.tar.xz
> [1] https://salsa.debian.org/satta/arrow/-/tree/master/debian
> [2] https://github.com/apache/orc
> [3] https://github.com/aws/aws-sdk-cpp
>