Hi,
I'm the original author of the Debian packages for Apache Arrow.
I'm in favor of having an Apache Arrow package in the
official Debian repository.
> I do have a working package based on the JFrog packaging groundwork [0]
> but had to make various changes mostly to avoid downloading dependencies
> from the Internet (which is not allowed during the Debian build
> process). So, mostly setting -DARROW_DEPENDENCY_SOURCE=SYSTEM and tuning
> enabled/disabled features based on what we have and what we don't.
> Result is at [1].
Could you create a "diff -ru" output between [0] and [1]?
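For example, a minimal sketch, assuming both debian/ directories are
unpacked locally. The function name and the paths in the usage comment
are placeholders, not real locations:

```shell
# compare_debian_dirs OLD NEW
# Print a recursive unified diff between two directory trees.
# (Hypothetical helper name, for illustration only.)
compare_debian_dirs() {
    status=0
    # diff exits with 1 when the trees differ, which is the expected
    # case here, so capture the status instead of failing on it.
    diff -ru "$1" "$2" || status=$?
    # Only exit codes above 1 indicate a real error (e.g. a missing path).
    [ "$status" -le 1 ]
}

# Usage (placeholder paths):
#   compare_debian_dirs jfrog/debian salsa/debian > debian-packaging.diff
```

Attaching the resulting file to a reply here would be enough.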
> The only exception here are ORC and S3 support, which are
> missing because the ORC library [2] and the AWS C++ SDK
> [3] are not packaged yet.
Do you plan to package them? Once they exist in the
official Debian repository, we can use them.
> 1.) Would somebody from the upstream team be interested in collaborating
> to keep Arrow maintained in Debian? I would be able to review updates
> and sponsor uploads.
I'm interested. How about the following workflow?
1. You open a pull request for each of your improvements
to https://github.com/apache/arrow/ .
2. We mention you on GitHub when we open a pull request
that is related to Debian packages such as
https://github.com/apache/arrow/pull/10514 .
3. You upload our Debian package to the official Debian
repository when we release a new version.
New releases are announced on this mailing list.
> 2.) One quite scary thing left is documenting all copyright and license
> occurrences in the codebase. It looks like there is a fair bit of
> embedded code coming from various sources and with varying levels of
> modification. The debian/copyright file in the JFrog packaging only
> contains a number of TODOs so I guess this is still up to me to finish
> before I can think of doing an upload.
I think so too.
> Is the LICENSE.txt in the Arrow source root directory complete and lists
> _all_ third-party licenses and copyright holders in the release tarball?
No. Most of them are covered, but some exist only in the
source code, such as
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/mman.h .
I put many TODOs in debian/copyright because of this...
Thanks,
--
kou
In <[email protected]>
"Debian packaging for Arrow" on Fri, 11 Jun 2021 11:26:30 +0200,
Sascha Steinbiss <[email protected]> wrote:
> Hi Arrow community!
>
> I am a Debian Developer looking to package Arrow officially in Debian as
> a dependency for a specific tool I want to get into Debian as well.
>
> I do have a working package based on the JFrog packaging groundwork [0]
> but had to make various changes mostly to avoid downloading dependencies
> from the Internet (which is not allowed during the Debian build
> process). So, mostly setting -DARROW_DEPENDENCY_SOURCE=SYSTEM and tuning
> enabled/disabled features based on what we have and what we don't.
> Result is at [1].
>
> It looks like I can build all packages built by the JFrog packaging with
> no problems (at least for amd64). Build log attached. The only exception
> here are ORC and S3 support, which are missing because the ORC library
> [2] and the AWS C++ SDK [3] are not packaged yet. But apart from that it
> looks like everything works.
>
> Just so you know, nothing has been officially uploaded yet. The package
> is still in preparation and only used internally within my organization
> so far.
>
> Being quite far in the packaging process, I have some questions:
>
> 1.) Would somebody from the upstream team be interested in collaborating
> to keep Arrow maintained in Debian? I would be able to review updates
> and sponsor uploads.
>
> 2.) One quite scary thing left is documenting all copyright and license
> occurrences in the codebase. It looks like there is a fair bit of
> embedded code coming from various sources and with varying levels of
> modification. The debian/copyright file in the JFrog packaging only
> contains a number of TODOs so I guess this is still up to me to finish
> before I can think of doing an upload.
> Is the LICENSE.txt in the Arrow source root directory complete and lists
> _all_ third-party licenses and copyright holders in the release tarball?
> If so, I could use it as a template and just reformat it as required by
> Debian? That would be nice to know, otherwise that would mean a lot of
> digging and probably still missing something. Missed license or
> copyright holder mentions are the most common reason why new packages
> are rejected during the initial, mandatory manual review for new
> packages, BTW, so I'd like to avoid unnecessary review iterations ;)
>
> Thanks!
>
> Best regards
> Sascha
>
> [0]
> https://apache.jfrog.io/artifactory/arrow/debian/pool/bullseye/main/a/apache-arrow/apache-arrow_4.0.0-1.debian.tar.xz
> [1] https://salsa.debian.org/satta/arrow/-/tree/master/debian
> [2] https://github.com/apache/orc
> [3] https://github.com/aws/aws-sdk-cpp
>