Hi,

I'm the original author of the Debian packages for Apache Arrow.
I'm in favor of having an Apache Arrow package in the
official Debian repository.

> I do have a working package based on the JFrog packaging groundwork [0]
> but had to make various changes mostly to avoid downloading dependencies
> from the Internet (which is not allowed during the Debian build
> process). So, mostly setting -DARROW_DEPENDENCY_SOURCE=SYSTEM and tuning
> enabled/disabled features based on what we have and what we don't.
> Result is at [1].

Could you create a "diff -ru" output between [0] and [1]?
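
If running diff itself is not convenient, a comparable listing can be produced with a small script. This is only a sketch: the function name recursive_unified_diff is hypothetical, it assumes both debian/ directories have already been unpacked locally, and it treats a file that is missing on one side as empty (closer to "diff -ruN" than to plain "diff -ru").

```python
import difflib
from pathlib import Path


def _read_lines(path):
    # A file that is missing on one side is treated as empty (like diff -N).
    if not path.is_file():
        return []
    return path.read_text(encoding="utf-8", errors="replace").splitlines(keepends=True)


def recursive_unified_diff(old_root, new_root):
    """Return unified-diff lines between two directory trees."""
    old_root, new_root = Path(old_root), Path(new_root)

    # Collect the relative paths of all regular files in either tree.
    rel_paths = set()
    for root in (old_root, new_root):
        rel_paths.update(p.relative_to(root) for p in root.rglob("*") if p.is_file())

    out = []
    for rel in sorted(rel_paths):
        old_file, new_file = old_root / rel, new_root / rel
        # unified_diff yields nothing when the two files are identical.
        out.extend(
            difflib.unified_diff(
                _read_lines(old_file),
                _read_lines(new_file),
                fromfile=str(old_file),
                tofile=str(new_file),
            )
        )
    return out
```

Calling recursive_unified_diff("jfrog/debian", "salsa/debian") and joining the result with "".join(...) gives text that can be pasted into a mail; both directory names here are placeholders.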

> The only exception here are ORC and S3 support, which are
> missing because the ORC library [2] and the AWS C++ SDK
> [3] are not packaged yet.

Do you have a plan to package them? If they exist in the
official Debian repository, we can use them.

> 1.) Would somebody from the upstream team be interested in collaborating
> to keep Arrow maintained in Debian? I would be able to review updates
> and sponsor uploads.

I'm interested. How about the following workflow?

  1. You open a pull request for each of your improvements
     to https://github.com/apache/arrow/ .

  2. We mention you on GitHub when we open a pull request
     that is related to Debian packages such as
     https://github.com/apache/arrow/pull/10514 .

  3. You upload our Debian package to the official Debian
     repository when we release a new version.
     New releases are announced on this mailing list.


> 2.) One quite scary thing left is documenting all copyright and license
> occurrences in the codebase. It looks like there is a fair bit of
> embedded code coming from various sources and with varying levels of
> modification. The debian/copyright file in the JFrog packaging only
> contains a number of TODOs so I guess this is still up to me to finish
> before I can think of doing an upload.

I think so too.

> Is the LICENSE.txt in the Arrow source root directory complete and lists
> _all_ third-party licenses and copyright holders in the release tarball?

No. Most of them are covered, but some exist only in the
source code, such as
https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/mman.h .

I put many TODOs in debian/copyright because of this...
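
As a starting point for finding such files, something like the following might help. It is a heuristic sketch only: the function name find_copyright_notices and the regular expression are assumptions, it looks only at the first lines of each file, and it will miss notices that are worded differently, so the result still needs a manual check against LICENSE.txt.

```python
import re
from pathlib import Path

# Heuristic: "Copyright 2012" or "Copyright (c) 2012", case-insensitive.
COPYRIGHT_RE = re.compile(r"copyright\s+(?:\(c\)\s*)?\d{4}", re.IGNORECASE)


def find_copyright_notices(root, max_lines=40):
    """Map relative file path -> copyright lines found near the top of the file."""
    root = Path(root)
    found = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        notices = []
        with path.open(encoding="utf-8", errors="replace") as fh:
            for i, line in enumerate(fh):
                # Only the header region of each file is inspected.
                if i >= max_lines:
                    break
                if COPYRIGHT_RE.search(line):
                    notices.append(line.strip())
        if notices:
            found[str(path.relative_to(root))] = notices
    return found
```

Files that carry only the standard Apache license header contain no "Copyright <year>" line, so they are skipped, and what remains is a candidate list of embedded third-party code.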


Thanks,
-- 
kou

In <a6720063-aa49-e7b1-8124-b1ec2d4a6...@debian.org>
  "Debian packaging for Arrow" on Fri, 11 Jun 2021 11:26:30 +0200,
  Sascha Steinbiss <sa...@debian.org> wrote:

> Hi Arrow community!
> 
> I am a Debian Developer looking to package Arrow officially in Debian as
> a dependency for a specific tool I want to get into Debian as well.
> 
> I do have a working package based on the JFrog packaging groundwork [0]
> but had to make various changes mostly to avoid downloading dependencies
> from the Internet (which is not allowed during the Debian build
> process). So, mostly setting -DARROW_DEPENDENCY_SOURCE=SYSTEM and tuning
> enabled/disabled features based on what we have and what we don't.
> Result is at [1].
> 
> It looks like I can build all packages built by the JFrog packaging with
> no problems (at least for amd64). Build log attached. The only exception
> here are ORC and S3 support, which are missing because the ORC library
> [2] and the AWS C++ SDK [3] are not packaged yet. But apart from that it
> looks like everything works.
> 
> Just so you know, nothing has been officially uploaded yet. The package
> is still in preparation and only used internally within my organization
> so far.
> 
> Being quite far in the packaging process, I have some questions:
> 
> 1.) Would somebody from the upstream team be interested in collaborating
> to keep Arrow maintained in Debian? I would be able to review updates
> and sponsor uploads.
> 
> 2.) One quite scary thing left is documenting all copyright and license
> occurrences in the codebase. It looks like there is a fair bit of
> embedded code coming from various sources and with varying levels of
> modification. The debian/copyright file in the JFrog packaging only
> contains a number of TODOs so I guess this is still up to me to finish
> before I can think of doing an upload.
> Is the LICENSE.txt in the Arrow source root directory complete and lists
> _all_ third-party licenses and copyright holders in the release tarball?
> If so, I could use it as a template and just reformat it as required by
> Debian? That would be nice to know, otherwise that would mean a lot of
> digging and probably still missing something. Missed license or
> copyright holder mentions are the most common reason why new packages
> are rejected during the initial, mandatory manual review for new
> packages, BTW, so I'd like to avoid unnecessary review iterations ;)
> 
> Thanks!
> 
> Best regards
> Sascha
> 
> [0]
> https://apache.jfrog.io/artifactory/arrow/debian/pool/bullseye/main/a/apache-arrow/apache-arrow_4.0.0-1.debian.tar.xz
> [1] https://salsa.debian.org/satta/arrow/-/tree/master/debian
> [2] https://github.com/apache/orc
> [3] https://github.com/aws/aws-sdk-cpp
> 
