O 25/03/24 ás 19:17, Julian Gilbey escribiu:
Hi all,

Hi :)

[NB: sent to d-science, d-python, d-devel and the RFP bug; reply-to
set to d-science and the RFP bug only]

An update on Apache Arrow, and in particular the Python library
PyArrow.  For those who don't know:

   Apache Arrow is a development platform for in-memory analytics. It
   contains a set of technologies that enable big data systems to
   process and move data fast. It specifies a standardized
   language-independent columnar memory format for flat and
   hierarchical data, organized for efficient analytic operations on
   modern hardware.

   The project is developing a multi-language collection of libraries
   for solving systems problems related to in-memory analytical data
   processing. This includes such topics as:

   * Zero-copy shared memory and RPC-based data movement

   * Reading and writing file formats (like CSV, Apache ORC, and Apache
     Parquet)

   * In-memory analytics and query processing

   (from: https://arrow.apache.org/docs/index.html)

Pandas has announced that Pandas 3.x will depend on PyArrow
in a critical way (it will back the "string" datatype), and it is due
to be released imminently.

So this is a plea for anyone looking for something really helpful to
do: it would be great to have a group of developers finally package
this!  There was some initial work done (see the RFP bug report for
details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970021),
but that is fairly old now.  As Apache Arrow supports numerous
languages, it may well benefit from having a group of developers with
different areas of expertise to build it.  (Or perhaps it would make
more sense to split the upstream source into a collection of different
Debian source packages for the different supported languages.  I don't
know.)  Unfortunately I don't have the capacity to devote any time to
it myself.

Thanks in advance for anyone who can step forward for this!

Best wishes,

    Julian


If possible, I would like to contribute. At work we use the Go and Python implementations, also, in the short term, we will start using the Rust one.

Just to point out, the Rust version has its own native implementation, here: https://github.com/apache/arrow-rs .

Cheers,

Jose

--
José Manuel Abuín Mosquera
PhD. | Scientific Software Developer | Researcher

http://jmabuin.github.io

Reply via email to