Re: Webassembly?

2023-07-06 Thread Tim Paine
I can help. We use emscripten-compiled Arrow for Perspective 
(https://github.com/finos/perspective), and we now compile Perspective's Python 
side for Pyodide, so I have an ongoing interest in a fully functional 
pyarrow/pandas in Pyodide. 
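
For anyone who wants to try the Pyodide side once wheels exist, a minimal sketch
using micropip (Pyodide's in-browser package installer); whether a pyarrow wheel
is actually published for your Pyodide version is an assumption here, not a
statement of fact:

    # Run inside a Pyodide console or via pyodide.runPythonAsync(...),
    # where top-level await is allowed.
    import micropip
    await micropip.install("pyarrow")  # assumes a pyarrow wheel exists for this Pyodide build
    import pyarrow as pa
    print(pa.__version__)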

Tim Paine
tim.paine.nyc
908-721-1185

> On Jul 6, 2023, at 11:14, Antoine Pitrou  wrote:
> 
> 
> Hi Joe,
> 
> Thank you for working on that.
> 
> The one question I have is: are you willing to help us maintain Arrow C++ over 
> the long term? The logic you're adding in 
> https://github.com/apache/arrow/pull/35672 is quite delicate; also, I don't 
> think anyone among us is a WebAssembly expert, which means that we might 
> break things unwittingly. So while it would be great to get Arrow C++ to work 
> with WASM, a dedicated expert is needed to help maintain and debug WASM 
> support in the future.
> 
> Regards
> 
> Antoine.
> 
> 
>> Le 03/07/2023 à 17:29, Joe Marshall a écrit :
>> Hi,
>> I'm a Pyodide developer amongst other things (Pyodide is the WebAssembly 
>> CPython interpreter) and I've got some PRs in progress on Arrow relating to 
>> WebAssembly support. I wondered if it might be worth discussing my broader 
>> ideas for this on the list or at the biweekly development meeting?
>> So far I have 35176 in, which makes Arrow run on a single thread. This is 
>> needed because in a lot of WebAssembly environments (browsers at least, and 
>> Pyodide), threading isn't available or is heavily constrained.
>> With that I've aimed to make it relatively transparent to users, so that 
>> things like Datasets and Acero mostly just work (but slower, obviously). It's 
>> kind of fiddly in the Arrow code but working, and it means users can port 
>> things easily.
>> Once that is in, the plan is to submit a follow-up PR that adds CMake 
>> presets for Emscripten which can build the C++ libraries and pyarrow for 
>> Pyodide. I've hacked this together in a build already; it's a bit fiddly and 
>> needs a load of tidying up, but I'm confident it can be done.
>> Essentially, I want to get this stuff in because pandas is moving towards 
>> Arrow as a pretty much required dependency, and WebAssembly is a pandas 
>> platform as well as an official Python platform, so it would be great to get 
>> it working in Pyodide without us needing to maintain a load of patches. I 
>> guess it could also come in handy with various container platforms that are 
>> moving to WebAssembly.
>> Basically I thought it's probably worth a bit of a heads up relating to 
>> this, as I know the bigger picture of things is often hard to see from just 
>> pull requests.
>> Thanks
>> Joe
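
For readers following the threading discussion above, a rough sketch of what
single-threaded use looks like from the pyarrow side today; pa.set_cpu_count and
the use_threads flags are existing pyarrow APIs, but whether they cover every
code path in an Emscripten/Pyodide build is exactly what the PRs above address,
so treat this as illustrative only:

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Constrain pyarrow's internal thread pool to a single worker, as
    # browser/Pyodide environments generally require.
    pa.set_cpu_count(1)

    # Many APIs also take an explicit use_threads flag.
    table = pq.read_table("data.parquet", use_threads=False)
    df = table.to_pandas(use_threads=False)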


Re: [VOTE] Release Apache Arrow JS 6.0.2

2021-12-10 Thread Tim Paine
Just wanted to note that our move from Arrow JS to Wasm arrow resulted in a 
pretty substantial speedup, although we solve none of the problems of exposing 
wasm arrow over JS since we use it directly from C++.

Initial PR: https://github.com/finos/perspective/pull/755
Standalone Arrow C++ to WASM via Emscripten: 
https://github.com/timkpaine/arrow-wasm-cpp
(Note that neither utilizes all of Arrow, just a carve-out of the IPC stuff.)




Tim Paine
tim.paine.nyc
908.721.1185

> On Dec 9, 2021, at 14:18, Dominik Moritz  wrote:
> 
>> 
>> Arrow rust implementation is in another repository and has support for
>> Javascript/Webassembly:
>> 
>> https://github.com/apache/arrow-rs/tree/master/arrow
>> 
>> The release cadence for the Rust implementation is much higher than for
>> the C++ implementation. Efficiencies might be gained by releasing Rust
>> and Javascript point implementations together, since then the process of
>> creating and verifying signed software would minimize PMC workload.
> 
> 
> The biggest challenge with making a WASM-based web library for Arrow is a
> performant JS API. I think realistically, we will have a pure JS Arrow
> library for a few years. Do you think we could sync the release processes
> even if Arrow JS is not in the rust repo? If so, I would love to learn more
> about how that process would work.
> 
> On Dec 7, 2021 at 03:43:13, Benson Muite  wrote:
> 
>> At the moment, the release is not packaged or signed. Thus one can only
>> run the tests on the branch in the git repository. A script to do that
>> on Linux is available at:
>> 
>> https://github.com/bkmgit/arrow/blob/ARROW-14801/dev/release/verify-js.sh
>> 
>> My understanding is that only PMC members can sign; at the moment not
>> many seem to use Javascript extensively. I can create a script for
>> generating the Javascript-only release source package based on the
>> current source packaging and release scripts, but a PMC member would
>> need to sign and upload it.
>> 
>> @Dominik - was not aware of arrow-wasm, thanks.
>> 
>> Arrow rust implementation is in another repository and has support for
>> Javascript/Webassembly :
>> 
>> https://github.com/apache/arrow-rs/tree/master/arrow
>> 
>> The release cadence for the Rust implementation is much higher than for
>> the C++ implementation. Efficiencies might be gained by releasing Rust
>> and Javascript point implementations together, since then the process of
>> creating and verifying signed software would minimize PMC workload.
>> 
>> Benson
>> 
>> On 12/6/21 1:01 AM, Wes McKinney wrote:
>> 
>> hi Dominik — can you provide instructions for how we should verify the
>> release, aside from checking the GPG signature and checksums?
>> 
>> On Sun, Nov 28, 2021 at 12:41 PM Dominik Moritz wrote:
>> 
>>> Are you talking about https://github.com/domoritz/arrow-wasm? It definitely
>>> isn’t ready for prime time. The overhead of WASM, some issues with the Rust
>>> implementation (some of which I think will be addressed with the Arrow2
>>> Rust migration), and the much larger bundle size make it not practical
>>> right now. As the WASM ecosystem matures, we can reevaluate and maybe also
>>> consider moving only some of the processing in WASM and leave the rest in
>>> JS. I’m pretty excited about WASM and what it could bring to Arrow,
>>> especially when combined with WebGPU.
>>> 
>>> Either way, I think we should release the 6.0.2 version soon. @PMC, could
>>> you vote on the patch release?
>>> 
>>> On Nov 28, 2021 at 04:33:41, Benson Muite wrote:
>>> 
>>>> Rust implementation can be compiled to WebAssembly and is released
>>>> biweekly. The Javascript version 

Re: [DISCUSS] How to encode table_pivot information state in Arrow

2021-03-19 Thread Tim Paine
Perspective uses Arrow across the wire but internally uses its own formats.

Tim Paine
tim.paine.nyc
908-721-1185

> On Mar 19, 2021, at 09:46, Michael Lavina  wrote:
> 
> Hey Benjamin,
> 
> That sounds really awesome. Thank you.
> 
> Sorry if this is already a well-known thing, as I am fairly new to the Arrow 
> ecosystem. Is there a way to track a roadmap for Arrow 4 and be involved in 
> that? Is there anywhere I can read more general information about it?
> 
> -Michael
> 
> From: Benjamin Kietzman 
> Date: Friday, March 19, 2021 at 9:14 AM
> To: dev 
> Subject: Re: [DISCUSS] How to encode table_pivot information state in Arrow
> Hi Michael,
> 
> We are targeting grouped aggregation for 4.0 as part of a general query
> engine buildout. We also intend to bring DataFrame functionality into core
> Arrow (which would probably include an analog of pandas' pivot_table), but
> the query engine work is a prerequisite.
> 
> Ben Kietzman
> 
>> On Fri, Mar 19, 2021, 08:19 Michael Lavina 
>> wrote:
>> 
>> Hey Team,
>> 
>> Sorry if this is already answered somewhere; I tried searching emails and
>> issues but couldn’t find anything. I am wondering if there is a standard
>> way to encode row or column pivots in Arrow?
>> 
>> I know Pandas already does it in some way
>> (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html)
>> and there are libraries using Arrow, like Perspective, that may have their own
>> internal representation of pivots
>> (https://perspective.finos.org/docs/md/view.html#row-pivots).
>> 
>> I am wondering if there is already a discussion, best practice, or standard
>> for encoding this information. Or, alternatively, is this not something that
>> should be associated with Arrow at all?
>> 
>> -Michael
>> 
>> P.S. If anyone on the Perspective team or anyone who might know is on this
>> thread I would be interested in understanding more how Perspective,
>> specifically, encodes pivot information in Arrow.
>> 
>> 
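
For concreteness, a small sketch of the pandas route mentioned above: the pivot
itself happens in pandas, and once converted the result is just an ordinary
Arrow table, i.e. nothing about the pivot structure is encoded in the Arrow
format itself. The grouped-aggregation half of a pivot is what later became
available natively in pyarrow (Table.group_by in recent releases); the column
names and versions below are purely illustrative:

    import pandas as pd
    import pyarrow as pa

    df = pd.DataFrame({
        "region": ["east", "east", "west", "west"],
        "year":   ["2020", "2021", "2020", "2021"],
        "sales":  [10, 12, 7, 9],
    })

    # Pivot in pandas, then hand the flattened result to Arrow as a plain table.
    pivoted = pd.pivot_table(df, values="sales", index="region",
                             columns="year", aggfunc="sum")
    table = pa.Table.from_pandas(pivoted.reset_index())

    # The grouped-aggregation piece is available natively in recent pyarrow:
    grouped = pa.table(df).group_by("region").aggregate([("sales", "sum")])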


Re: Developing a C++ Python extension

2020-07-02 Thread Tim Paine
We build pyarrow in the Docker image because otherwise auditwheel complains 
about pyarrow, which causes our wheels to fail auditwheel and lose the manylinux 
tag. But assuming we build pyarrow in the Docker image, the resulting manylinux 
wheels are compatible with the pyarrow manylinux wheels.

It has taken a few months of on-and-off work; you may also need to consult this 
PR: https://github.com/finos/perspective/pull/1105/files
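
For anyone attempting something similar, the sketch below uses pyarrow's
documented build helpers (get_include, get_libraries, get_library_dirs,
create_library_symlinks) to link an extension against the libraries bundled in
the installed pyarrow wheel; the module name and sources are placeholders, and
the exact library list can vary by pyarrow version:

    # setup.py-style sketch: link a C++ extension against the libarrow /
    # libarrow_python shipped inside the installed pyarrow wheel.
    import pyarrow as pa
    from setuptools import setup, Extension

    ext = Extension(
        "myext",                    # placeholder module name
        sources=["src/myext.cpp"],  # placeholder sources
        include_dirs=[pa.get_include()],
        libraries=pa.get_libraries(),        # typically ["arrow", "arrow_python"]
        library_dirs=pa.get_library_dirs(),
        language="c++",
    )

    # On Linux the bundled libraries are versioned (libarrow.so.X), so pyarrow
    # provides a helper to create the plain .so symlinks the linker expects.
    pa.create_library_symlinks()

    setup(name="myext", ext_modules=[ext])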


> On Jul 2, 2020, at 11:04, Uwe L. Korn  wrote:
> 
> Hello Tim,
> 
> thanks for the hint. I see that you build Arrow yourselves in the 
> Dockerfile. Could it be that in the end you statically link the Arrow 
> libraries?
> 
> As there are no wheels on PyPI, I couldn't verify whether that assumption is 
> true.
> 
> Best
> Uwe
> 
> On Thu, Jul 2, 2020, at 4:53 PM, Tim Paine wrote:
>> We spent a ton of time on this for perspective, the end result is a 
>> mostly compatible set of wheels for most platforms, I believe we 
>> skipped py2 but nobody cares about those anyway. We link against 
>> libarrow and libarrow_python on Linux, on windows we vendor them all 
>> into our library. Feel free to scrape the perspective repo's cmake 
>> lists and setup.py for details.
>> 
>> Tim Paine
>> tim.paine.nyc
>> 
>>> On Jul 2, 2020, at 10:32, Uwe L. Korn  wrote:
>>> 
>>> I had so much fun with the wheels in the past, I'm now a happy member of 
>>> conda-forge core instead :D
>>> 
>>> The good thing first:
>>> 
>>> * The C++ ABI didn't change between the manylinux versions; it is the old 
>>> one in all cases. So you can mix & match manylinux versions.
>>> 
>>> The sad things:
>>> 
>>> * The manylinuxX standards are intended to provide a way to ship 
>>> *self-contained* wheels that run on any recent Linux. The important part 
>>> here is that they need to be self-contained. Having a binary dependency on 
>>> another wheel is actually not allowed.
>>> * Thus the snowflake-python-connector ships the libarrow.so it was built 
>>> with as part of its wheel. In this case auditwheel is happy with the wheel.
>>> * It is working with numpy as a dependency because NumPy linkage is similar 
>>> to the import lib behaviour on Windows: You don't actually link against 
>>> numpy but you statically link a set of functions that are resolved to 
>>> NumPy's function when you import numpy. Quick googling leads to 
>>> https://github.com/yugr/Implib.so which could provide something similar for 
>>> Linux.
>>> * You could actually omit linking to libarrow and try to populate the 
>>> symbols before you load the library. This is how the Python symbols are 
>>> available to extensions without linking to libpython.
>>> 
>>> 
>>>> On Thu, Jul 2, 2020, at 2:43 PM, Maarten Breddels wrote:
>>>> Ok, thanks!
>>>> 
>>>> I'm setting up a repo with an example here, using pybind11:
>>>> https://github.com/vaexio/vaex-arrow-ext
>>>> 
>>>> and I'll just try all possible combinations and report back.
>>>> 
>>>> cheers,
>>>> 
>>>> Maarten Breddels
>>>> Software engineer / consultant / data scientist
>>>> Python / C++ / Javascript / Jupyter
>>>> www.maartenbreddels.com / vaex.io
>>>> maartenbredd...@gmail.com +31 6 2464 0838 <+31+6+24640838>
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Thu, 2 Jul 2020 at 14:32, Joris Van den Bossche <
>>>> jorisvandenboss...@gmail.com> wrote:
>>>> 
>>>>> Also no concrete answer, but one such example is turbodbc, I think.
>>>>> But it seems they only have conda binary packages, and don't
>>>>> distribute wheels ..
>>>>> (https://turbodbc.readthedocs.io/en/latest/pages/getting_started.html),
>>>>> so not that relevant as comparison (they also need to build against an
>>>>> odbc driver in addition to arrow).
>>>>> But maybe Uwe has some more experience in this regard (and with
>>>>> attempts building wheels for turbodbc, eg
>>>>> https://github.com/blue-yonder/turbodbc/pull/108).
>>>>> 
>>>>> Joris

Re: Performance of ArrowJS in the DOM

2020-07-02 Thread Tim Paine
The virtual table sounds a lot like regular-table:
https://github.com/jpmorganchase/regular-table

Used in perspective:
https://perspective.finos.org/

We use Arrow C++ compiled to WebAssembly plus some front-end grid and chart 
plugins. Perspective can run in a client-server fashion and only sends diffs 
across the wire, so it works well for random access, e.g. in pivoted views.


Tim Paine
tim.paine.nyc


> On Jul 2, 2020, at 09:45, Matthias Vallentin  wrote:
> 
> Hi folks,
> 
> We are reaching out to better understand the performance of ArrowJS when it
> comes to viewing large amounts of data (> 1M records) in the browser’s DOM.
> Our backend (https://github.com/tenzir/vast) spits out record batches,
> which we are accumulating in the frontend with a RecordBatchReader.
> 
> At first, we only want to render the data fast, line by line, with minimal
> markup according to its types from the schema. We use a virtual scrolling
> window to avoid overloading the DOM, that is, we lazily convert the record
> batch data to DOM elements according to a scroll window defined by the
> user. As the user scrolls, elements outside the window get removed and new
> ones added.
> 
> The data consists of one or more Tables that we are pulling in through the
> RecordBatchReader. We use the Async Iterator interface to go over the
> record batches and convert them into rows. This API feels suboptimal for
> our use cases, where we want random access to the data. Is there a
> faster/better way to do this?
> 
> Does anyone have any experience worth sharing with doing something similar?
> The DOM is the main bottleneck, but if there are some clever things we can
> do with Arrow to pull out the data in the most efficient way, that would be
> nice.
> 
>Matthias


Re: Developing a C++ Python extension

2020-07-02 Thread Tim Paine
We spent a ton of time on this for perspective, the end result is a mostly 
compatible set of wheels for most platforms, I believe we skipped py2 but 
nobody cares about those anyway. We link against libarrow and libarrow_python 
on Linux, on windows we vendor them all into our library. Feel free to scrape 
the perspective repo's cmake lists and setup.py for details.

Tim Paine
tim.paine.nyc

> On Jul 2, 2020, at 10:32, Uwe L. Korn  wrote:
> 
> I had so much fun with the wheels in the past, I'm now a happy member of 
> conda-forge core instead :D
> 
> The good thing first:
> 
> * The C++ ABI didn't change between the manylinux versions; it is the old one 
> in all cases. So you can mix & match manylinux versions.
> 
> The sad things:
> 
> * The manylinuxX standards are intended to provide a way to ship 
> *self-contained* wheels that run on any recent Linux. The important part here 
> is that they need to be self-contained. Having a binary dependency on another 
> wheel is actually not allowed.
> * Thus the snowflake-python-connector ships the libarrow.so it was built with 
> as part of its wheel. In this case auditwheel is happy with the wheel.
> * It is working with numpy as a dependency because NumPy linkage is similar 
> to the import lib behaviour on Windows: You don't actually link against numpy 
> but you statically link a set of functions that are resolved to NumPy's 
> function when you import numpy. Quick googling leads to 
> https://github.com/yugr/Implib.so which could provide something similar for 
> Linux.
> * You could actually omit linking to libarrow and try to populate the symbols 
> before you load the library. This is how the Python symbols are available to 
> extensions without linking to libpython.
> 
> 
>> On Thu, Jul 2, 2020, at 2:43 PM, Maarten Breddels wrote:
>> Ok, thanks!
>> 
>> I'm setting up a repo with an example here, using pybind11:
>> https://github.com/vaexio/vaex-arrow-ext
>> 
>> and I'll just try all possible combinations and report back.
>> 
>> cheers,
>> 
>> Maarten Breddels
>> Software engineer / consultant / data scientist
>> Python / C++ / Javascript / Jupyter
>> www.maartenbreddels.com / vaex.io
>> maartenbredd...@gmail.com +31 6 2464 0838 <+31+6+24640838>
>> 
>> 
>> 
>> 
>> On Thu, 2 Jul 2020 at 14:32, Joris Van den Bossche <
>> jorisvandenboss...@gmail.com> wrote:
>> 
>>> Also no concrete answer, but one such example is turbodbc, I think.
>>> But it seems they only have conda binary packages, and don't
>>> distribute wheels ..
>>> (https://turbodbc.readthedocs.io/en/latest/pages/getting_started.html),
>>> so not that relevant as comparison (they also need to build against an
>>> odbc driver in addition to arrow).
>>> But maybe Uwe has some more experience in this regard (and with
>>> attempts building wheels for turbodbc, eg
>>> https://github.com/blue-yonder/turbodbc/pull/108).
>>> 
>>> Joris
>>> 
>>> On Thu, 2 Jul 2020 at 11:05, Antoine Pitrou  wrote:
>>>> 
>>>> 
>>>> Hi Maarten,
>>>> 
>>>> Le 02/07/2020 à 10:53, Maarten Breddels a écrit :
>>>>> 
>>>>> Also, I see pyarrow distributes manylinux1/2010/2014 wheels. Would a vaex
>>>>> extension distributed as a 2010 wheel, and built with the pyarrow 2010
>>>>> wheel, work in an environment where someone installed a pyarrow 2014
>>>>> wheel, or built from source, or installed from conda-forge?
>>>> 
>>>> I have no idea about the concrete answer, but it probably depends
>>>> whether the libstdc++ ABI changed between those two versions.  I'm
>>>> afraid you'll have to experiment yourself.
>>>> 
>>>> (if you want to eschew C++ ABI issues, you may use the C Data Interface:
>>>> https://arrow.apache.org/docs/format/CDataInterface.html
>>>> though of course you won't have access to all the useful helpers in the
>>>> Arrow C++ library)
>>>> 
>>>> Regards
>>>> 
>>>> Antoine.
>>>> 
>>>> 
>>> 
>> 
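
As a concrete illustration of the C Data Interface route Antoine points to
above, recent pyarrow versions expose the interface from Python via the
pyarrow.cffi module and the (private but documented) _export_to_c helpers.
Availability depends on the pyarrow version you target, so this is a sketch of
the general mechanism, not a statement about what any project in this thread
actually does:

    import pyarrow as pa
    from pyarrow.cffi import ffi  # cffi definitions of ArrowSchema / ArrowArray

    arr = pa.array([1, 2, 3])

    # Allocate the two C structs and export the array into them.
    c_schema = ffi.new("struct ArrowSchema*")
    c_array = ffi.new("struct ArrowArray*")
    arr._export_to_c(int(ffi.cast("uintptr_t", c_array)),
                     int(ffi.cast("uintptr_t", c_schema)))

    # The two raw addresses can now be handed to any C/C++ consumer that
    # implements the C Data Interface, with no linkage against libarrow's
    # C++ ABI required.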


Re: PyArrow Size in 0.15 version

2019-10-22 Thread Tim Paine
Arrow has lots of configuration arguments, and PyArrow allows you to build 
certain subsets of Arrow’s functionality. Depending on what you need, you can 
probably start out by building from source and turning off Parquet, Plasma, and 
Gandiva support. 

When running CMake, use -DARROW_PARQUET=OFF, -DARROW_PLASMA=OFF, and 
-DARROW_GANDIVA=OFF.

You can also try trimming the tests, and making sure not to include extra 
shared libraries. 

> On Oct 22, 2019, at 5:56 PM, Kiran Padmanabhui  wrote:
> 
> Hi,
> 
> I am trying to use PyArrow in AWS Lambda but the size is 190 MB which is
> too big for AWS Lambda.
> 
> Is there a way I can compile it and reduce it to less than 20-30 MB?
> 
> Thanks
> Kiran



Re: [C++] The quest for zero-dependency builds

2019-10-10 Thread Tim Paine
FWIW, for Perspective we ended up just using our own CMake file to build Arrow; 
we needed a minimal subset of functionality on a tight size budget, and it was 
easier doing that than configuring all the flags.

https://github.com/finos/perspective/blob/master/cmake/arrow/CMakeLists.txt



Tim Paine
tim.paine.nyc
908-721-1185

> On Oct 10, 2019, at 06:02, Antoine Pitrou  wrote:
> 
> 
> Hi all,
> 
> I'm a bit concerned that we're planning to add many additional build
> options in the quest to have a core zero-dependency build in C++.
> See for example https://issues.apache.org/jira/browse/ARROW-6633 or
> https://issues.apache.org/jira/browse/ARROW-6612.
> 
> The problem is that this is creating many possible configurations and we
> will only be testing a tiny subset of them.  Inevitably, users will try
> other option combinations and they'll fail building for some random
> reason.  It will not be a very good user experience.
> 
> Another related issue is user perception when doing a default build.
> For example https://issues.apache.org/jira/browse/ARROW-6638 proposes to
> build with jemalloc disabled by default.  Inevitably, people will be
> doing benchmarks with this (publicly or not) and they'll conclude Arrow
> is not as performant as it claims to be.
> 
> Perhaps we should look for another approach instead?
> 
> For example we could have a single ARROW_BARE_CORE (whatever the name)
> option that when enabled (not by default) builds the tiniest minimal
> subset of Arrow.  It's more inflexible, but at least it's something that
> we can reasonably test.
> 
> Regards
> 
> Antoine.


Re: Consume tables from PyArrow independently from the python_arrow library

2019-09-10 Thread Tim Paine
Looking at the code, it seems simple to add; I will look into it this week and 
open a PR if I get something usable.

Tim Paine
tim.paine.nyc
908-721-1185

> On Sep 10, 2019, at 19:35, Wes McKinney  wrote:
> 
> Hi Tim,
> 
> I see what you're saying now, sorry that I didn't understand sooner.
> 
> We actually need this feature to be able to pass instances of
> shared_ptr<T> (under very controlled conditions) into R using
> reticulate, where T is any of
> 
> * Array
> * ChunkedArray
> * DataType
> * RecordBatch
> * Table
> * and some other classes
> 
> 
> I would suggest introducing a property on pyarrow Python objects that
> returns the memory address of the wrapped shared_ptr (i.e. the
> integer leading to shared_ptr*). Then you can create your copy of
> that. Would that work? The only reason this is not implemented is that
> no one has needed it yet, mechanically it does not strike me as that
> complex.
> 
> See https://issues.apache.org/jira/browse/ARROW-3750. My comment in
> November 2018 "Methods would need to be added to the Cython extension
> types to give the memory address of the smart pointer object they
> contain". I agree with my younger self. Are you up to submit a PR?
> 
> - Wes
> 
>> On Tue, Sep 10, 2019 at 6:31 PM Tim Paine  wrote:
>> 
>> The end goal is to go directly from pyarrow to WASM without intermediate 
>> transforms. I can definitely make it work as is; we'll just have to be 
>> careful that the code we compile to WebAssembly exactly matches either our 
>> local copy of Arrow (if the user hasn't installed pyarrow) or their 
>> installed copy.
>> 
>> Tim Paine
>> tim.paine.nyc
>> 908-721-1185
>> 
>>> On Sep 10, 2019, at 19:12, Tim Paine  wrote:
>>> 
>>> We're building WebAssembly, so we obviously don't want to introduce a 
>>> pyarrow dependency. I don't want to do any pyarrow manipulations in C++, 
>>> just get the C++ table. I was hoping pyarrow might expose a raw pointer or 
>>> have something castable.
>>> 
>>> It seems to be a big limitation: there is no way of communicating a pyarrow 
>>> table to a C++ library that uses Arrow without that library linking against 
>>> pyarrow.
>>> 
>>> Tim Paine
>>> tim.paine.nyc
>>> 908-721-1185
>>> 
>>>> On Sep 10, 2019, at 17:44, Wes McKinney  wrote:
>>>> 
>>>> The Python extension types are defined in Cython, not C or C++ so you need
>>>> to load the Cython extensions in order to instantiate the classes.
>>>> 
>>>> Why do you have 2 copies of the C++ library? That seems easy to fix. If you
>>>> are using wheels from PyPI I would recommend that you switch to conda or
>>>> your own wheels without the C++ libraries bundled.
>>>> 
>>>>> On Tue, Sep 10, 2019, 4:23 PM Tim Paine  wrote:
>>>>> 
>>>>> Is there no way to do it without PyArrow? My C++ library is building Arrow
>>>>> itself, which means if I use PyArrow I’ll end up having 2 copies: one from
>>>>> my local C++-only build, and one from PyArrow.
>>>>> 
>>>>>> On Sep 10, 2019, at 5:18 PM, Wes McKinney  wrote:
>>>>>> 
>>>>>> hi Tim,
>>>>>> 
>>>>>> You can use the functions in
>>>>>> 
>>>>>> 
>>>>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pyarrow.h
>>>>>> 
>>>>>> You need to call "import_pyarrow()" from C++ before these APIs can be
>>>>>> used. It's similar to the NumPy C API in that regard
>>>>>> 
>>>>>> - Wes
>>>>>> 
>>>>>>> On Tue, Sep 10, 2019 at 4:13 PM Tim Paine  wrote:
>>>>>>> 
>>>>>>> Hey all, following up on a question I asked on stack overflow <
>>>>> https://stackoverflow.com/questions/57863751/how-to-convert-pyarrow-table-to-arrow-table-when-interfacing-between-pyarrow-in
>>>>>> .
>>>>>>> 
>>>>>>> It seems there is some code <
>>>>> https://arrow.apache.org/docs/python/extending.html#_CPPv412unwrap_tableP8PyObjectPNSt10shared_ptrI5TableEE>
>>>>> in PyArrow’s C++ to convert from a PyArrow table to an Arrow table. The
>>>>> problem with this is that my C++ library <
>>>>> https://github.com/finos/perspective> is going to build and link against
>>>>> Arrow on t

Re: Consume tables from PyArrow independently from the python_arrow library

2019-09-10 Thread Tim Paine
We're building WebAssembly, so we obviously don't want to introduce a pyarrow 
dependency. I don't want to do any pyarrow manipulations in C++, just get the 
C++ table. I was hoping pyarrow might expose a raw pointer or have something 
castable.

It seems to be a big limitation: there is no way of communicating a pyarrow 
table to a C++ library that uses Arrow without that library linking against 
pyarrow.

Tim Paine
tim.paine.nyc
908-721-1185

> On Sep 10, 2019, at 17:44, Wes McKinney  wrote:
> 
> The Python extension types are defined in Cython, not C or C++ so you need
> to load the Cython extensions in order to instantiate the classes.
> 
> Why do you have 2 copies of the C++ library? That seems easy to fix. If you
> are using wheels from PyPI I would recommend that you switch to conda or
> your own wheels without the C++ libraries bundled.
> 
>> On Tue, Sep 10, 2019, 4:23 PM Tim Paine  wrote:
>> 
>> Is there no way to do it without PyArrow? My C++ library is building Arrow
>> itself, which means if I use PyArrow I’ll end up having 2 copies: one from
>> my local C++-only build, and one from PyArrow.
>> 
>>> On Sep 10, 2019, at 5:18 PM, Wes McKinney  wrote:
>>> 
>>> hi Tim,
>>> 
>>> You can use the functions in
>>> 
>>> 
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pyarrow.h
>>> 
>>> You need to call "import_pyarrow()" from C++ before these APIs can be
>>> used. It's similar to the NumPy C API in that regard
>>> 
>>> - Wes
>>> 
>>>> On Tue, Sep 10, 2019 at 4:13 PM Tim Paine  wrote:
>>>> 
>>>> Hey all, following up on a question I asked on stack overflow <
>> https://stackoverflow.com/questions/57863751/how-to-convert-pyarrow-table-to-arrow-table-when-interfacing-between-pyarrow-in
>>> .
>>>> 
>>>> It seems there is some code <
>> https://arrow.apache.org/docs/python/extending.html#_CPPv412unwrap_tableP8PyObjectPNSt10shared_ptrI5TableEE>
>> in PyArrow’s C++ to convert from a PyArrow table to an Arrow table. The
>> problem with this is that my C++ library <
>> https://github.com/finos/perspective> is going to build and link against
>> Arrow on the C++ side rather than PyArrow side (because it will also be
>> consumed in WebAssembly), so I want to avoid also linking against PyArrow’s
>> copy of the arrow library. I also need to look for PyArrow’s header files,
>> which might conflict with the version in the local C++ code.
>>>> 
>>>> My solution right now is to just assert that PyArrow version == Arrow
>> version and do some pruning (so I link against local libarrow and PyArrow’s
>> libarrow_python rather than use PyArrow’s libarrow), but ideally it would
>> be great if there was a clean way to hand a PyArrow Table over to C++
>> without requiring the C++ to have PyArrow (e.g. using only a PyObject *).
>> Please forgive my ignorance/google skills if its already possible!
>>>> 
>>>> unwrap_table code:
>>>> 
>> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
>> <
>> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
>>> 
>>>> 
>>>> library pruning:
>>>> 
>> https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53
>> <
>> https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53
>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Tim
>> 
>> 


Re: Consume tables from PyArrow independently from the python_arrow library

2019-09-10 Thread Tim Paine
Is there no way to do it without PyArrow? My C++ library is building Arrow 
itself, which means if I use PyArrow I’ll end up having 2 copies: one from my 
local C++-only build, and one from PyArrow.

> On Sep 10, 2019, at 5:18 PM, Wes McKinney  wrote:
> 
> hi Tim,
> 
> You can use the functions in
> 
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pyarrow.h
> 
> You need to call "import_pyarrow()" from C++ before these APIs can be
> used. It's similar to the NumPy C API in that regard
> 
> - Wes
> 
> On Tue, Sep 10, 2019 at 4:13 PM Tim Paine  wrote:
>> 
>> Hey all, following up on a question I asked on stack overflow 
>> <https://stackoverflow.com/questions/57863751/how-to-convert-pyarrow-table-to-arrow-table-when-interfacing-between-pyarrow-in>.
>> 
>> It seems there is some code 
>> <https://arrow.apache.org/docs/python/extending.html#_CPPv412unwrap_tableP8PyObjectPNSt10shared_ptrI5TableEE>
>>  in PyArrow’s C++ to convert from a PyArrow table to an Arrow table. The 
>> problem with this is that my C++ library 
>> <https://github.com/finos/perspective> is going to build and link against 
>> Arrow on the C++ side rather than PyArrow side (because it will also be 
>> consumed in WebAssembly), so I want to avoid also linking against PyArrow’s 
>> copy of the arrow library. I also need to look for PyArrow’s header files, 
>> which might conflict with the version in the local C++ code.
>> 
>> My solution right now is to just assert that PyArrow version == Arrow 
>> version and do some pruning (so I link against local libarrow and PyArrow’s 
>> libarrow_python rather than use PyArrow’s libarrow), but ideally it would be 
>> great if there was a clean way to hand a PyArrow Table over to C++ without 
>> requiring the C++ to have PyArrow (e.g. using only a PyObject *). Please 
>> forgive my ignorance/google skills if its already possible!
>> 
>> unwrap_table code:
>> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
>>  
>> <https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310>
>> 
>> library pruning:
>> https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53
>>  
>> <https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53>
>> 
>> 
>> 
>> 
>> Tim



Consume tables from PyArrow independently from the python_arrow library

2019-09-10 Thread Tim Paine
Hey all, following up on a question I asked on Stack Overflow 
(https://stackoverflow.com/questions/57863751/how-to-convert-pyarrow-table-to-arrow-table-when-interfacing-between-pyarrow-in).

It seems there is some code 
(https://arrow.apache.org/docs/python/extending.html#_CPPv412unwrap_tableP8PyObjectPNSt10shared_ptrI5TableEE) 
in PyArrow's C++ to convert from a PyArrow table to an Arrow table. The 
problem with this is that my C++ library (https://github.com/finos/perspective) 
is going to build and link against Arrow on the C++ side rather than the PyArrow 
side (because it will also be consumed in WebAssembly), so I want to avoid also 
linking against PyArrow's copy of the Arrow library. I also need to look for 
PyArrow's header files, which might conflict with the version in the local C++ 
code.

My solution right now is to just assert that PyArrow version == Arrow version 
and do some pruning (so I link against local libarrow and PyArrow’s 
libarrow_python rather than use PyArrow’s libarrow), but ideally it would be 
great if there was a clean way to hand a PyArrow Table over to C++ without 
requiring the C++ to have PyArrow (e.g. using only a PyObject *). Please 
forgive my ignorance/google skills if it's already possible! 

unwrap_table code:
https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
 


library pruning:
https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53
 





Tim