module 'pyarrow.lib' has no attribute '_CRecordBatchReader'

2022-03-01 Thread Cindy McMullen
Hi - I'm trying to use the DGL (Deep Graph Library) DGLDataset API with the RAPIDS CUDA DataFrame API. I'm getting this error: module 'pyarrow.lib' has no attribute '_CRecordBatchReader'. I wonder if you see anything obvious in the stack trace that might help me debug? Here's the full stack
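
A first debugging step for an error like this is to confirm which pyarrow is actually being imported and whether its `pyarrow.lib` module exposes the missing name. The sketch below does only that. It assumes, without confirmation from the message above, that the cause could be a mismatch between the installed pyarrow and the version another package was built against. The helper name `describe_pyarrow` is hypothetical.

```python
import importlib
import importlib.util


def describe_pyarrow(symbol="_CRecordBatchReader"):
    """Report which pyarrow is importable and whether pyarrow.lib exposes `symbol`.

    `describe_pyarrow` is a hypothetical helper, not part of pyarrow.
    """
    report = {"installed": False, "version": None, "path": None, "has_symbol": False}
    if importlib.util.find_spec("pyarrow") is None:
        return report  # pyarrow is not importable in this environment
    try:
        pa = importlib.import_module("pyarrow")
        lib = importlib.import_module("pyarrow.lib")
    except ImportError:
        return report  # present on disk but failing to import
    report.update(
        installed=True,
        version=getattr(pa, "__version__", None),
        path=getattr(pa, "__file__", None),  # shows which environment it came from
        has_symbol=hasattr(lib, symbol),
    )
    return report


if __name__ == "__main__":
    for key, value in describe_pyarrow().items():
        print(f"{key}: {value}")
```

Running it in the same environment that raises the error shows the version and install path to compare against what the other packages expect.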

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Weston Pace
Does setting UseAsync on the C++ end make a difference? It's possible we switched the default to async in Python in 6.0.0 but not in C++. On Tue, Mar 1, 2022, 11:35 Niranda Perera wrote: > Oh, I forgot to mention, had to fix LD_LIBRARY_PATH when running the c++ > executable. >

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Niranda Perera
@Jayjeet, I ran your example on my desktop, and I don't see any timing issues there. I used conda to install pyarrow==6.0.0. I used the following command: g++ -O3 -std=c++11 dataset_bench.cc -I"$CONDA_PREFIX"/include -L"$CONDA_PREFIX"/lib -larrow -larrow_dataset -lparquet -o dataset_bench And I had

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Jayjeet Chakraborty
Hi Sasha, Thanks a lot for replying. I tried -O2 earlier but it didn't work. I tried it again (when compiling with PyArrow SO files) and unfortunately, it didn't improve the results. On Tue, Mar 1, 2022 at 11:14 AM Sasha Krassovsky wrote: > Hi Jayjeet, > I noticed that you're not compiling

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Sasha Krassovsky
Hi Jayjeet, I noticed that you're not compiling dataset_bench with optimizations enabled. I'm not sure how much it will help, but it may be worth adding `-O2` to your g++ invocation. Sasha Krassovsky On Tue, Mar 1, 2022 at 11:11 AM Jayjeet Chakraborty < jayjeetchakrabort...@gmail.com> wrote: >

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Jayjeet Chakraborty
Hi Niranda, David, I ran my benchmarks again with the PyArrow .SO libraries which should be optimized. PyArrow version was 6.0.1 installed from pip. Here are my new results [1]. Numbers didn't quite seem to improve. You can check my build config in the Makefile [2]. I created a README [3] to make

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Niranda Perera
Hi Jayjeet, Could you try building your cpp project against the arrow.so in the pyarrow installation? It should be in the lib directory in your python environment. Best On Tue, Mar 1, 2022 at 12:46 PM Jayjeet Chakraborty < jayjeetchakrabort...@gmail.com> wrote: > Thanks for your reply, David. > >
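
One way to find the include and library directories of a pyarrow installation is to ask pyarrow itself, via `pyarrow.get_include()` and `pyarrow.get_library_dirs()`. The sketch below turns those into compiler flags. The helper name `pyarrow_build_flags` is hypothetical, and the rpath flag is an assumption about a GNU-style toolchain. Depending on how pyarrow was installed, the directory may hold only versioned shared objects, in which case linking with `-larrow` can need extra steps.

```python
import importlib.util
import shlex


def pyarrow_build_flags():
    """Return -I/-L flags pointing at the installed pyarrow, or None if absent.

    `pyarrow_build_flags` is a hypothetical helper, not part of pyarrow.
    """
    if importlib.util.find_spec("pyarrow") is None:
        return None  # nothing to point the compiler at
    import pyarrow as pa

    flags = [f"-I{pa.get_include()}"]
    for lib_dir in pa.get_library_dirs():
        # -L for link time; the rpath entry (assumed GNU ld syntax) avoids
        # having to set LD_LIBRARY_PATH when running the executable.
        flags += [f"-L{lib_dir}", f"-Wl,-rpath,{lib_dir}"]
    return flags


if __name__ == "__main__":
    flags = pyarrow_build_flags()
    print("pyarrow not installed" if flags is None else shlex.join(flags))
```

The printed flags can be pasted into a g++ invocation alongside the `-larrow -larrow_dataset -lparquet` options, so the C++ benchmark links against the same binaries Python uses.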

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Jayjeet Chakraborty
Thanks for your reply, David. 1) I used PyArrow 6.0.1 for both C++ and Python. 2) The dataset was deployed using this [1] script. 3) For C++, Arrow was built from source in release mode. You can see the CMake config here [2]. I think I need to test once with Arrow C++ installed from packages

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread David Li
Hi Jayjeet, That's odd since the Python API is just wrapping the C++ API, so they should be identical if everything is configured the same. (So is the Java API, incidentally.) That's effectively what the SO question is saying. What versions of PyArrow and Arrow are you using? Just to check the