Hi -
I'm trying to use the DGL (Deep Graph Library) DGLDataset API with the RAPIDS
cuDF (CUDA DataFrame) API. I'm getting this error:
module 'pyarrow.lib' has no attribute '_CRecordBatchReader'
I wonder if you see anything obvious in the stack trace that might help me debug?
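An AttributeError on a private `pyarrow.lib` symbol usually means another package (cuDF or DGL here) was compiled against a different pyarrow release than the one installed. A minimal, stdlib-only sketch for checking the installed versions side by side (the `installed_version` helper is mine, not from the thread):

```python
# Sketch: check whether pyarrow and the packages that wrap it agree on
# versions. An AttributeError on a private pyarrow.lib symbol typically
# means one of them was built against a different pyarrow release.
import importlib.metadata as md

def installed_version(pkg):
    """Return the installed version string of `pkg`, or None if absent."""
    try:
        return md.version(pkg)
    except md.PackageNotFoundError:
        return None

for pkg in ("pyarrow", "cudf", "dgl"):
    print(pkg, installed_version(pkg))
```

If the cudf pin and the installed pyarrow disagree, reinstalling pyarrow at the pinned version is the usual fix.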
Here's the full stack
Does setting UseAsync on the C++ end make a difference? It's possible we
switched the default to async in Python in 6.0.0 but not in C++.
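For reference, a minimal sketch of opting into the async scanner from C++ (this assumes the Arrow 6.x dataset API and an already-constructed `dataset`; it is an illustration, not code from the thread's benchmark):

```cpp
// Sketch: build a Scanner with the async path enabled, matching what
// Python 6.0.x may be using by default. Requires Arrow C++ 6.x with the
// dataset module.
#include <arrow/dataset/api.h>
#include <arrow/result.h>
#include <arrow/table.h>

arrow::Result<std::shared_ptr<arrow::Table>> ScanAsync(
    const std::shared_ptr<arrow::dataset::Dataset>& dataset) {
  auto builder = std::make_shared<arrow::dataset::ScannerBuilder>(dataset);
  // Opt in to the asynchronous scanner.
  ARROW_RETURN_NOT_OK(builder->UseAsync(true));
  ARROW_ASSIGN_OR_RAISE(auto scanner, builder->Finish());
  return scanner->ToTable();
}
```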
On Tue, Mar 1, 2022, 11:35 Niranda Perera wrote:
> Oh, I forgot to mention, had to fix LD_LIBRARY_PATH when running the c++
> executable.
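For anyone reproducing this, the LD_LIBRARY_PATH fix Niranda mentions likely looks like the following (paths assume an active conda environment; this is a sketch, not the exact command used):

```shell
# Let the dynamic linker find the conda-installed Arrow shared libraries
# before launching the C++ benchmark.
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
./dataset_bench
```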
>
@Jayjeet,
I ran your example on my desktop, and I don't see any timing issues there.
I used conda to install pyarrow==6.0.0 and compiled with the following
command:
g++ -O3 -std=c++11 dataset_bench.cc -I"$CONDA_PREFIX"/include \
    -L"$CONDA_PREFIX"/lib -larrow -larrow_dataset -lparquet -o dataset_bench
And I had
Hi Sasha,
Thanks a lot for replying. I tried -O2 earlier, but it didn't help. I tried
it again (when compiling against the PyArrow .so files) and unfortunately it
didn't improve the results.
On Tue, Mar 1, 2022 at 11:14 AM Sasha Krassovsky
wrote:
Hi Jayjeet,
I noticed that you're not compiling dataset_bench with optimizations
enabled. I'm not sure how much it will help, but it may be worth adding
`-O2` to your g++ invocation.
Sasha Krassovsky
On Tue, Mar 1, 2022 at 11:11 AM Jayjeet Chakraborty <
jayjeetchakrabort...@gmail.com> wrote:
Hi Niranda, David,
I ran my benchmarks again with the PyArrow .so libraries, which should be
optimized. The PyArrow version was 6.0.1, installed from pip. Here are my new
results [1]. The numbers didn't really improve. You can check my build
config in the Makefile [2]. I created a README [3] to make
Hi Jayjeet,
Could you try building your C++ project against the Arrow .so libraries in the
pyarrow installation? They should be in the lib directory of your Python
environment.
Best
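A sketch of what that build could look like (assuming a pip-installed pyarrow; `pyarrow.get_include()`, `pyarrow.get_library_dirs()`, and `pyarrow.create_library_symlinks()` are real pyarrow helpers, but the exact flags here are illustrative):

```shell
# Link the benchmark against the shared libraries bundled with pip's
# pyarrow. create_library_symlinks() adds the unversioned lib*.so names
# that -larrow expects (the wheels ship only versioned ones).
python -c "import pyarrow; pyarrow.create_library_symlinks()"
PYARROW_LIB=$(python -c "import pyarrow; print(pyarrow.get_library_dirs()[0])")
PYARROW_INC=$(python -c "import pyarrow; print(pyarrow.get_include())")
g++ -O3 -std=c++11 dataset_bench.cc -I"$PYARROW_INC" \
    -L"$PYARROW_LIB" -larrow -larrow_dataset -lparquet \
    -Wl,-rpath,"$PYARROW_LIB" -o dataset_bench
```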
On Tue, Mar 1, 2022 at 12:46 PM Jayjeet Chakraborty <
jayjeetchakrabort...@gmail.com> wrote:
Thanks for your reply, David.
1) I used PyArrow 6.0.1 for both C++ and Python.
2) The dataset was deployed using this script [1].
3) For C++, Arrow was built from source in release mode. You can see the
CMake config here [2].
I think I need to test once with Arrow C++ installed from packages
Hi Jayjeet,
That's odd, since the Python API is just a wrapper around the C++ API; the two
should perform identically if everything is configured the same. (So is the
Java API, incidentally.) That's effectively what the Stack Overflow question is
saying.
What versions of PyArrow and Arrow are you using? Just to check the