Re: C++ version of Arrow slower than Python version

2022-03-04 Thread Jayjeet Chakraborty
I see. I believe I was already building in release mode, as I was not passing the CMAKE_BUILD_TYPE flag (which means it builds in release mode by default). I will cross-check once more. Thanks again for all the help.
On Wed, Mar 2, 2022 at 9:35 AM Niranda Perera wrote:
> I think you should try

Re: C++ version of Arrow slower than Python version

2022-03-02 Thread Niranda Perera
I think you should try release build mode!
On Wed, Mar 2, 2022 at 12:21 PM Jayjeet Chakraborty <jayjeetchakrabort...@gmail.com> wrote:
> Thanks for all the help everyone. I was able to follow Niranda's steps and
> get the same perf in both C++ and Python. But I still don't know which are

Re: C++ version of Arrow slower than Python version

2022-03-02 Thread Antoine Pitrou
On Wed, 2 Mar 2022 09:20:50 -0800 Jayjeet Chakraborty wrote:
> Thanks for all the help everyone. I was able to follow Niranda's steps and
> get the same perf in both C++ and Python. But I still don't know which are
> essential optimizations for compiling Arrow in C++. Can anyone please share

Re: C++ version of Arrow slower than Python version

2022-03-02 Thread Jayjeet Chakraborty
Thanks for all the help, everyone. I was able to follow Niranda's steps and get the same perf in both C++ and Python. But I still don't know which optimizations are essential when compiling Arrow in C++. Can anyone please share some pointers on that? I think documenting the essential C++

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Weston Pace
Does setting `UseAsync` on the C++ end make a difference? It's possible we switched the default to async in Python in 6.0.0 but not in C++.
On Tue, Mar 1, 2022, 11:35 Niranda Perera wrote:
> Oh, I forgot to mention, had to fix LD_LIBRARY_PATH when running the c++
> executable.

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Niranda Perera
@Jayjeet, I ran your example on my desktop, and I don't see any timing issues there. I used conda to install pyarrow==6.0.0, and I used the following command:
`g++ -O3 -std=c++11 dataset_bench.cc -I"$CONDA_PREFIX"/include -L"$CONDA_PREFIX"/lib -larrow -larrow_dataset -lparquet -o dataset_bench`
And I had

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Jayjeet Chakraborty
Hi Sasha, Thanks a lot for replying. I tried `-O2` earlier, but it didn't help. I tried it again (when compiling against the PyArrow .so files) and, unfortunately, it didn't improve the results.
On Tue, Mar 1, 2022 at 11:14 AM Sasha Krassovsky wrote:
> Hi Jayjeet,
> I noticed that you're not compiling

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Sasha Krassovsky
Hi Jayjeet, I noticed that you're not compiling dataset_bench with optimizations enabled. I'm not sure how much it will help, but it may be worth adding `-O2` to your g++ invocation.
Sasha Krassovsky
On Tue, Mar 1, 2022 at 11:11 AM Jayjeet Chakraborty <jayjeetchakrabort...@gmail.com> wrote:

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Jayjeet Chakraborty
Hi Niranda, David, I ran my benchmarks again against the PyArrow .so libraries, which should be optimized. The PyArrow version was 6.0.1, installed from pip. Here are my new results [1]. The numbers didn't seem to improve. You can check my build config in the Makefile [2]. I created a README [3] to make

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Niranda Perera
Hi Jayjeet, Could you try building your C++ project against the arrow.so in your pyarrow installation? It should be in the lib directory of your Python environment.
Best
On Tue, Mar 1, 2022 at 12:46 PM Jayjeet Chakraborty <jayjeetchakrabort...@gmail.com> wrote:
> Thanks for your reply, David.

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread Jayjeet Chakraborty
Thanks for your reply, David.
1) I used PyArrow 6.0.1 for both C++ and Python.
2) The dataset was deployed using this script [1].
3) For C++, Arrow was built from source in release mode. You can see the CMake config here [2].
I think I need to test once with Arrow C++ installed from packages

Re: C++ version of Arrow slower than Python version

2022-03-01 Thread David Li
Hi Jayjeet, That's odd, since the Python API is just wrapping the C++ API, so the two should perform identically if everything is configured the same. (So is the Java API, incidentally.) That's effectively what the SO question is saying. What versions of PyArrow and Arrow are you using? Just to check the

C++ version of Arrow slower than Python version

2022-02-28 Thread Jayjeet Chakraborty
Hi Arrow community, I was working on a class project benchmarking the Apache Arrow Dataset API across different programming languages. I found that, for some reason, the C++ API example is slower than the Python API example. I ran my benchmarks on a 5 GB dataset consisting of 300 16 MB Parquet