I see. I believe I was already building in release mode, as I was not
passing the CMAKE_BUILD_TYPE flag (which means it will build in release by
default). I will cross-check once more. Thanks again for all the help.
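(Editor's note: plain CMake actually leaves CMAKE_BUILD_TYPE empty unless the project sets its own default, so one quick sanity check is to read the cached value out of an existing build directory. A sketch, with the build-directory path only an example:)

```shell
# Verify which build type was actually configured by inspecting the
# CMake cache of an existing build directory (path is an example).
grep -i '^CMAKE_BUILD_TYPE' arrow/cpp/build/CMakeCache.txt
```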
On Wed, Mar 2, 2022 at 9:35 AM Niranda Perera wrote:
> I think you should try release build mode!
I think you should try release build mode!
On Wed, Mar 2, 2022 at 12:21 PM Jayjeet Chakraborty <
jayjeetchakrabort...@gmail.com> wrote:
Thanks for all the help everyone. I was able to follow Niranda's steps and
get the same perf in both C++ and Python. But I still don't know which are
essential optimizations for compiling Arrow in C++. Can anyone please share
some pointers on that? I think documenting the essential C++
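(Editor's note: for anyone finding this thread later, a minimal release-mode configure for Arrow C++ with the Dataset and Parquet components enabled might look like the sketch below. The source and build paths are examples; ARROW_DATASET and ARROW_PARQUET are the standard component switches, but check the build documentation for your Arrow version.)

```shell
# Minimal release-mode configure of Arrow C++ with the components the
# benchmark needs (paths are examples, not from the thread).
cmake -S arrow/cpp -B arrow/cpp/build \
  -DCMAKE_BUILD_TYPE=Release \
  -DARROW_DATASET=ON \
  -DARROW_PARQUET=ON
cmake --build arrow/cpp/build -j
```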
Does setting UseAsync on the C++ end make a difference? It's possible we
switched the default to async in Python in 6.0.0 but not in C++.
On Tue, Mar 1, 2022, 11:35 Niranda Perera wrote:
> Oh, I forgot to mention: I had to fix LD_LIBRARY_PATH when running the C++
> executable.
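(Editor's note: a sketch of that fix for anyone hitting the same runtime linker error, assuming a conda environment provides the Arrow libraries:)

```shell
# Make the dynamic linker search the conda env's lib directory so the
# benchmark binary can find libarrow/libparquet at run time.
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
./dataset_bench
```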
@Jayjeet,
I ran your example on my desktop, and I don't see any timing issues there.
I used conda to install pyarrow==6.0.0 and compiled with the following command:
g++ -O3 -std=c++11 dataset_bench.cc -I"$CONDA_PREFIX"/include \
  -L"$CONDA_PREFIX"/lib -larrow -larrow_dataset -lparquet -o dataset_bench
And I had
Hi Sasha,
Thanks a lot for replying. I tried -O2 earlier, but it didn't work. I tried
it again (when compiling against the PyArrow .so files) and unfortunately it
didn't improve the results.
On Tue, Mar 1, 2022 at 11:14 AM Sasha Krassovsky wrote:
Hi Jayjeet,
I noticed that you're not compiling dataset_bench with optimizations
enabled. I'm not sure how much it will help, but it may be worth adding
`-O2` to your g++ invocation.
Sasha Krassovsky
On Tue, Mar 1, 2022 at 11:11 AM Jayjeet Chakraborty <
jayjeetchakrabort...@gmail.com> wrote:
Hi Niranda, David,
I ran my benchmarks again with the PyArrow .so libraries, which should be
optimized. The PyArrow version was 6.0.1, installed from pip. Here are my new
results [1]. The numbers didn't really improve. You can check my build
config in the Makefile [2]. I created a README [3] to make
Hi Jayjeet,
Could you try building your C++ project against the Arrow .so in the pyarrow
installation? It should be in the lib directory of your Python environment.
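(Editor's note: a sketch of what that might look like, assuming a pip-installed pyarrow. pyarrow.get_include() and pyarrow.get_library_dirs() are pyarrow's own helpers for locating its bundled headers and shared libraries; the source file name comes from the thread.)

```shell
# Locate the headers and shared libraries bundled with the installed
# pyarrow wheel, then link the benchmark against them.
PYARROW_INC=$(python -c "import pyarrow; print(pyarrow.get_include())")
PYARROW_LIB=$(python -c "import pyarrow; print(pyarrow.get_library_dirs()[0])")
g++ -O3 -std=c++11 dataset_bench.cc \
  -I"$PYARROW_INC" -L"$PYARROW_LIB" \
  -larrow -larrow_dataset -lparquet -o dataset_bench
```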
Best
On Tue, Mar 1, 2022 at 12:46 PM Jayjeet Chakraborty <
jayjeetchakrabort...@gmail.com> wrote:
Thanks for your reply, David.
1) I used PyArrow 6.0.1 for both C++ and Python.
2) The dataset was deployed using this [1] script.
3) For C++, Arrow was built from source in release mode. You can see the
CMake config here [2].
I think I still need to test once with Arrow C++ installed from packages
Hi Jayjeet,
That's odd since the Python API is just wrapping the C++ API, so they should be
identical if everything is configured the same. (So is the Java API,
incidentally.) That's effectively what the SO question is saying.
What versions of PyArrow and Arrow are you using? Just to check the
Hi Arrow community,
I was working on a class project benchmarking the Apache Arrow Dataset API
in different programming languages, and found that for some reason the
C++ API example is slower than the Python API example. I ran my benchmarks
on a 5 GB dataset consisting of 300 16 MB Parquet