Re: Hello to the Arrow dev community

2020-09-22 Thread Micah Kornfield
Welcome to the community, Bob.
On Tue, Sep 22, 2020 at 12:27 PM Bob Tinsman wrote:
> I'd like to introduce myself, because I've had an interest in Arrow for a
> long time and now I have a chance to help out. Up until now, I haven't
> really contributed much in open source, although I've been an

Re: How to run Java benchmark?

2020-09-22 Thread Fan Liya
Hi Kazuaki, It seems the reason is that exec-maven-plugin is missing from the pom.xml. We did not include it because it would run all the benchmarks during the Maven build, which is extremely time-consuming. I have opened ARROW-10069 to track this issue. Hopefully, I will provide a PR soon.

Re: [DISCUSS] Rethinking our approach to scheduling CPU and IO work in C++?

2020-09-22 Thread Wes McKinney
Thanks for the pointer to CAF. It reminds me a bit of libprocess, which is part of Apache Mesos and also provides the actor model: https://github.com/apache/mesos/tree/master/3rdparty/libprocess We'll have to determine a solution that is compatible with our spectrum of compiler toolchain

RE: PyArrow: Incrementally using ParquetWriter without keeping entire dataset in memory (larger than memory parquet files)

2020-09-22 Thread Lee, David
Try writing smaller chunks. I usually size my Parquet files at 128 MB to match our Hadoop filesystem block size. Within each 128 MB Parquet file I usually have around 6 to 10 row groups, which are basically 6 to 10 mini Parquet files of 12 to 20 MB each. Parquet files

Hello to the Arrow dev community

2020-09-22 Thread Bob Tinsman
I'd like to introduce myself, because I've had an interest in Arrow for a long time and now I have a chance to help out. Up until now, I haven't really contributed much in open source, although I've been an avid consumer, so I'd like to change that! My main areas of work have been performance

Re: [DISCUSS] Rethinking our approach to scheduling CPU and IO work in C++?

2020-09-22 Thread Matthias Vallentin
We are building a highly concurrent database for security data with Arrow as the data plane (VAST), so I thought I'd share our view on this, since we went over pretty much all of the above-mentioned questions. I'm not trying to say "you should do it this way" but

How to run Java benchmark?

2020-09-22 Thread Kazuaki Ishizaki
Dear all, I have one question on how to run Java benchmark. I built jar files by executing the following command " on an x86_64 machine. Then, based on [1], I tried to execute a Java benchmark program. However, I got an exception. I can find the BenchmarkList file, but it is not included in

Re: [DISCUSS] Memory alignment in rust - what to do?

2020-09-22 Thread Jörn Horstmann
Not a Rust expert yet, but here are my 2 (or more) cents: The alignment was chosen (as far as I know) so that separately allocated buffers would not share the same cache line, which could cause performance issues when the same cache line is accessed by multiple threads. So aligning our buffers
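The cache-line argument above can be sketched with the standard allocator API. This is only an illustration under stated assumptions, not the Arrow Rust implementation: the 64-byte `CACHE_LINE` constant is an assumed value (common on x86_64 but not universal), and `AlignedBuffer` and `padded_len` are hypothetical names introduced here. Padding the size to a whole number of cache lines, in addition to aligning the start address, is what keeps two separately allocated buffers from sharing a line.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Assumed cache-line size in bytes (an assumption, not taken from the thread).
const CACHE_LINE: usize = 64;

/// Round `len` up to the next multiple of `CACHE_LINE`.
fn padded_len(len: usize) -> usize {
    (len + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE
}

/// Hypothetical buffer type that owns one cache-line-aligned allocation.
struct AlignedBuffer {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuffer {
    /// Allocate at least `len` bytes, aligned and padded to `CACHE_LINE`.
    fn new(len: usize) -> Self {
        // Allocate at least one full line so the layout size is never zero.
        let size = padded_len(len.max(1));
        let layout = Layout::from_size_align(size, CACHE_LINE).expect("valid layout");
        // SAFETY: `layout` has a non-zero size.
        let ptr = unsafe { alloc(layout) };
        assert!(!ptr.is_null(), "allocation failed");
        AlignedBuffer { ptr, layout }
    }

    /// Start address of the allocation.
    fn as_ptr(&self) -> *const u8 {
        self.ptr
    }

    /// Number of bytes actually reserved (a multiple of `CACHE_LINE`).
    fn capacity(&self) -> usize {
        self.layout.size()
    }
}

impl Drop for AlignedBuffer {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this `layout`.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}
```

With this layout, a 10-byte request reserves a full 64-byte line, so a buffer written by one thread cannot overlap the line that holds another thread's buffer.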

Re: [DISCUSS] Memory alignment in rust - what to do?

2020-09-22 Thread Antoine Pitrou
On 22/09/2020 at 19:16, Jorge Cardoso Leitão wrote:
> Hi,
>
> I had some time to look at https://issues.apache.org/jira/browse/ARROW-10039,
> wrt the alignment requirements that rust implementation currently
> imposes.
>
> The gist is that it is not that easy, and I would like to request

[DISCUSS] Memory alignment in rust - what to do?

2020-09-22 Thread Jorge Cardoso Leitão
Hi, I had some time to look at https://issues.apache.org/jira/browse/ARROW-10039, with respect to the alignment requirements that the Rust implementation currently imposes. The gist is that it is not that easy, and I would like to request some guidance. Some facts: 1. Our current implementation does not

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-09-22 Thread Antoine Pitrou
On 22/09/2020 at 06:36, Micah Kornfield wrote:
> I wanted to give this thread a bump, does the proposal I made below sound
> reasonable?
It does!
Regards
Antoine.
>
> On Sun, Sep 13, 2020 at 9:57 PM Micah Kornfield
> wrote:
>
>> If I read the responses so far it seems like the

[NIGHTLY] Arrow Build Report for Job nightly-2020-09-22-0

2020-09-22 Thread Crossbow
Arrow Build Report for Job nightly-2020-09-22-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-09-22-0 Failed Tasks: - conda-linux-gcc-py36-aarch64: URL:

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-09-22 Thread Kazuaki Ishizaki
Hi Micah,
Thank you. Your proposal also sounds reasonable to me.
Best Regards,
Kazuaki Ishizaki
Fan Liya wrote on 2020/09/22 15:51:58:
> From: Fan Liya
> To: dev, Micah Kornfield
> Date: 2020/09/22 15:52
> Subject: [EXTERNAL] Re: [DISCUSS] Big Endian support in Arrow (was:
> Re: [Java]

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-09-22 Thread Fan Liya
Hi Micah,
Thanks for your summary. Your proposal sounds reasonable to me.
Best,
Liya Fan
On Tue, Sep 22, 2020 at 1:16 PM Micah Kornfield wrote:
> I wanted to give this thread a bump, does the proposal I made below sound
> reasonable?
>
> On Sun, Sep 13, 2020 at 9:57 PM Micah Kornfield
>