Re: Representation of "null" values for non-numeric types in Arrow/Pandas interop

2021-06-08 Thread Benjamin Kietzman
As a workaround, the "fill_null" compute function can be used to replace nulls with NaNs:

>>> import numpy as np
>>> import pyarrow as pa
>>> nan = pa.scalar(np.nan, type=pa.float64())
>>> pa.Array.from_pandas(s).fill_null(nan).to_pandas()

On Tue, Jun 8, 2021, 16:15 Joris Van den Bossche < jorisvandenboss...@gmail.com> wrote: > Hi Li, > >

Re: C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS only

2021-06-08 Thread Sutou Kouhei
Hi, Could you try building Apache Arrow C++ with -DCMAKE_BUILD_TYPE=Debug and get a backtrace again? It will show the source location of the segmentation fault. Thanks, -- kou In "C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS only" on Tue, 8 Jun 2021 12:01:27 -0700, Rares

Re: Arrow sync call May 26 at 12:00 US/Eastern, 16:00 UTC

2021-06-08 Thread Neal Richardson
Belated notes from the call last time:

Attendees:
Nate Bauernfeind
Ian Cook
Nic Crane
James Duong
Tiffany Lam
Jorge Cardoso Leitão
Rok Mihevc
Gyan Prakash
Neal Richardson

Discussion:
- 4.0.1 patch release: vote passed, doing the post release tasks
- FlightSQL: James and Tiffany picking back up

Arrow sync call June 9 at 12:00 US/Eastern, 16:00 UTC

2021-06-08 Thread Neal Richardson
Hi all, Our biweekly call is tomorrow at https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes will be shared with the mailing list afterward. Neal

Re: [C++][Discuss] Switch to C++17

2021-06-08 Thread Jonathan Keane
I've been digging a bit to try and put numbers on the users that Neal mentions. Specifically, we know that requiring C++17 will mean that R users on Windows using versions of R before 4.0.0 will not be able to compile/install arrow. Although R version 3.6 is no longer supported by CRAN [1], many

Re: [C++][Discuss] Switch to C++17

2021-06-08 Thread Neal Richardson
I'm guessing there hasn't been opposition on this thread because the users that this might affect aren't following this mailing list. I'd be interested to see which other major C++ projects out there have bumped their requirement to C++17, and how that experience was for everyone--the user

Re: Representation of "null" values for non-numeric types in Arrow/Pandas interop

2021-06-08 Thread Joris Van den Bossche
Hi Li, It's correct that arrow uses "None" for null values when converting a string array to numpy / pandas. As far as I am aware, there is currently no option to control that (and to make it use np.nan instead), and I am not sure there would be much interest in adding such an option. Now, I

Re: Representation of "null" values for non-numeric types in Arrow/Pandas interop

2021-06-08 Thread Jorge Cardoso Leitão
Semantically, a NaN is defined by the IEEE 754 standard for floating-point numbers, while a null represents any value that is undefined, unknown, etc. An important set of problems that arrow solves is that it has a native representation for null values (independent of NaNs): arrow's in-memory
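
The null/NaN distinction described above can be illustrated with a toy column, assuming the layout of the Arrow columnar format, where nulls are tracked in a validity bitmap that is separate from the values buffer. This is only a sketch in plain Python: the class NullableFloatColumn and its methods are hypothetical names chosen for illustration, not part of pyarrow.

```python
import math


class NullableFloatColumn:
    """Toy float column: a values list plus a validity bitmap (hypothetical, not pyarrow)."""

    def __init__(self, items):
        self.length = len(items)
        # Null slots still occupy a value slot; its content is arbitrary (0.0 here).
        self.values = [0.0 if x is None else float(x) for x in items]
        # One bit per slot, least-significant bit first; 1 means "valid", 0 means "null".
        self.validity = bytearray((self.length + 7) // 8)
        for i, x in enumerate(items):
            if x is not None:
                self.validity[i // 8] |= 1 << (i % 8)

    def is_null(self, i):
        # Nullness is read from the bitmap only, never from the stored value.
        return ((self.validity[i // 8] >> (i % 8)) & 1) == 0

    def is_nan(self, i):
        # NaN is an ordinary, valid floating-point value.
        return (not self.is_null(i)) and math.isnan(self.values[i])

    def null_count(self):
        return sum(1 for i in range(self.length) if self.is_null(i))


col = NullableFloatColumn([1.5, None, float("nan"), 4.0])
print([col.is_null(i) for i in range(col.length)])
print([col.is_nan(i) for i in range(col.length)])
```

In this sketch slot 1 is null and slot 2 is a valid NaN, so the two states never collide. Because nullness lives in the bitmap, the same scheme works for strings or integers, which have no NaN to borrow.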

Representation of "null" values for non-numeric types in Arrow/Pandas interop

2021-06-08 Thread Li Jin
Hello! Apologies if this has been brought up before. I'd like to get devs' thoughts on this potential inconsistency of "what are the python objects for null values" between pandas and pyarrow. Demonstrated with the following example: (1) pandas seems to use "np.NaN" to represent a missing value

Re: [C++] Adopting a library for (distributed) tracing

2021-06-08 Thread David Li
I'll have to do some more digging into that and get back to you. So far I've been using a quick-and-dirty tool that I whipped up using Vega-Lite but that's probably not something we want to maintain. I tried the Chrome trace viewer ("Catapult") but it's not quite built for this kind of trace; I

Re: [C++] Adopting a library for (distributed) tracing

2021-06-08 Thread Weston Pace
FWIW, I tried this out yesterday since I was profiling the execution of the async API reader. It worked great so +1 from me on that basis. I did struggle to find a good, simple visualization tool. Do you have any good recommendations on that front? On Mon, Jun 7, 2021 at 10:50 AM David Li

C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS only

2021-06-08 Thread Rares Vernica
Hello, We recently migrated our C++ Arrow code from 0.16 to 3.0.0. The code works fine on Ubuntu, but we get a segmentation fault in CentOS while reading Arrow Record Batch files. We can successfully read the files from Python or Ubuntu so the files and the writer are fine. We use Record Batch

Re: [C++][DISCUSS] Implementing interpreted (non-compiled) tests for compute functions

2021-06-08 Thread Benjamin Kietzman
I've added https://issues.apache.org/jira/browse/ARROW-13013 to track moving kernel unit tests to Python since that seems easily doable and worthwhile. On Sun, May 16, 2021 at 3:35 PM Wes McKinney wrote: > I agree there are pros and cons here (up front investment hopefully > yielding future

Re: [ANNOUNCE] New Arrow committer: Kazuaki Ishizaki

2021-06-08 Thread Kazuaki Ishizaki
Thanks, all, for your messages and help. I will work together with the community. Best regards, Kazuaki Ishizaki Eduardo Ponce wrote on 2021/06/09 00:03:35: > From: Eduardo Ponce > To: dev@arrow.apache.org > Date: 2021/06/09 00:04 > Subject: [EXTERNAL] Re: [ANNOUNCE] New Arrow committer:

Re: [ANNOUNCE] New Arrow committer: Kazuaki Ishizaki

2021-06-08 Thread Eduardo Ponce
Congratulations!! ~Eduardo

On Mon, Jun 7, 2021 at 11:06 PM Fan Liya wrote:
> Congratulations, Kazuaki!
>
> Best,
> Liya Fan
>
> On Tue, Jun 8, 2021 at 7:59 AM Rok Mihevc wrote:
> > Congrats!
> >
> > On Tue, Jun 8, 2021 at 1:36 AM Micah Kornfield
> > wrote:
> > > Congrats!
> > >
> > >

Re: Arrow Dataset API on Ceph

2021-06-08 Thread Jayjeet Chakraborty
Hi Yibo, Thanks a lot for your interest in our work. Please refer to this [1] guide to deploy a complete environment on a cluster of nodes. Regarding your comment about a Ceph patch, the arrow object class that we implement is actually a plugin and does not require the Ceph source tree for

Re: [C++][Discuss] Switch to C++17

2021-06-08 Thread Antoine Pitrou
Hello, Note the change in the message topic :-) We now have a draft PR up to switch the C++ standard level to C++17. This allows very nice simplifications in the code, especially the use of elegant constructs that can replace some cumbersome uses of std::enable_if, SFINAE and other pain points.

Complex Number support in Arrow

2021-06-08 Thread Simon Perkins
Greetings Apache Dev Mailing List, I'm interested in adding complex number support to Arrow. The use case is Radio Astronomy data, which is represented by complex values. xref https://issues.apache.org/jira/browse/ARROW-638 xref https://github.com/apache/arrow/pull/10452 It's fairly easy to