[jira] [Created] (ARROW-7845) [c++] reading list from parquet files

2020-02-12 Thread Mikhail Filimonov (Jira)
Mikhail Filimonov created ARROW-7845:


 Summary: [c++] reading list from parquet files 
 Key: ARROW-7845
 URL: https://issues.apache.org/jira/browse/ARROW-7845
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.15.1
Reporter: Mikhail Filimonov


Currently, the Parquet format reader delivered with Apache Arrow C++ does not 
support Parquet lists.

See the related issue at https://issues.apache.org/jira/browse/PARQUET-834



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Arrow doesn't have a MapType

2020-02-12 Thread Shawn Yang
Thanks Wes. I was using 0.14 before. BTW, it seems the docs for data types
haven't been fully updated. I'll submit a PR for this.

On Thu, Feb 13, 2020 at 12:28 AM Wes McKinney  wrote:

> It was added between 0.15.0 and 0.16.0. Any feedback from using it
> would be welcome
>
>
> https://github.com/apache/arrow/commit/e0c1ffe9c38d1759f1b5311f95864b0e2a406c51
>
> On Wed, Feb 12, 2020 at 5:12 AM Shawn Yang 
> wrote:
> >
> > Thanks François, I didn't find it in pyarrow. I'll check again.
> >
> > On Fri, Feb 7, 2020 at 9:18 PM Francois Saint-Jacques <
> > fsaintjacq...@gmail.com> wrote:
> >
> > > Arrow does have a Map type [1][2][3]. It is represented as a list of
> pairs.
> > >
> > > François
> > >
> > > [1]
> > >
> https://github.com/apache/arrow/blob/762202418541e843923b8cae640d15b4952a0af6/format/Schema.fbs#L60-L87
> > > [2]
> > >
> https://github.com/apache/arrow/blob/762202418541e843923b8cae640d15b4952a0af6/cpp/src/arrow/type.h#L691-L719
> > > [3]
> > >
> https://github.com/apache/arrow/blob/762202418541e843923b8cae640d15b4952a0af6/java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java#L36-L47
> > >
> > > On Fri, Feb 7, 2020 at 3:55 AM Shawn Yang 
> wrote:
> > > >
> > > > Hi guys,
> > > > I'm writing a cross-language row-oriented serialization framework,
> > > > mainly for Java/Python for now. I defined many data types, as well as
> > > > schema and field, such as Byte, short, int, long, double, float, map,
> > > > array and struct. But then I found that using the Arrow schema is a
> > > > better choice, since my framework needs to support conversion between
> > > > my row format and the Arrow columnar format. If I do it all by myself,
> > > > I need to support schema conversion and schema serialization, which is
> > > > not necessary if I use the Arrow schema.
> > > >
> > > > But I find that Arrow doesn't have a map data type, which is exactly
> > > > what I need. I know I can use a struct to mock it, or an ExtensionType,
> > > > but it's not very convenient. So I want to know whether the Map type
> > > > will be supported by Arrow?
> > > >
> > > > Thanks. Regards
> > >
>


Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-12 Thread Wes McKinney
+1 (binding)

On Tue, Feb 11, 2020 at 4:29 PM Antoine Pitrou  wrote:
>
>
> Ah, you're right, it's PR 6040:
> https://github.com/apache/arrow/pull/6040
>
> Similarly, the C++ implementation is at PR 6026:
> https://github.com/apache/arrow/pull/6026
>
> Regards
>
> Antoine.
>
>
> Le 11/02/2020 à 23:17, Wes McKinney a écrit :
> > hi Antoine, PR 5442 seems to no longer be the right one. Which open PR
> > contains the specification now?
> >
> > On Tue, Feb 11, 2020 at 1:06 PM Antoine Pitrou  wrote:
> >>
> >>
> >> Hello,
> >>
> >> We have been discussing the creation of a minimalist C-based data
> >> interface for applications to exchange Arrow columnar data structures
> >> with each other. Some notable features of this interface include:
> >>
> >> * A small amount of header-only C code can be copied independently into
> >> third-party libraries and downstream applications, no dependencies are
> >> needed even on Arrow C++ itself (notably, it is not required to use
> >> Flatbuffers, though there are trade-offs resulting from this).
> >>
> >> * Low development investment (in other words: limited-scope use cases
> >> can be accomplished with little code), so as to enable C or C++
> >> libraries to export Arrow columnar data with minimal code.
> >>
> >> * Data lifetime management hooks so as to properly handle non-trivial
> >> data sharing (for example passing Arrow columnar data to an async
> >> processing consumer).
> >>
> >> This "C Data Interface" serves different use cases from the
> >> language-independent IPC protocol and trades away a number of features
> >> in the interest of minimalism / simplicity. It is not a replacement for
> >> the IPC protocol and will only be used to interchange in-process data at
> >> C or C++ call sites.
> >>
> >> The PR providing the specification is here:
> >> https://github.com/apache/arrow/pull/5442
> >>
> >> In particular, you can read the spec document here:
> >> https://github.com/pitrou/arrow/blob/doc-c-data-interface2/docs/source/format/CDataInterface.rst
> >>
> >> A fairly comprehensive C++ implementation of this demonstrating its
> >> use is found here:
> >> https://github.com/apache/arrow/pull/5608
> >>
> >> (note that other applications implementing the interface may choose to
> >> only support a few features and thus have far less code to write)
> >>
> >> Please vote to adopt the SPECIFICATION (GitHub PR #5442).
> >>
> >> This vote will be open for at least 72 hours
> >>
> >> [ ] +1 Adopt C Data Interface specification
> >> [ ] +0
> >> [ ] -1 Do not adopt because...
> >>
> >> Thank you
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> (PS: yes, this is in large part a copy/paste of Wes's previous vote
> >> email :-))
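
For a sense of how the interface ends up being used from Python, here is a
minimal sketch (it assumes a later pyarrow that ships the pyarrow.cffi helper
and the private _export_to_c / _import_from_c methods, none of which existed
at the time of this vote; error handling omitted):

import pyarrow as pa
from pyarrow.cffi import ffi  # cffi declarations of ArrowSchema/ArrowArray

# Allocate the two C structs defined by the interface
c_schema = ffi.new("struct ArrowSchema*")
c_array = ffi.new("struct ArrowArray*")
ptr_schema = int(ffi.cast("uintptr_t", c_schema))
ptr_array = int(ffi.cast("uintptr_t", c_array))

# Producer side: export an array and its type through the C interface
arr = pa.array([1, 2, 3], type=pa.int32())
arr.type._export_to_c(ptr_schema)
arr._export_to_c(ptr_array)

# Consumer side: import them back (ownership of the C structs is moved)
arr2 = pa.Array._import_from_c(ptr_array, ptr_schema)
assert arr2.equals(arr)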


Re: PR Dashboard for Java?

2020-02-12 Thread Bryan Cutler
Works now, thanks! I added a page for Java open PRs
https://cwiki.apache.org/confluence/display/ARROW/Java+Open+Patches

On Tue, Feb 11, 2020 at 12:08 PM Wes McKinney  wrote:

> Weird. Try now
>
> On Tue, Feb 11, 2020 at 1:03 PM Bryan Cutler  wrote:
> >
> > Wes, it doesn't seem to have worked. Could you double check the
> privileges
> > for me (cutlerb)? I'd also like to add something to the verify release
> > candidate page. It's weird, I made an edit before on another page a while
> > ago, not sure what happened.  Thanks!
> >
> > On Mon, Jan 27, 2020 at 2:23 PM Wes McKinney 
> wrote:
> >
> > > Bryan -- I just gave you (cutlerb) Confluence edit privileges. These
> > > have to be explicitly managed on a per-user basis to avoid spam
> > > problems
> > >
> > > On Mon, Jan 27, 2020 at 4:12 PM Bryan Cutler 
> wrote:
> > > >
> > > > Thanks Neal, but it doesn't look like I have confluence privileges.
> > > That's
> > > > fine though, the github interface is easy enough.
> > > >
> > > > On Mon, Jan 27, 2020 at 11:59 AM Neal Richardson <
> > > > neal.p.richard...@gmail.com> wrote:
> > > >
> > > > > If you have confluence privileges, duplicate a page like
> > > > >
> https://cwiki.apache.org/confluence/display/ARROW/Ruby+JIRA+Dashboard
> > > and
> > > > > then edit the Jira query (something like status in open/in
> > > > > progress/reopened, labels = pull-request-available, component =
> java,
> > > > > project = ARROW) if you want to make it Java issues that have pull
> > > requests
> > > > > open.
> > > > >
> > > > > Or you could bookmark
> > > > >
> > > > >
> > >
> https://github.com/apache/arrow/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+%22%5BJava%5D%22
> > > > > or https://github.com/apache/arrow/labels/lang-java
> > > > >
> > > > > Neal
> > > > >
> > > > > On Mon, Jan 27, 2020 at 11:26 AM Bryan Cutler 
> > > wrote:
> > > > >
> > > > > > I saw on Confluence that other Arrow components have PR
> dashboards,
> > > but I
> > > > > > don't see one for Java? I think it would be helpful, is it
> difficult
> > > to
> > > > > add
> > > > > > one for Java? I'm happy to do it if someone could point me in the
> > > right
> > > > > > direction. Thanks!
> > > > > >
> > > > > > Bryan
> > > > > >
> > > > >
> > >
>


[jira] [Created] (ARROW-7844) [R] Parquet list column test is flaky

2020-02-12 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7844:
--

 Summary: [R] Parquet list column test is flaky
 Key: ARROW-7844
 URL: https://issues.apache.org/jira/browse/ARROW-7844
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Neal Richardson
Assignee: Francois Saint-Jacques


See [https://travis-ci.org/ursa-labs/arrow-r-nightly/jobs/649649349#L373-L375] 
for an example on public CI. I was seeing this locally this week but figured 
I'd screwed up my env somehow.

{code}
── 1. Failure: Lists are preserved when writing/reading from Parquet (@test-parq
  `object` not equivalent to `expected`.
  Component "num": Component 1: target is numeric, current is character
{code}

It's not always the same column in the data.frame that is affected. It's also 
strange that it's only one column. You'd think that if it were transposing the 
order somehow, you'd get two that were swapped.

The test itself is straightforward 
(https://github.com/apache/arrow/blob/master/r/tests/testthat/test-parquet.R#L124-L137)
 so this is somewhat troubling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Wes McKinney
On Wed, Feb 12, 2020 at 2:37 PM Jacek Pliszka  wrote:
>
> Actually these options still make some sense - but not as much as before.
>
> The use case: unit conversion
>
> Data about prices exported from sql in Decimal(38,10) which uses 128
> bit but the numbers are actually prices which expressed in cents fit
> perfectly in uint32
>
> Having scaling would reduce bandwidth/disk usage by factor of 4.

You'd need to implement a separate function for this since you're
changing the semantics of the cast. I don't think it makes sense to
convert from 123.45 (decimal) to 12345 (uint32) in Cast

> What would be the best approach to such use case?
>
> Would decimal_scale CastOption be OK or should it rather be compute
> 'multiply' kernel ?
>
> BR,
>
> Jacek
>
>
> śr., 12 lut 2020 o 19:32 Jacek Pliszka  napisał(a):
> >
> > OK, then what I proposed does not make sense and I can just copy the
> > solution you pointed out.
> >
> > Thank you,
> >
> > Jacek
> >
> > śr., 12 lut 2020 o 19:27 Wes McKinney  napisał(a):
> > >
> > > On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka  
> > > wrote:
> > > >
> > > > Hi!
> > > >
> > > > ARROW-3329 - we can discuss there.
> > > >
> > > > > It seems like it makes sense to implement both lossless safe casts
> > > > > (when all zeros after the decimal point) and lossy casts (fractional
> > > > > part discarded) from decimal to integer, do I have that right?
> > > >
> > > > Yes, though if I understood your examples are the same case - in both
> > > > cases fractional part is discarded - just it is all 0s in the first
> > > > case.
> > > >
> > > > The key question is whether CastFunctor in cast.cc has access to scale
> > > > of the decimal? If yes how?
> > >
> > > Yes, it's in the type of the input array. Here's a kernel
> > > implementation that uses the TimestampType metadata of the input
> > >
> > > https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L521
> > >
> > > >
> > > > If not - these are the options I've come up with:
> > > >
> > > > Let's assume Decimal128Type value is  n
> > > >
> > > > Then I expect that base call
> > > > .cast('int64') will return  overflow for n beyond int64 values, value 
> > > > otherwise
> > > >
> > > > Option 1:
> > > >
> > > > .cast('int64', decimal_scale=s)  would calculate  n/10**s and return
> > > > overflow if it is beyond int64, value otherwise
> > > >
> > > > Option 2:
> > > >
> > > > .cast('int64', bytes_group=0) would return n & 0xFFFFFFFFFFFFFFFF
> > > > .cast('int64', bytes_group=1) would return (n >> 64) & 0xFFFFFFFFFFFFFFFF
> > > > .cast('int64') would have default value bytes_group=0
> > > >
> > > > Option 3:
> > > >
> > > > cast has no CastOptions but we add  multiply compute kernel and have
> > > > something like this instead:
> > > >
> > > > .compute('multiply', 10**-s).cast('int64')
> > > >
> > > > BR,
> > > >
> > > > Jacek


Re: [C++][Parquet] Is arrow::parquet::FileWriter::WriteColumnChunk intended to be public?

2020-02-12 Thread Wes McKinney
Having them be public was the intention, but it seems that column-wise
writing is not yet fully baked. I think it would be OK to make these
methods private until they can be appropriately tested

On Sat, Feb 8, 2020 at 10:49 PM Micah Kornfield  wrote:
>
> I'm asking because it doesn't seem to validate that the schema of the
> array being written is equivalent to the original schema.
>
> It also appears to only be used in unit tests.
>
> Thanks,
> Micah


[jira] [Created] (ARROW-7843) [Ruby] MSYS2 packages needed for arrow-gandiva arrow-cuda

2020-02-12 Thread Dominic Sisneros (Jira)
Dominic Sisneros created ARROW-7843:
---

 Summary: [Ruby] MSYS2 packages needed for arrow-gandiva arrow-cuda
 Key: ARROW-7843
 URL: https://issues.apache.org/jira/browse/ARROW-7843
 Project: Apache Arrow
  Issue Type: Bug
  Components: Ruby
Affects Versions: 0.16.0
 Environment: windows with rubyinstaller
Reporter: Dominic Sisneros


require "gandiva"

table = Arrow::Table.new(:field1 => Arrow::Int32Array.new([1, 2, 3, 4]),
 :field2 => Arrow::Int32Array.new([11, 13, 15, 17]))
schema = table.schema

expression1 = schema.build_expression do |record|
  record.field1 + record.field2
end

expression2 = schema.build_expression do |record, context|
  context.if(record.field1 > record.field2)
.then(record.field1 / record.field2)
.else(record.field1)
end

projector = Gandiva::Projector.new(schema, [expression1, expression2])
table.each_record_batch do |record_batch|
  outputs = projector.evaluate(record_batch)
  puts outputs.collect(&:values)
end

C:\Users\Dominic E Sisneros\source\repos\ruby\try_arrow>ruby gandiva_test2.rb
Traceback (most recent call last):
2: from gandiva_test2.rb:1:in `'
1: from 
c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in `require'
c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in 
`require': cannot load such file -- gandiva (LoadError)
9: from gandiva_test2.rb:1:in `'
8: from 
c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:156:in 
`require'
7: from 
c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:168:in `rescue 
in require'
6: from 
c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:168:in 
`require'
5: from 
c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/red-gandiva-0.16.0/lib/gandiva.rb:24:in 
`'
4: from 
c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/red-gandiva-0.16.0/lib/gandiva.rb:28:in 
`'
3: from 
c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/red-gandiva-0.16.0/lib/gandiva/loader.rb:22:in
 `load'
2: from 
c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/gobject-introspection-3.4.1/lib/gobject-introspection/loader.rb:25:in
 `load'
1: from 
c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/gobject-introspection-3.4.1/lib/gobject-introspection/loader.rb:37:in
 `load'
c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/gobject-introspection-3.4.1/lib/gobject-introspection/loader.rb:37:in
 `require': Typelib file for namespace 'Gandiva' (any version) not found 
(GObjectIntrospection::RepositoryError::TypelibNotFound)




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Jacek Pliszka
Actually these options still make some sense - but not as much as before.

The use case: unit conversion

Data about prices is exported from SQL as Decimal(38,10), which uses 128
bits, but the numbers are actually prices which, expressed in cents, fit
perfectly in uint32.

Having scaling would reduce bandwidth/disk usage by a factor of 4.

What would be the best approach to such a use case?

Would a decimal_scale CastOption be OK, or should it rather be a compute
'multiply' kernel?

BR,

Jacek


śr., 12 lut 2020 o 19:32 Jacek Pliszka  napisał(a):
>
> OK, then what I proposed does not make sense and I can just copy the
> solution you pointed out.
>
> Thank you,
>
> Jacek
>
> śr., 12 lut 2020 o 19:27 Wes McKinney  napisał(a):
> >
> > On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka  
> > wrote:
> > >
> > > Hi!
> > >
> > > ARROW-3329 - we can discuss there.
> > >
> > > > It seems like it makes sense to implement both lossless safe casts
> > > > (when all zeros after the decimal point) and lossy casts (fractional
> > > > part discarded) from decimal to integer, do I have that right?
> > >
> > > Yes, though if I understood your examples are the same case - in both
> > > cases fractional part is discarded - just it is all 0s in the first
> > > case.
> > >
> > > The key question is whether CastFunctor in cast.cc has access to scale
> > > of the decimal? If yes how?
> >
> > Yes, it's in the type of the input array. Here's a kernel
> > implementation that uses the TimestampType metadata of the input
> >
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L521
> >
> > >
> > > If not - these are the options I've come up with:
> > >
> > > Let's assume Decimal128Type value is  n
> > >
> > > Then I expect that base call
> > > .cast('int64') will return  overflow for n beyond int64 values, value 
> > > otherwise
> > >
> > > Option 1:
> > >
> > > .cast('int64', decimal_scale=s)  would calculate  n/10**s and return
> > > overflow if it is beyond int64, value otherwise
> > >
> > > Option 2:
> > >
> > > .cast('int64', bytes_group=0) would return n & 0xFFFFFFFFFFFFFFFF
> > > .cast('int64', bytes_group=1) would return (n >> 64) & 0xFFFFFFFFFFFFFFFF
> > > .cast('int64') would have default value bytes_group=0
> > >
> > > Option 3:
> > >
> > > cast has no CastOptions but we add  multiply compute kernel and have
> > > something like this instead:
> > >
> > > .compute('multiply', 10**-s).cast('int64')
> > >
> > > BR,
> > >
> > > Jacek


[jira] [Created] (ARROW-7842) [Rust] [Parquet] Implement array reader for list type

2020-02-12 Thread Morgan Cassels (Jira)
Morgan Cassels created ARROW-7842:
-

 Summary: [Rust] [Parquet] Implement array reader for list type
 Key: ARROW-7842
 URL: https://issues.apache.org/jira/browse/ARROW-7842
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Morgan Cassels


Currently the array reader does not support list or map types. The initial PR 
implementing the array reader (https://issues.apache.org/jira/browse/ARROW-4218) 
says that list and map support will come later. Is it known when support for 
list types might be implemented?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Jacek Pliszka
OK, then what I proposed does not make sense and I can just copy the
solution you pointed out.

Thank you,

Jacek

śr., 12 lut 2020 o 19:27 Wes McKinney  napisał(a):
>
> On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka  
> wrote:
> >
> > Hi!
> >
> > ARROW-3329 - we can discuss there.
> >
> > > It seems like it makes sense to implement both lossless safe casts
> > > (when all zeros after the decimal point) and lossy casts (fractional
> > > part discarded) from decimal to integer, do I have that right?
> >
> > Yes, though if I understood your examples are the same case - in both
> > cases fractional part is discarded - just it is all 0s in the first
> > case.
> >
> > The key question is whether CastFunctor in cast.cc has access to scale
> > of the decimal? If yes how?
>
> Yes, it's in the type of the input array. Here's a kernel
> implementation that uses the TimestampType metadata of the input
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L521
>
> >
> > If not - these are the options I've come up with:
> >
> > Let's assume Decimal128Type value is  n
> >
> > Then I expect that base call
> > .cast('int64') will return  overflow for n beyond int64 values, value 
> > otherwise
> >
> > Option 1:
> >
> > .cast('int64', decimal_scale=s)  would calculate  n/10**s and return
> > overflow if it is beyond int64, value otherwise
> >
> > Option 2:
> >
> > .cast('int64', bytes_group=0) would return n & 0xFFFFFFFFFFFFFFFF
> > .cast('int64', bytes_group=1) would return (n >> 64) & 0xFFFFFFFFFFFFFFFF
> > .cast('int64') would have default value bytes_group=0
> >
> > Option 3:
> >
> > cast has no CastOptions but we add  multiply compute kernel and have
> > something like this instead:
> >
> > .compute('multiply', 10**-s).cast('int64')
> >
> > BR,
> >
> > Jacek


Re: [ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Wes McKinney
On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka  wrote:
>
> Hi!
>
> ARROW-3329 - we can discuss there.
>
> > It seems like it makes sense to implement both lossless safe casts
> > (when all zeros after the decimal point) and lossy casts (fractional
> > part discarded) from decimal to integer, do I have that right?
>
> Yes, though if I understood your examples are the same case - in both
> cases fractional part is discarded - just it is all 0s in the first
> case.
>
> The key question is whether CastFunctor in cast.cc has access to scale
> of the decimal? If yes how?

Yes, it's in the type of the input array. Here's a kernel
implementation that uses the TimestampType metadata of the input

https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L521
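
As a plain illustration of the same point from Python: the scale travels with
the array's own type, so a cast kernel can read it from the input type rather
than from a CastOptions field (assumes a pyarrow build with decimal support):

import pyarrow as pa
from decimal import Decimal

# The Decimal128Type carries precision and scale along with the data
arr = pa.array([Decimal("123.45")], type=pa.decimal128(38, 10))
print(arr.type.precision, arr.type.scale)  # -> 38 10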

>
> If not - these are the options I've come up with:
>
> Let's assume Decimal128Type value is  n
>
> Then I expect that base call
> .cast('int64') will return  overflow for n beyond int64 values, value 
> otherwise
>
> Option 1:
>
> .cast('int64', decimal_scale=s)  would calculate  n/10**s and return
> overflow if it is beyond int64, value otherwise
>
> Option 2:
>
> .cast('int64', bytes_group=0) would return n & 0xFFFFFFFFFFFFFFFF
> .cast('int64', bytes_group=1) would return (n >> 64) & 0xFFFFFFFFFFFFFFFF
> .cast('int64') would have default value bytes_group=0
>
> Option 3:
>
> cast has no CastOptions but we add  multiply compute kernel and have
> something like this instead:
>
> .compute('multiply', 10**-s).cast('int64')
>
> BR,
>
> Jacek


[jira] [Created] (ARROW-7841) pyarrow release 0.16.0 breaks `libhdfs.so` loading mechanism

2020-02-12 Thread Jack Fan (Jira)
Jack Fan created ARROW-7841:
---

 Summary: pyarrow release 0.16.0 breaks `libhdfs.so` loading 
mechanism
 Key: ARROW-7841
 URL: https://issues.apache.org/jira/browse/ARROW-7841
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.16.0
Reporter: Jack Fan
 Fix For: 0.15.1


I have my env variables set up correctly according to the pyarrow README
{code:java}
$ ls $HADOOP_HOME/lib/native
libhadoop.a  libhadooppipes.a  libhadoop.so  libhadoop.so.1.0.0  
libhadooputils.a  libhdfs.a  libhdfs.so  libhdfs.so.0.0.0 {code}
Use the following script to reproduce
{code:java}
import pyarrow
pyarrow.hdfs.connect('hdfs://localhost'){code}
With pyarrow version 0.15.1 it is fine.

However, version 0.16.0 gives the following error:
{code:java}
Traceback (most recent call last):
  File "", line 2, in 
  File 
"/home/jackwindows/anaconda2/lib/python2.7/site-packages/pyarrow/hdfs.py", line 
215, in connect
extra_conf=extra_conf)
  File 
"/home/jackwindows/anaconda2/lib/python2.7/site-packages/pyarrow/hdfs.py", line 
40, in __init__
self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
IOError: Unable to load libhdfs: /opt/hadoop/latest/libhdfs.so: cannot open 
shared object file: No such file or directory {code}
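
A possible workaround sketch (untested here) is to point pyarrow at the library
directly via the ARROW_LIBHDFS_DIR environment variable, which the libhdfs
lookup honors:
{code:python}
import os
import pyarrow

# Point the loader straight at the directory containing libhdfs.so
os.environ["ARROW_LIBHDFS_DIR"] = os.path.join(
    os.environ["HADOOP_HOME"], "lib", "native")

fs = pyarrow.hdfs.connect("localhost")
{code}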



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[ARROW-3329] Re: Decimal casting or scaling

2020-02-12 Thread Jacek Pliszka
Hi!

ARROW-3329 - we can discuss there.

> It seems like it makes sense to implement both lossless safe casts
> (when all zeros after the decimal point) and lossy casts (fractional
> part discarded) from decimal to integer, do I have that right?

Yes, though if I understood correctly, your examples are the same case - in both
cases the fractional part is discarded - it is just that it is all 0s in the first
case.

The key question is whether CastFunctor in cast.cc has access to the scale
of the decimal? If yes, how?

If not - these are the options I've come up with:

Let's assume the Decimal128Type value is n.

Then I expect that the base call
.cast('int64') will return overflow for n beyond int64 values, and the value otherwise

Option 1:

.cast('int64', decimal_scale=s) would calculate n/10**s and return
overflow if it is beyond int64, and the value otherwise

Option 2:

.cast('int64', bytes_group=0) would return n & 0xFFFFFFFFFFFFFFFF
.cast('int64', bytes_group=1) would return (n >> 64) & 0xFFFFFFFFFFFFFFFF
.cast('int64') would have default value bytes_group=0

Option 3:

cast has no CastOptions, but we add a multiply compute kernel and have
something like this instead:

.compute('multiply', 10**-s).cast('int64')
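
To make the intended semantics concrete, a plain-Python sketch (illustration
only: decimal_scale and bytes_group are hypothetical options that do not exist
in pyarrow, and n is the raw 128-bit unscaled integer):

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def cast_decimal_scale(n, s):
    # Option 1: drop the fractional part by dividing by 10**s, then range-check
    scaled = n // 10**s
    if not INT64_MIN <= scaled <= INT64_MAX:
        raise OverflowError("value does not fit in int64")
    return scaled

def cast_bytes_group(n, group=0):
    # Option 2: select one 64-bit group out of the 128-bit unscaled value
    return (n >> (64 * group)) & 0xFFFFFFFFFFFFFFFF

# 123.45 stored as Decimal(38, 10) has unscaled value 1234500000000
print(cast_decimal_scale(1234500000000, 10))  # -> 123
print(cast_bytes_group(1234500000000, 1))     # -> 0 (high 64 bits are empty here)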

BR,

Jacek


Re: Decimal casting or scaling

2020-02-12 Thread Wes McKinney
hi Jacek,

What is the JIRA issue for this change? In the interest of organizing
the discussion (may make sense to move some of this to that issue)

There are no casts implemented for DecimalType at all in [1], either to
decimal or from decimal to anything else.

It seems like it makes sense to implement both lossless safe casts
(when all zeros after the decimal point) and lossy casts (fractional
part discarded) from decimal to integer, do I have that right? I don't
understand your other questions very well so perhaps you can provide
some illustration about what values are to be expected when calling
".cast('int64')" on a Decimal128Type array.

Thanks
Wes

[1]: 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc
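
To illustrate the two behaviors in plain Python (truncation toward zero; this
is not pyarrow API, just the semantics being discussed):

from decimal import Decimal

def safe_decimal_to_int(d):
    # Lossless cast: only allowed when there is nothing after the decimal point
    if d != d.to_integral_value():
        raise ValueError("cast would discard a non-zero fractional part")
    return int(d)

def unsafe_decimal_to_int(d):
    # Lossy cast: the fractional part is simply discarded
    return int(d)

print(safe_decimal_to_int(Decimal("123.00")))    # -> 123
print(unsafe_decimal_to_int(Decimal("123.45")))  # -> 123
# safe_decimal_to_int(Decimal("123.45")) would raise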

On Wed, Feb 12, 2020 at 5:36 AM Jacek Pliszka  wrote:
>
> Hi!
>
> I am interested in having cast from Decimal to Int in pyarrow.
>
> I have a couple of ideas, but I am a newbie so I might be wrong:
>
> Do I understand correctly that the problem lies in the fact that
> CastFunctor knows nothing about decimal scale?
>
> Were there any ideas how to handle this properly?
>
> My ideas are not that great but maybe one of them would be OK:
>
> 1. We can pass 'numeric_scale_shift' or 'decimal_scale_shift' in CastOptions.
> Then, while casting, the numbers would be scaled properly.
>
> 2. Pass a byte group selector in CastOptions, i.e. when casting from
> N*M bytes to N bytes we can pick any of the M groups.
>
> 3. Do not modify cast, but add a scale/multiply compute kernel so we can
> apply it explicitly prior to casting.
>
> What do you think? I like 1 the most and 2 the least.
>
> Would any of these solutions be accepted into the code?
>
> I do not want to work on something that would be rejected immediately...
>
> Thanks for any input provided,
>
> Jacek


Re: Arrow doesn't have a MapType

2020-02-12 Thread Wes McKinney
It was added between 0.15.0 and 0.16.0. Any feedback from using it
would be welcome

https://github.com/apache/arrow/commit/e0c1ffe9c38d1759f1b5311f95864b0e2a406c51
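
For reference, a minimal sketch of what this looks like from Python (assuming a
pyarrow version where pyarrow.map_ is exposed; the field name "properties" is
just an example):

import pyarrow as pa

# A map type; per the spec it is represented as a list of key/item pairs
map_type = pa.map_(pa.string(), pa.int32())

# It can be used like any other type, e.g. inside a schema
schema = pa.schema([pa.field("properties", map_type)])
print(schema)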

On Wed, Feb 12, 2020 at 5:12 AM Shawn Yang  wrote:
>
> Thanks François, I didn't find it in pyarrow. I'll check again.
>
> On Fri, Feb 7, 2020 at 9:18 PM Francois Saint-Jacques <
> fsaintjacq...@gmail.com> wrote:
>
> > Arrow does have a Map type [1][2][3]. It is represented as a list of pairs.
> >
> > François
> >
> > [1]
> > https://github.com/apache/arrow/blob/762202418541e843923b8cae640d15b4952a0af6/format/Schema.fbs#L60-L87
> > [2]
> > https://github.com/apache/arrow/blob/762202418541e843923b8cae640d15b4952a0af6/cpp/src/arrow/type.h#L691-L719
> > [3]
> > https://github.com/apache/arrow/blob/762202418541e843923b8cae640d15b4952a0af6/java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java#L36-L47
> >
> > On Fri, Feb 7, 2020 at 3:55 AM Shawn Yang  wrote:
> > >
> > > Hi guys,
> > > I'm writing a cross-language row-oriented serialization framework,
> > > mainly for Java/Python for now. I defined many data types, as well as
> > > schema and field, such as Byte, short, int, long, double, float, map,
> > > array and struct. But then I found that using the Arrow schema is a
> > > better choice, since my framework needs to support conversion between
> > > my row format and the Arrow columnar format. If I do it all by myself,
> > > I need to support schema conversion and schema serialization, which is
> > > not necessary if I use the Arrow schema.
> > >
> > > But I find that Arrow doesn't have a map data type, which is exactly
> > > what I need. I know I can use a struct to mock it, or an ExtensionType,
> > > but it's not very convenient. So I want to know whether the Map type
> > > will be supported by Arrow?
> > >
> > > Thanks. Regards
> >


[jira] [Created] (ARROW-7840) [Java] [Integration] Java executables fail

2020-02-12 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7840:
-

 Summary: [Java] [Integration] Java executables fail
 Key: ARROW-7840
 URL: https://issues.apache.org/jira/browse/ARROW-7840
 Project: Apache Arrow
  Issue Type: Bug
  Components: Integration, Java
Reporter: Antoine Pitrou
 Fix For: 1.0.0


When trying to run integration tests using {{docker-compose run 
conda-integration}}, I always get failures during the Java tests:
{code}
RuntimeError: Command failed: ['java', 
'-Dio.netty.tryReflectionSetAccessible=true', '-cp', 
'/arrow/java/tools/target/arrow-tools-1.0.0-SNAPSHOT-jar-with-dependencies.jar',
 'org.apache.arrow.tools.StreamToFile', 
'/tmp/tmpqbkrmpo1/e75ed336_simple.producer_file_as_stream', 
'/tmp/tmpqbkrmpo1/e75ed336_simple.consumer_stream_as_file']
With output:
--
15:57:01.194 [main] DEBUG io.netty.util.internal.logging.InternalLoggerFactory 
- Using SLF4J as the default logging framework
15:57:01.196 [main] DEBUG io.netty.util.ResourceLeakDetector - 
-Dio.netty.leakDetection.level: simple
15:57:01.196 [main] DEBUG io.netty.util.ResourceLeakDetector - 
-Dio.netty.leakDetection.targetRecords: 4
15:57:01.208 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
-Dio.netty.noUnsafe: false
15:57:01.209 [main] DEBUG io.netty.util.internal.PlatformDependent0 - Java 
version: 8
15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
sun.misc.Unsafe.theUnsafe: available
15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
sun.misc.Unsafe.copyMemory: available
15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
java.nio.Buffer.address: available
15:57:01.210 [main] DEBUG io.netty.util.internal.PlatformDependent0 - direct 
buffer constructor: available
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
java.nio.Bits.unaligned: available, true
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable prior to 
Java9
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent0 - 
java.nio.DirectByteBuffer.(long, int): available
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent - 
sun.misc.Unsafe: available
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent - 
-Dio.netty.tmpdir: /tmp (java.io.tmpdir)
15:57:01.211 [main] DEBUG io.netty.util.internal.PlatformDependent - 
-Dio.netty.bitMode: 64 (sun.arch.data.model)
15:57:01.212 [main] DEBUG io.netty.util.internal.PlatformDependent - 
-Dio.netty.noPreferDirect: false
15:57:01.212 [main] DEBUG io.netty.util.internal.PlatformDependent - 
-Dio.netty.maxDirectMemory: 11252269056 bytes
15:57:01.212 [main] DEBUG io.netty.util.internal.PlatformDependent - 
-Dio.netty.uninitializedArrayAllocationThreshold: -1
15:57:01.213 [main] DEBUG io.netty.util.internal.CleanerJava6 - 
java.nio.ByteBuffer.cleaner(): available
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.numHeapArenas: 48
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.numDirectArenas: 48
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.pageSize: 8192
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.maxOrder: 11
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.chunkSize: 16777216
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.tinyCacheSize: 512
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.smallCacheSize: 256
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.normalCacheSize: 64
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.maxCachedBufferCapacity: 32768
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.cacheTrimInterval: 8192
15:57:01.213 [main] DEBUG io.netty.buffer.PooledByteBufAllocator - 
-Dio.netty.allocator.useCacheForAllThreads: true
15:57:01.216 [main] DEBUG io.netty.util.internal.InternalThreadLocalMap - 
-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
15:57:01.216 [main] DEBUG io.netty.util.internal.InternalThreadLocalMap - 
-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
15:57:01.228 [main] DEBUG io.netty.buffer.AbstractByteBuf - 
-Dio.netty.buffer.bytebuf.checkAccessible: true
15:57:01.228 [main] DEBUG io.netty.util.ResourceLeakDetectorFactory - Loaded 
default ResourceLeakDetector: io.netty.util.ResourceLeakDetector@71bc1ae4
15:57:01.242 [main] DEBUG org.apache.arrow.vector.ipc.ReadChannel - Reading 
buffer with size: 4
15:57:01.242 [main] DEBUG org.apache.arrow.vector.ipc.ReadChannel - Reading 
buffer with size: 4
15:57:01.242 [main] DEBUG 

[NIGHTLY] Arrow Build Report for Job nightly-2020-02-12-0

2020-02-12 Thread Crossbow


Arrow Build Report for Job nightly-2020-02-12-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0

Failed Tasks:
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-python-3.7-turbodbc-master
- wheel-osx-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-travis-wheel-osx-cp27m

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-centos-6
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-centos-7
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-conda-win-vs2015-py38
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-azure-debian-stretch
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-travis-gandiva-jar-osx
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-travis-gandiva-jar-trusty
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-travis-macos-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-12-0-circle-test-conda-python-3.7
- test-conda-python-3.8-dask-master:
  URL: 

Decimal casting or scaling

2020-02-12 Thread Jacek Pliszka
Hi!

I am interested in having cast from Decimal to Int in pyarrow.

I have a couple of ideas, but I am a newbie so I might be wrong:

Do I understand correctly that the problem lies in the fact that
CastFunctor knows nothing about decimal scale?

Were there any ideas how to handle this properly?

My ideas are not that great but maybe one of them would be OK:

1. We can pass 'numeric_scale_shift' or 'decimal_scale_shift' in CastOptions.
Then, while casting, the numbers would be scaled properly.

2. Pass a byte group selector in CastOptions, i.e. when casting from
N*M bytes to N bytes we can pick any of the M groups.

3. Do not modify cast, but add a scale/multiply compute kernel so we can
apply it explicitly prior to casting.

What do you think? I like 1 the most and 2 the least.

Would any of these solutions be accepted into the code?

I do not want to work on something that would be rejected immediately...

Thanks for any input provided,

Jacek


Re: Arrow doesn't have a MapType

2020-02-12 Thread Shawn Yang
Thanks François, I didn't find it in pyarrow. I'll check again.

On Fri, Feb 7, 2020 at 9:18 PM Francois Saint-Jacques <
fsaintjacq...@gmail.com> wrote:

> Arrow does have a Map type [1][2][3]. It is represented as a list of pairs.
>
> François
>
> [1]
> https://github.com/apache/arrow/blob/762202418541e843923b8cae640d15b4952a0af6/format/Schema.fbs#L60-L87
> [2]
> https://github.com/apache/arrow/blob/762202418541e843923b8cae640d15b4952a0af6/cpp/src/arrow/type.h#L691-L719
> [3]
> https://github.com/apache/arrow/blob/762202418541e843923b8cae640d15b4952a0af6/java/vector/src/main/java/org/apache/arrow/vector/complex/MapVector.java#L36-L47
>
> On Fri, Feb 7, 2020 at 3:55 AM Shawn Yang  wrote:
> >
> > Hi guys,
> > I'm writing a cross-language row-oriented serialization framework,
> > mainly for Java/Python for now. I defined many data types, as well as
> > schema and field, such as Byte, short, int, long, double, float, map,
> > array and struct. But then I found that using the Arrow schema is a
> > better choice, since my framework needs to support conversion between
> > my row format and the Arrow columnar format. If I do it all by myself,
> > I need to support schema conversion and schema serialization, which is
> > not necessary if I use the Arrow schema.
> >
> > But I find that Arrow doesn't have a map data type, which is exactly
> > what I need. I know I can use a struct to mock it, or an ExtensionType,
> > but it's not very convenient. So I want to know whether the Map type
> > will be supported by Arrow?
> >
> > Thanks. Regards
>


[jira] [Created] (ARROW-7839) [Python][Dataset] Add IPC format to python bindings

2020-02-12 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7839:


 Summary: [Python][Dataset] Add IPC format to python bindings
 Key: ARROW-7839
 URL: https://issues.apache.org/jira/browse/ARROW-7839
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Joris Van den Bossche


The C++ / R side was done in ARROW-7415; we should add bindings for it in Python as 
well.
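
For illustration, the Python binding could eventually look something like this
(a sketch of the expected usage, not a final API; the path is a placeholder):
{code:python}
import pyarrow.dataset as ds

# Discover a directory of Arrow IPC / Feather files as a single dataset
dataset = ds.dataset("/path/to/ipc_files", format="ipc")
table = dataset.to_table()
{code}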



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7838) [C++] Installed plasma-store-server fails finding Boost

2020-02-12 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7838:
-

 Summary: [C++] Installed plasma-store-server fails finding Boost
 Key: ARROW-7838
 URL: https://issues.apache.org/jira/browse/ARROW-7838
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, C++ - Plasma
Reporter: Antoine Pitrou


In my build directory I have:
{code}
$ ldd build-test/debug/plasma-store-server 
linux-vdso.so.1 (0x7ffc0001f000)
libplasma.so.100 => 
/home/antoine/arrow/dev/cpp/build-test/debug/libplasma.so.100 
(0x7efbff629000)
libarrow_cuda.so.100 => 
/home/antoine/arrow/dev/cpp/build-test/debug/libarrow_cuda.so.100 
(0x7efbff58d000)
libarrow.so.100 => 
/home/antoine/arrow/dev/cpp/build-test/debug/libarrow.so.100 
(0x7efbfcbae000)
libssl.so.1.1 => 
/home/antoine/miniconda3/envs/pyarrow/lib/libssl.so.1.1 (0x7efbfcb1e000)
libcrypto.so.1.1 => 
/home/antoine/miniconda3/envs/pyarrow/lib/libcrypto.so.1.1 (0x7efbfc87)
libaws-cpp-sdk-config.so => 
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-cpp-sdk-config.so 
(0x7efbfc6be000)
libaws-cpp-sdk-transfer.so => 
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-cpp-sdk-transfer.so 
(0x7efbff557000)
libaws-cpp-sdk-s3.so => 
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-cpp-sdk-s3.so 
(0x7efbfc478000)
libaws-cpp-sdk-core.so => 
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-cpp-sdk-core.so 
(0x7efbfc37b000)
libaws-c-event-stream.so.0unstable => 
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-c-event-stream.so.0unstable 
(0x7efbff54e000)
libaws-c-common.so.0unstable => 
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-c-common.so.0unstable 
(0x7efbff52d000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x7efbfbfa2000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 
(0x7efbfbd83000)
libaws-checksums.so => 
/home/antoine/miniconda3/envs/pyarrow/lib/libaws-checksums.so 
(0x7efbff51d000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x7efbfbb7b000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7efbfb977000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 
(0x7efbfadd7000)
libstdc++.so.6 => 
/home/antoine/miniconda3/envs/pyarrow/lib/libstdc++.so.6 (0x7efbfac63000)
libgcc_s.so.1 => 
/home/antoine/miniconda3/envs/pyarrow/lib/libgcc_s.so.1 (0x7efbff507000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7efbfa872000)
/lib64/ld-linux-x86-64.so.2 (0x7efbff4d7000)
libbz2.so.1.0 => 
/home/antoine/miniconda3/envs/pyarrow/lib/libbz2.so.1.0 (0x7efbfa85e000)
liblz4.so.1 => /home/antoine/miniconda3/envs/pyarrow/lib/liblz4.so.1 
(0x7efbfa829000)
libsnappy.so.1 => 
/home/antoine/miniconda3/envs/pyarrow/lib/libsnappy.so.1 (0x7efbfa81e000)
libz.so.1 => /home/antoine/miniconda3/envs/pyarrow/lib/libz.so.1 
(0x7efbfa804000)
libzstd.so.1 => /home/antoine/miniconda3/envs/pyarrow/lib/libzstd.so.1 
(0x7efbfa748000)
libboost_filesystem.so.1.68.0 => 
/home/antoine/miniconda3/envs/pyarrow/lib/libboost_filesystem.so.1.68.0 
(0x7efbfa72a000)
libboost_system.so.1.68.0 => 
/home/antoine/miniconda3/envs/pyarrow/lib/libboost_system.so.1.68.0 
(0x7efbff4fe000)
libcurl.so.4 => 
/home/antoine/miniconda3/envs/pyarrow/lib/./libcurl.so.4 (0x7efbfa6a4000)
libnvidia-fatbinaryloader.so.390.116 => 
/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.390.116 
(0x7efbfa456000)
libssh2.so.1 => 
/home/antoine/miniconda3/envs/pyarrow/lib/././libssh2.so.1 (0x7efbfa423000)
libgssapi_krb5.so.2 => 
/home/antoine/miniconda3/envs/pyarrow/lib/././libgssapi_krb5.so.2 
(0x7efbfa3d4000)
libkrb5.so.3 => 
/home/antoine/miniconda3/envs/pyarrow/lib/././libkrb5.so.3 (0x7efbfa2fd000)
libk5crypto.so.3 => 
/home/antoine/miniconda3/envs/pyarrow/lib/././libk5crypto.so.3 
(0x7efbfa2de000)
libcom_err.so.3 => 
/home/antoine/miniconda3/envs/pyarrow/lib/././libcom_err.so.3 
(0x7efbfa2d6000)
libkrb5support.so.0 => 
/home/antoine/miniconda3/envs/pyarrow/lib/./././libkrb5support.so.0 
(0x7efbfa2c8000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 
(0x7efbfa0ad000)
{code}

However, once installed it seems the Boost resolution fails:
{code}
$ ldd /home/antoine/miniconda3/envs/pyarrow/bin/plasma-store-server
linux-vdso.so.1 (0x7ffc0001f000)
libplasma.so.100 => 
/home/antoine/miniconda3/envs/pyarrow/lib/libplasma.so.100 (0x7efbff629000)
libarrow_cuda.so.100 => 
/home/antoine/miniconda3/envs/pyarrow/lib/libarrow_cuda.so.100 
(0x7efbff58d000)
libarrow.so.100 => 
/home/antoine/miniconda3/envs/pyarrow/lib/libarrow.so.100 (0x7efbfcbae000)