Re: [DISCUSS][Format] Time Interval Changes

2019-03-22 Thread Micah Kornfield
Hi arrow-dev,
I just wanted to bump this thread to see if anyone wanted to comment or
discuss a path forward.

If no one chimes in by Monday evening, could I ask a PMC member to start a
vote on Tuesday? (I believe a member of the PMC needs to initiate a vote.)

I will implement the C++ side once there is consensus around the change to
the format.

Thanks,
Micah

On Tue, Mar 19, 2019 at 12:13 AM Micah Kornfield wrote:

> Hi Arrow Dev,
> Based on the recent thread on discussing and voting on changes to files
> under format, I figured I'd try to see how the process works for changes to
> Schema.fbs to close out lingering time interval issues.  In particular,
> ARROW-352 (Interval(DAY_TIME) has no unit) and ARROW-835 (Add Timedelta
> type to describe time intervals).
>
> I submitted a PR [1] that introduces a new DurationType that models
> (sub)seconds (excluding leap seconds) as an 8-byte integer type.  Some of
> these issues have been discussed previously; the most recent thread was
> within the last month [2].
>
> The reason for creating a new type is to avoid breaking changes with
> existing types (in particular Interval[DAY_TIME] in Java). I think the
> things worth discussing are:
>
> 1.  Is this a desirable change in principle?
> 2.  Naming: is DurationInterval a good name (should it be TimeDelta)?
> 3.  New Type: Should this be collapsed into a new enum value on Interval?
> (Because it excludes leap seconds, I think it still technically falls into
> the class of calendar-like objects.)
>
> Please feel free to add items for discussion.
>
> I'm not sure how long discussions are typically held open, but it
> would be great if we could try to get to a consensus sometime soon (and
> then schedule a vote).  Maybe early next week is a good goal to aim for?
>
> Thanks,
> Micah
>
>
> [1] https://github.com/apache/arrow/pull/3644
> [2]
> https://lists.apache.org/thread.html/0e606a6afd2332b4ae5b4382e533bea309c790ea71c05047cf983372@%3Cdev.arrow.apache.org%3E
>
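
For readers following along, the DurationType proposal quoted above can be pictured as a signed 64-bit count of a fixed time unit, with every second the same length (no leap seconds, no calendar rules). The following is a minimal Python sketch under that reading; the unit table and the names `UNITS_PER_SECOND`, `convert_duration` and `pack_duration` are hypothetical and are not taken from the PR.

```python
import struct

# Hypothetical unit table; the PR may name or encode units differently.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def convert_duration(value, from_unit, to_unit):
    """Convert an integer duration between units, truncating toward zero."""
    scaled = value * UNITS_PER_SECOND[to_unit]
    divisor = UNITS_PER_SECOND[from_unit]
    # Work on the magnitude so negative durations also truncate toward zero.
    quotient = abs(scaled) // divisor
    return quotient if scaled >= 0 else -quotient


def pack_duration(value):
    """Store a duration as 8 little-endian bytes (signed 64-bit integer)."""
    return struct.pack("<q", value)
```

Because the value is a plain count, arithmetic on durations is ordinary integer arithmetic, which is the property that separates it from calendar-style intervals.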


Re: Arrow Flight protocol/API questions

2019-03-22 Thread David Li
Sorry about that! It should be enabled now, let me know if it doesn't work.

Best,
David

On 3/22/19, Antoine Pitrou  wrote:
>
> I second this request.
>
> Regards
>
> Antoine.
>
>
> On Fri, 22 Mar 2019 15:26:26 -0700
> Jacques Nadeau  wrote:
>> Hey David, thanks for sharing this. Can you add comment capability to the
>> doc for reviewers?
>>
>> thanks,
>> Jacques
>>
>>
>> On Fri, Mar 22, 2019 at 1:29 PM David Li  wrote:
>>
>> > Hi all,
>> >
>> > To bring this back up again, we've started experimenting with Flight
>> > for real now, and have some proposals. Including the justifications,
>> > they're a little long, so I've put them on a linked Google doc:
>> >
>> > https://docs.google.com/document/d/1aIVZ8SD5dMZXHTCeEY9PoNAwyuUgG-UEjmd3zfs1PYM/edit?usp=sharing
>> >
>> > In short, these proposals try to add the minimal amount in the
>> > APIs/protocol to be "production-ready" based on what we've seen so
>> > far. Originally, I brought up the idea of adding "escape hatches" to
>> > get at the underlying RPC framework objects, but after taking a stab
>> > at this, it isn't feasible in Python, making it kind of pointless as a
>> > solution. I'd like to avoid making Flight into a full-on RPC framework
>> > in and of itself, with an eye for portability in the future. We'd be
>> > willing to work on implementations of all these to get the ball
>> > rolling.
>> >
>> > Many of these could be solved in the meantime with reasonable defaults
>> > - but I think inevitably users will need to tweak lower-level details
>> > as things hit production, and generally reasonable defaults won't
>> > apply in every case.
>> >
>> > Finally, thanks to all who have been reviewing/working on Flight so
>> > far, I'm quite excited to start using it for real.
>> >
>> > Best,
>> > David
>> >
>>
>
>
>
>


Re: Arrow Flight protocol/API questions

2019-03-22 Thread Antoine Pitrou


I second this request.

Regards

Antoine.


On Fri, 22 Mar 2019 15:26:26 -0700
Jacques Nadeau  wrote:
> Hey David, thanks for sharing this. Can you add comment capability to the
> doc for reviewers?
> 
> thanks,
> Jacques
> 
> 
> On Fri, Mar 22, 2019 at 1:29 PM David Li  wrote:
> 
> > Hi all,
> >
> > To bring this back up again, we've started experimenting with Flight
> > for real now, and have some proposals. Including the justifications,
> > they're a little long, so I've put them on a linked Google doc:
> >
> > https://docs.google.com/document/d/1aIVZ8SD5dMZXHTCeEY9PoNAwyuUgG-UEjmd3zfs1PYM/edit?usp=sharing
> >
> > In short, these proposals try to add the minimal amount in the
> > APIs/protocol to be "production-ready" based on what we've seen so
> > far. Originally, I brought up the idea of adding "escape hatches" to
> > get at the underlying RPC framework objects, but after taking a stab
> > at this, it isn't feasible in Python, making it kind of pointless as a
> > solution. I'd like to avoid making Flight into a full-on RPC framework
> > in and of itself, with an eye for portability in the future. We'd be
> > willing to work on implementations of all these to get the ball
> > rolling.
> >
> > Many of these could be solved in the meantime with reasonable defaults
> > - but I think inevitably users will need to tweak lower-level details
> > as things hit production, and generally reasonable defaults won't
> > apply in every case.
> >
> > Finally, thanks to all who have been reviewing/working on Flight so
> > far, I'm quite excited to start using it for real.
> >
> > Best,
> > David
> >  
> 





Re: Arrow Flight protocol/API questions

2019-03-22 Thread Jacques Nadeau
Hey David, thanks for sharing this. Can you add comment capability to the
doc for reviewers?

thanks,
Jacques


On Fri, Mar 22, 2019 at 1:29 PM David Li  wrote:

> Hi all,
>
> To bring this back up again, we've started experimenting with Flight
> for real now, and have some proposals. Including the justifications,
> they're a little long, so I've put them on a linked Google doc:
>
> https://docs.google.com/document/d/1aIVZ8SD5dMZXHTCeEY9PoNAwyuUgG-UEjmd3zfs1PYM/edit?usp=sharing
>
> In short, these proposals try to add the minimal amount in the
> APIs/protocol to be "production-ready" based on what we've seen so
> far. Originally, I brought up the idea of adding "escape hatches" to
> get at the underlying RPC framework objects, but after taking a stab
> at this, it isn't feasible in Python, making it kind of pointless as a
> solution. I'd like to avoid making Flight into a full-on RPC framework
> in and of itself, with an eye for portability in the future. We'd be
> willing to work on implementations of all these to get the ball
> rolling.
>
> Many of these could be solved in the meantime with reasonable defaults
> - but I think inevitably users will need to tweak lower-level details
> as things hit production, and generally reasonable defaults won't
> apply in every case.
>
> Finally, thanks to all who have been reviewing/working on Flight so
> far, I'm quite excited to start using it for real.
>
> Best,
> David
>


[Python] The next manylinux specification

2019-03-22 Thread Antoine Pitrou


For those who are interested in discussing it:

https://discuss.python.org/t/the-next-manylinux-specification/1043

Regards

Antoine.


Re: Arrow Flight protocol/API questions

2019-03-22 Thread David Li
Hi all,

To bring this back up again, we've started experimenting with Flight
for real now, and have some proposals. Including the justifications,
they're a little long, so I've put them on a linked Google doc:
https://docs.google.com/document/d/1aIVZ8SD5dMZXHTCeEY9PoNAwyuUgG-UEjmd3zfs1PYM/edit?usp=sharing

In short, these proposals try to add the minimal amount in the
APIs/protocol to be "production-ready" based on what we've seen so
far. Originally, I brought up the idea of adding "escape hatches" to
get at the underlying RPC framework objects, but after taking a stab
at this, it isn't feasible in Python, making it kind of pointless as a
solution. I'd like to avoid making Flight into a full-on RPC framework
in and of itself, with an eye for portability in the future. We'd be
willing to work on implementations of all these to get the ball
rolling.

Many of these could be solved in the meantime with reasonable defaults
- but I think inevitably users will need to tweak lower-level details
as things hit production, and generally reasonable defaults won't
apply in every case.

Finally, thanks to all who have been reviewing/working on Flight so
far, I'm quite excited to start using it for real.

Best,
David


tensorflow-io Arrow Datasets and thoughts on support for tensor columns

2019-03-22 Thread Bryan Cutler
Hi All,

Recently I have been working with the TensorFlow SIG-IO community to
introduce Apache Arrow based Datasets for bringing Arrow data into
TensorFlow. SIG-IO is a community maintained repository focused on
input/output support for TF, see https://github.com/tensorflow/io (a lot of
formats from contrib/ ended up here).  Since it is community driven, if
anyone is interested, participation is highly encouraged!

I'm bringing this up for a couple of reasons. First, I want to make sure that
this stays in line with any related efforts within the Arrow project, and I
welcome any feedback. Secondly, the initial response has been great and
people are excited about using Arrow and looking to use it in other areas
of TF, but I've noticed there has been some confusion about how Arrow
handles tensor data. Specifically, it gets assumed that tensors could be
part of a RecordBatch and could be readily used in an Arrow stream.

I know we have talked about making tensors a logical type for columnar data
before in
https://lists.apache.org/thread.html/6cc86d50d92dbd21d6fc34e34485afb3cab4956fbc0d61ff9b99ea27@%3Cdev.arrow.apache.org%3E
and there is a JIRA, ARROW-1614, but since there is work
needed to fully support the current spec for 1.0, I don't think it has
moved forward much. I'm wondering if maybe now is a better time to start
working on this?  I think having built-in support for tensor columns would
really help to increase adoption of Arrow in frameworks that use tensor
data. What are other people's thoughts?

Best Regards,
Bryan


Re: Creating Arrays from builders using bitmasks

2019-03-22 Thread Felipe Aramburu
Got it. I think I am just going to use ArrayData::Make for this for now.
Thanks a bundle!

Felipe

On Fri, Mar 22, 2019 at 8:43 AM Francois Saint-Jacques <
fsaintjacq...@gmail.com> wrote:

> It is not frowned upon to use the ArrayData::Make classes, you just have to
> ensure the order of buffers matches what the specialized Array class
> expects (also matching the type you're passing to ArrayData). I'd say it is
> the "preferred" way if your data is already in the required layout, and the
> best way if you want effective zero-copy.
>
> I created https://issues.apache.org/jira/browse/ARROW-4999 to track
> improving the documentation.
>
> François
>
> On Fri, Mar 22, 2019 at 11:26 AM Felipe Aramburu 
> wrote:
>
> > Is there a way to use a builder to be able to provide a bit per value LSB
> > as is described in the documentation? I have this already and it seems
> > silly to convert it to something else so that arrow can then make it the
> > same format as what I had to begin with. I know there is the ArrayData
> > class that has a make function that seems to allow me to do this but I am
> > trying to do things as the documentation suggest which I assumed was the
> > preferred method of doing this.
> >
> >
> >
> > On Fri, Mar 22, 2019 at 8:13 AM Francois Saint-Jacques <
> > fsaintjacq...@gmail.com> wrote:
> >
> > > Actually, this specific method seems to use a byte per value as you
> > > questioned. I think it's worth adding documentation and an explicit
> > > warning if it confused me. I'll let bkietz chime in to comment on the
> > > usage.
> > >
> > > François
> > >
> > > On Fri, Mar 22, 2019 at 10:57 AM Francois Saint-Jacques <
> > > fsaintjacq...@gmail.com> wrote:
> > >
> > > > Hello Felipe,
> > > >
> > > > it's a bit per value as per memory layout documentation.
> > > >
> > > > François
> > > >
> > > >
> > > >
> > > > On Fri, Mar 22, 2019 at 10:48 AM Felipe Aramburu <fel...@blazingdb.com>
> > > > wrote:
> > > >
> > > >> In the builder base class I see this api
> > > >>
> > > >> https://github.com/apache/arrow/blob/ad1697e5d25eeaff5630421f55b0120f45cf0ce1/cpp/src/arrow/array/builder_base.h#L149
> > > >>
> > > >> // Vector append. Treat each zero byte as a nullzero. If valid_bytes is null
> > > >> // assume all of length bits are valid.
> > > >> void UnsafeAppendToBitmap(const uint8_t* valid_bytes, int64_t length)
> > > >>
> > > >> Is valid_bytes an allocation of size (int8_t) * length, using an entire
> > > >> byte to indicate validity for each element in the array or is this a null
> > > >> bitmask where in each byte in valid_bytes encodes 8 values, one per bit?
> > > >>
> > > >> If this is using a byte per value is there an approved way of using a
> > > >> builder to initialize an array using the memory layout described here
> > > >> https://arrow.apache.org/docs/memory_layout.html#null-bitmaps?
> > > >>
> > > >
> > >
> >
>


[jira] [Created] (ARROW-5000) [Python] Fix deprecation warning from setup.py

2019-03-22 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5000:
---

 Summary: [Python] Fix deprecation warning from setup.py
 Key: ARROW-5000
 URL: https://issues.apache.org/jira/browse/ARROW-5000
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.14.0
Reporter: Wes McKinney


I noticed this on Python 3.7 today:

{code}
Bundling includes: debug/include
setup.py:441: DeprecationWarning: SO is deprecated, use EXT_SUFFIX
  suffix = sysconfig.get_config_var('SO')
{code}
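
The warning itself names the replacement config variable. A minimal sketch of what the lookup could become, assuming the surrounding setup.py only needs the extension-module filename suffix; `extension_suffix` is a hypothetical helper name and is not the actual code in Arrow's setup.py:

```python
import sysconfig


def extension_suffix():
    """Return the filename suffix used for compiled extension modules."""
    # EXT_SUFFIX is the replacement that the DeprecationWarning points to.
    suffix = sysconfig.get_config_var("EXT_SUFFIX")
    if suffix is None:
        # Assumption: only old interpreters lack EXT_SUFFIX, so the legacy
        # 'SO' variable is consulted purely as a fallback.
        suffix = sysconfig.get_config_var("SO")
    return suffix
```

On interpreters that define EXT_SUFFIX the fallback branch is never reached, so the deprecated variable is not queried and the warning should not fire.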



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Creating Arrays from builders using bitmasks

2019-03-22 Thread Francois Saint-Jacques
It is not frowned upon to use ArrayData::Make; you just have to
ensure the order of buffers matches what the specialized Array class
expects (and matches the type you're passing to ArrayData). I'd say it is
the "preferred" way if your data is already in the required layout, and the
best way if you want effective zero-copy.

I created https://issues.apache.org/jira/browse/ARROW-4999 to track
improving the documentation.

François

On Fri, Mar 22, 2019 at 11:26 AM Felipe Aramburu 
wrote:

> Is there a way to use a builder to be able to provide a bit per value LSB
> as is described in the documentation? I have this already and it seems
> silly to convert it to something else so that arrow can then make it the
> same format as what I had to begin with. I know there is the ArrayData
> class that has a make function that seems to allow me to do this but I am
> trying to do things as the documentation suggest which I assumed was the
> preferred method of doing this.
>
>
>
> On Fri, Mar 22, 2019 at 8:13 AM Francois Saint-Jacques <
> fsaintjacq...@gmail.com> wrote:
>
> > Actually, this specific method seems to use a byte per value as you
> > questioned. I think it's worth adding documentation and an explicit
> > warning if it confused me. I'll let bkietz chime in to comment on the
> > usage.
> >
> > François
> >
> > On Fri, Mar 22, 2019 at 10:57 AM Francois Saint-Jacques <
> > fsaintjacq...@gmail.com> wrote:
> >
> > > Hello Felipe,
> > >
> > > it's a bit per value as per memory layout documentation.
> > >
> > > François
> > >
> > >
> > >
> > > On Fri, Mar 22, 2019 at 10:48 AM Felipe Aramburu
> > > wrote:
> > >
> > >> In the builder base class I see this api
> > >>
> > >> https://github.com/apache/arrow/blob/ad1697e5d25eeaff5630421f55b0120f45cf0ce1/cpp/src/arrow/array/builder_base.h#L149
> > >>
> > >> // Vector append. Treat each zero byte as a nullzero. If valid_bytes is null
> > >> // assume all of length bits are valid.
> > >> void UnsafeAppendToBitmap(const uint8_t* valid_bytes, int64_t length)
> > >>
> > >> Is valid_bytes an allocation of size (int8_t) * length, using an entire
> > >> byte to indicate validity for each element in the array or is this a null
> > >> bitmask where in each byte in valid_bytes encodes 8 values, one per bit?
> > >>
> > >> If this is using a byte per value is there an approved way of using a
> > >> builder to initialize an array using the memory layout described here
> > >> https://arrow.apache.org/docs/memory_layout.html#null-bitmaps?
> > >>
> > >
> >
>


[jira] [Created] (ARROW-4999) [Doc] Add examples on how to construct with ArrayData::Make instead of builder classes

2019-03-22 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4999:
-

 Summary: [Doc] Add examples on how to construct with ArrayData::Make instead of builder classes
 Key: ARROW-4999
 URL: https://issues.apache.org/jira/browse/ARROW-4999
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Francois Saint-Jacques
 Fix For: 0.14.0








Re: Creating Arrays from builders using bitmasks

2019-03-22 Thread Wes McKinney
Hi Felipe -- note you don't have to use the builder classes when you
have the exact memory layout already; you can wrap your memory in
arrow::Buffer and construct the arrays directly.

I think it would be useful to add APIs for appending to builders with
a bitmap. We don't have them now, though. This would be a useful
contribution to the project.

- Wes
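
Until builders accept a bitmap directly, a packed bitmap has to be expanded to one byte per value before it can go through a byte-per-value append such as the UnsafeAppendToBitmap method quoted below. A minimal sketch of that expansion, written in Python rather than C++ for brevity; `bitmap_to_valid_bytes` is a hypothetical helper, not an Arrow API, and it assumes the least-significant-bit ordering from the memory layout documentation.

```python
def bitmap_to_valid_bytes(bitmap, length):
    """Expand an LSB-ordered validity bitmap into one byte (0 or 1) per value."""
    if length > len(bitmap) * 8:
        raise ValueError("bitmap is too short for the requested length")
    # Value i lives in byte i // 8, at bit position i % 8 (least significant first).
    return bytes((bitmap[i // 8] >> (i % 8)) & 1 for i in range(length))
```

The round trip costs eight times the memory of the bitmap, which is why wrapping the existing buffers directly is the cheaper route when the layout already matches.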

On Fri, Mar 22, 2019 at 10:26 AM Felipe Aramburu  wrote:
>
> Is there a way to use a builder to be able to provide a bit per value LSB
> as is described in the documentation? I have this already and it seems
> silly to convert it to something else so that arrow can then make it the
> same format as what I had to begin with. I know there is the ArrayData
> class that has a make function that seems to allow me to do this but I am
> trying to do things as the documentation suggest which I assumed was the
> preferred method of doing this.
>
>
>
> On Fri, Mar 22, 2019 at 8:13 AM Francois Saint-Jacques <
> fsaintjacq...@gmail.com> wrote:
>
> > Actually, this specific method seems to use a byte per value as you
> > questioned. I think it's worth adding documentation and an explicit warning
> > if it confused me. I'll let bkietz chime in to comment on the usage.
> >
> > François
> >
> > On Fri, Mar 22, 2019 at 10:57 AM Francois Saint-Jacques <
> > fsaintjacq...@gmail.com> wrote:
> >
> > > Hello Felipe,
> > >
> > > it's a bit per value as per memory layout documentation.
> > >
> > > François
> > >
> > >
> > >
> > > On Fri, Mar 22, 2019 at 10:48 AM Felipe Aramburu 
> > > wrote:
> > >
> > >> In the builder base class I see this api
> > >>
> > >>
> > >>
> > >> https://github.com/apache/arrow/blob/ad1697e5d25eeaff5630421f55b0120f45cf0ce1/cpp/src/arrow/array/builder_base.h#L149
> > >>
> > >> // Vector append. Treat each zero byte as a nullzero. If valid_bytes is null
> > >> // assume all of length bits are valid.
> > >> void UnsafeAppendToBitmap(const uint8_t* valid_bytes, int64_t length)
> > >>
> > >> Is valid_bytes an allocation of size (int8_t) * length, using an entire
> > >> byte to indicate validity for each element in the array or is this a null
> > >> bitmask where in each byte in valid_bytes encodes 8 values, one per bit?
> > >>
> > >> If this is using a byte per value is there an approved way of using a
> > >> builder to initialize an array using the memory layout described here
> > >> https://arrow.apache.org/docs/memory_layout.html#null-bitmaps?
> > >>
> > >
> >


Re: Creating Arrays from builders using bitmasks

2019-03-22 Thread Felipe Aramburu
Is there a way to use a builder while providing validity as one bit per
value (LSB first), as described in the documentation? I already have my data
in that form, and it seems silly to convert it to something else so that
Arrow can then convert it back to the format I had to begin with. I know
there is the ArrayData class, which has a Make function that seems to allow
me to do this, but I am trying to do things as the documentation suggests,
which I assumed was the preferred method of doing this.



On Fri, Mar 22, 2019 at 8:13 AM Francois Saint-Jacques <
fsaintjacq...@gmail.com> wrote:

> Actually, this specific method seems to use a byte per value as you
> questioned. I think it's worth adding documentation and an explicit warning
> if it confused me. I'll let bkietz chime in to comment on the usage.
>
> François
>
> On Fri, Mar 22, 2019 at 10:57 AM Francois Saint-Jacques <
> fsaintjacq...@gmail.com> wrote:
>
> > Hello Felipe,
> >
> > it's a bit per value as per memory layout documentation.
> >
> > François
> >
> >
> >
> > On Fri, Mar 22, 2019 at 10:48 AM Felipe Aramburu 
> > wrote:
> >
> >> In the builder base class I see this api
> >>
> >>
> >>
> >> https://github.com/apache/arrow/blob/ad1697e5d25eeaff5630421f55b0120f45cf0ce1/cpp/src/arrow/array/builder_base.h#L149
> >>
> >> // Vector append. Treat each zero byte as a nullzero. If valid_bytes is null
> >> // assume all of length bits are valid.
> >> void UnsafeAppendToBitmap(const uint8_t* valid_bytes, int64_t length)
> >>
> >> Is valid_bytes an allocation of size (int8_t) * length, using an entire
> >> byte to indicate validity for each element in the array or is this a null
> >> bitmask where in each byte in valid_bytes encodes 8 values, one per bit?
> >>
> >> If this is using a byte per value is there an approved way of using a
> >> builder to initialize an array using the memory layout described here
> >> https://arrow.apache.org/docs/memory_layout.html#null-bitmaps?
> >>
> >
>


Re: Creating Arrays from builders using bitmasks

2019-03-22 Thread Francois Saint-Jacques
Actually, this specific method seems to use a byte per value, as you
suspected. I think it's worth adding documentation and an explicit warning,
given that it confused me. I'll let bkietz chime in to comment on the usage.

François

On Fri, Mar 22, 2019 at 10:57 AM Francois Saint-Jacques <
fsaintjacq...@gmail.com> wrote:

> Hello Felipe,
>
> it's a bit per value as per memory layout documentation.
>
> François
>
>
>
> On Fri, Mar 22, 2019 at 10:48 AM Felipe Aramburu 
> wrote:
>
>> In the builder base class I see this api
>>
>>
>> https://github.com/apache/arrow/blob/ad1697e5d25eeaff5630421f55b0120f45cf0ce1/cpp/src/arrow/array/builder_base.h#L149
>>
>> // Vector append. Treat each zero byte as a nullzero. If valid_bytes is null
>> // assume all of length bits are valid.
>> void UnsafeAppendToBitmap(const uint8_t* valid_bytes, int64_t length)
>>
>> Is valid_bytes an allocation of size (int8_t) * length, using an entire
>> byte to indicate validity for each element in the array or is this a null
>> bitmask where in each byte in valid_bytes encodes 8 values, one per bit?
>>
>> If this is using a byte per value is there an approved way of using a
>> builder to initialize an array using the memory layout described here
>> https://arrow.apache.org/docs/memory_layout.html#null-bitmaps?
>>
>


Re: Creating Arrays from builders using bitmasks

2019-03-22 Thread Francois Saint-Jacques
Hello Felipe,

It's a bit per value, as per the memory layout documentation.

François



On Fri, Mar 22, 2019 at 10:48 AM Felipe Aramburu 
wrote:

> In the builder base class I see this api
>
>
> https://github.com/apache/arrow/blob/ad1697e5d25eeaff5630421f55b0120f45cf0ce1/cpp/src/arrow/array/builder_base.h#L149
>
> // Vector append. Treat each zero byte as a nullzero. If valid_bytes is null
> // assume all of length bits are valid.
> void UnsafeAppendToBitmap(const uint8_t* valid_bytes, int64_t length)
>
> Is valid_bytes an allocation of size (int8_t) * length, using an entire
> byte to indicate validity for each element in the array or is this a null
> bitmask where in each byte in valid_bytes encodes 8 values, one per bit?
>
> If this is using a byte per value is there an approved way of using a
> builder to initialize an array using the memory layout described here
> https://arrow.apache.org/docs/memory_layout.html#null-bitmaps?
>


[jira] [Created] (ARROW-4998) R package fails to install on OSX

2019-03-22 Thread Jordan Ryda (JIRA)
Jordan Ryda created ARROW-4998:
--

 Summary: R package fails to install on OSX
 Key: ARROW-4998
 URL: https://issues.apache.org/jira/browse/ARROW-4998
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
 Environment: OSX Mojave 10.14
R 3.5.2
Rstudio 1.1.463
boost 1.69.0
Reporter: Jordan Ryda


Following a successful Homebrew install of Apache Arrow, running 
{{devtools::install_github("apache/arrow/r")}} within RStudio compiles and 
installs the package, but the final load test fails:

 
{code:java}
** testing if installed package can be loaded
Error: package or namespace load failed for 'arrow' in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/Library/Frameworks/R.framework/Versions/3.5/Resources/library/arrow/libs/arrow.so':
  dlopen(/Library/Frameworks/R.framework/Versions/3.5/Resources/library/arrow/libs/arrow.so, 6): Symbol not found: __ZN5boost11basic_regexIcNS_12regex_traitsIcNS_16cpp_regex_traitsIcE9do_assignEPKcS7_j
  Referenced from: /usr/local/opt/apache-arrow/lib/libparquet.12.dylib
  Expected in: /usr/local/opt/boost/lib/libboost_regex-mt.dylib
 in /usr/local/opt/apache-arrow/lib/libparquet.12.dylib
Error: loading failed
{code}
boost 1.69.0 is already installed and up-to-date.





Creating Arrays from builders using bitmasks

2019-03-22 Thread Felipe Aramburu
In the builder base class I see this API:

https://github.com/apache/arrow/blob/ad1697e5d25eeaff5630421f55b0120f45cf0ce1/cpp/src/arrow/array/builder_base.h#L149

// Vector append. Treat each zero byte as a nullzero. If valid_bytes is null
// assume all of length bits are valid.
void UnsafeAppendToBitmap(const uint8_t* valid_bytes, int64_t length)

Is valid_bytes an allocation of size sizeof(uint8_t) * length, using an
entire byte to indicate validity for each element in the array, or is this a
null bitmask wherein each byte in valid_bytes encodes 8 values, one per bit?

If this is using a byte per value, is there an approved way of using a
builder to initialize an array using the memory layout described here
https://arrow.apache.org/docs/memory_layout.html#null-bitmaps?
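
To make the two layouts in the question concrete: a byte-per-value buffer spends `length` bytes, while a null bitmap spends one bit per value, with value i stored in byte i // 8 at bit i % 8 (least significant bit first) and a set bit meaning "valid". A minimal Python sketch of packing the former into the latter; `valid_bytes_to_bitmap` is a hypothetical name, not an Arrow function, and the sketch omits any buffer padding or alignment that the Arrow format recommends.

```python
def valid_bytes_to_bitmap(valid_bytes):
    """Pack one-byte-per-value validity flags into an LSB-ordered bitmap."""
    bitmap = bytearray((len(valid_bytes) + 7) // 8)
    for i, flag in enumerate(valid_bytes):
        if flag:  # any non-zero byte is treated as "valid"
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)
```

For example, the flags 1, 0, 1, 1, 1 pack into the single byte 0b00011101, with the three unused high bits left as zero.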


[jira] [Created] (ARROW-4997) [C#] ArrowStreamReader doesn't consume whole stream and doesn't implement sync read

2019-03-22 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-4997:
---

 Summary: [C#] ArrowStreamReader doesn't consume whole stream and doesn't implement sync read
 Key: ARROW-4997
 URL: https://issues.apache.org/jira/browse/ARROW-4997
 Project: Apache Arrow
  Issue Type: Bug
  Components: C#
Reporter: Eric Erhardt
Assignee: Eric Erhardt


There are 2 major issues with the ArrowStreamReader that are blocking me from 
using it.
 # When it reads a batch from a .NET Stream that doesn't return the whole chunk 
of memory in one "Read" call (like a socket/network stream), it only calls Read 
once, and then continues on. This is an issue because it has "garbage" at the 
end of its buffer (which was never written to by the stream), and when 
attempting to read the next batch, it is in the middle of the previous batch 
from the .NET Stream. This causes all sorts of issues because it assumes the 
next 4 bytes are the message length, which they obviously aren't. See [the reading 
code|https://github.com/apache/arrow/blob/13fd813445b4738cbebbd137490fe3c02071c04b/csharp/src/Apache.Arrow/Ipc/ArrowStreamReaderImplementation.cs#L90-L97]
 for where it only calls Read once; it should be in a loop.
 # ArrowStreamReader has a synchronous ReadNextRecordBatch() method, but it 
throws NotImplementedException. A synchronous implementation is necessary 
because a caller that isn't in an async method can't/shouldn't call the async API.
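
The loop asked for in the first item can be sketched as follows. This is an illustration in Python rather than C#, not the eventual fix; `read_exactly` and the `TrickleStream` test double are hypothetical names. The idea is to keep calling read until the requested number of bytes has arrived, and to fail loudly if the stream ends early.

```python
import io


def read_exactly(stream, size):
    """Read exactly `size` bytes, looping over short reads."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # The stream ended before the full message arrived.
            raise EOFError(f"stream ended with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class TrickleStream:
    """Test double that returns at most 3 bytes per read, like a slow socket."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def read(self, size):
        return self._inner.read(min(size, 3))
```

With a single read call against `TrickleStream`, a 10-byte request would return only 3 bytes and leave the reader positioned mid-message, which matches the failure described above.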


