Re: Debian dependencies on pyarrow for a C++ and Python package

2021-03-31 Thread Sutou Kouhei
Hi,

We need to build a python3-pyarrow package for that. We can use
Pybuild to do it: https://wiki.debian.org/Python/Pybuild

Our debian/ directory:
https://github.com/apache/arrow/tree/master/dev/tasks/linux-packages/apache-arrow/debian


BTW, we can use pip with our deb package:

  apt install -y libarrow-python-dev python3-pip git cmake pkg-config
  pip3 install --no-binary pyarrow pyarrow
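
After a source build like the one above, a quick check can show which pyarrow
Python picks up and where it expects to find the Arrow C++ libraries. This is
only a sketch: the helper name describe_pyarrow is made up for illustration,
and it assumes pyarrow.get_library_dirs() behaves as it does in recent
releases.

```python
import importlib


def describe_pyarrow():
    """Return basic facts about the importable pyarrow, or None if it is absent."""
    try:
        pa = importlib.import_module("pyarrow")
    except ImportError:
        return None
    return {
        "version": pa.__version__,
        "module_path": pa.__file__,
        # Directories pyarrow reports for linking against Arrow C++.
        "library_dirs": pa.get_library_dirs(),
    }


if __name__ == "__main__":
    info = describe_pyarrow()
    if info is None:
        print("pyarrow is not importable")
    else:
        for key, value in info.items():
            print(f"{key}: {value}")
```

If module_path points into a pip-managed site-packages directory while
library_dirs includes the system library path, the build is probably using
the libarrow from the deb packages rather than bundled binaries.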


Thanks,
--
kou

In "Debian dependencies on pyarrow for a C++ and Python package"
  on Wed, 31 Mar 2021 17:12:14 -0500,
  Arthur Peters wrote:

> I am working on packaging a system that uses Arrow from both C++ and
> Python. I have working Conda packages without much trouble. However,
> Ubuntu/APT packages are turning out to be a problem. The deb packages
> (at
> https://apache.bintray.com/arrow/ubuntu/pool/focal/main/a/apache-arrow/
> for instance) don't appear to include the pyarrow wrappers. There are
> pip packages for pyarrow, but they include their own incompatible
> binaries by default and a deb cannot depend on a pip package anyway.
> 
> Is there a way to create a deb package that depends on pyarrow? And if
> not, is there anything we can do about it? Also, are there plans to
> create non-conda binaries that support both C++ and Python at the same
> time? Or did I miss something and it all just works if I'm less dumb?
> :-)
> 
> Thanks so much!
> 
> -Arthur
> 
> -- 
> *Arthur Michener Peters*, Software Engineer
> He/him/his
>
> *KATANA GRAPH*
> 400 West 15th Street, Suite 150, Austin, TX 78701
> a...@katanagraph.com
> _katanagraph.com_


Debian dependencies on pyarrow for a C++ and Python package

2021-03-31 Thread Arthur Peters
I am working on packaging a system that uses Arrow from both C++ and 
Python. I have working Conda packages without much trouble. However, 
Ubuntu/APT packages are turning out to be a problem. The deb packages 
(at 
https://apache.bintray.com/arrow/ubuntu/pool/focal/main/a/apache-arrow/ 
for instance) don't appear to include the pyarrow wrappers. There are 
pip packages for pyarrow, but they include their own incompatible 
binaries by default and a deb cannot depend on a pip package anyway.


Is there a way to create a deb package that depends on pyarrow? And if 
not, is there anything we can do about it? Also, are there plans to 
create non-conda binaries that support both C++ and Python at the same 
time? Or did I miss something and it all just works if I'm less dumb? :-)


Thanks so much!

-Arthur

--
*Arthur Michener Peters*, Software Engineer
He/him/his

*KATANA GRAPH*
400 West 15th Street, Suite 150, Austin, TX 78701
a...@katanagraph.com
_katanagraph.com_



Re: sparse data array

2021-03-31 Thread bobtins
I appreciate the feedback. I realize it's a tricky nut to crack; there's always 
going to be a desire to use compression to improve scaling, and I was trying to 
identify a connecting thread between various requests for compression 
enhancements on this list and my own experience. I'll look at the spec again 
and put it on my back burner.
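
For reference, the block scheme described in the quoted text below can be
sketched roughly like this. It is a minimal illustration, not the original
in-memory implementation: the function names, the use of Python's array
module, signed widths only, and a 4-byte offset entry per block are all
assumptions made for the example.

```python
from array import array

BLOCK_BITS = 12
BLOCK_SIZE = 1 << BLOCK_BITS  # 4096 values per block


def _typecode(nbytes):
    """Pick a signed array typecode with the requested item size on this platform."""
    for code in "bhilq":
        if array(code).itemsize == nbytes:
            return code
    raise ValueError(f"no signed typecode with {nbytes} bytes")


def pack(values):
    """Split values into blocks, each stored at the narrowest signed width that fits."""
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        chunk = values[start:start + BLOCK_SIZE]
        lo, hi = min(chunk), max(chunk)
        for nbytes in (1, 2, 4, 8):
            bound = 1 << (8 * nbytes - 1)
            if -bound <= lo and hi < bound:
                blocks.append(array(_typecode(nbytes), chunk))
                break
        else:
            raise OverflowError("value does not fit in 64 bits")
    return blocks


def get(blocks, n):
    """Random access, as in block[n >> 12][n % 4096]."""
    return blocks[n >> BLOCK_BITS][n % BLOCK_SIZE]


def packed_bytes(blocks, offset_bytes=4):
    """Payload bytes plus one fixed-size offset entry per block."""
    return sum(b.itemsize * len(b) for b in blocks) + offset_bytes * len(blocks)


# The 1M-value example: value 0 is 1e9, all the others are 0 to 9.
values = [10**9] + [i % 10 for i in range(1, 1 << 20)]
blocks = pack(values)
raw_bytes = 4 * len(values)
saving = 1 - packed_bytes(blocks) / raw_bytes
print(len(blocks), packed_bytes(blocks), f"{saving:.1%}")
```

With these assumptions the example comes out at one 4-byte block, 255 1-byte
blocks and 1 KiB of offsets, which matches the "almost 75%" figure.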

On 2021/03/31 04:03:07, Micah Kornfield  wrote: 
> Hi Bob,
> 
> 
> > I can observe that in a project like Arrow, there is always a tension
> > between compatibility and extensibility, and it makes me wonder if it would
> > be helpful to add capabilities without changing the spec. An extension type
> > can be defined in terms of one of the built-in layouts, but it would define
> > semantics (such as compression) that would be used to interpret that layout.
> 
> 
> I'm not sure if this is referring to existing extension types [1] but I
> believe  they are insufficient for this purpose.  The compression
> techniques being discussed don't work well, because compression violates
> the fundamental assumptions of the existing protocols.  Each array is
> expected to have an equal number of slots.  So an array compressed as a
> struct, would cause misalignment with non-encoded arrays.
> 
> > For example, integers are stored in blocks of 4096 values, with each block
> > the minimum size to hold all the values. You access the value n with the
> > expression "block[n >> 12][n % 4096]".
> > Take an example with 1M int32 values. Value 0 is 1e9 but all the others
> > are 0 to 9.
> > Normally you would use 4M bytes to store these, but you could instead have
> > 1 block of int32 (16k) and 255 blocks of int8 (1020k) plus 1K storage for
> > block offsets, so a savings of almost 75%. If you could have uint4 blocks
> > you could save about 87%.
> 
> 
> This is difficult with the existing RecordBatch stream approach, since the
> schema is fixed ahead of time.  One could theoretically standardize a
> notion of schema replacement in communications.  The proposal I linked takes
> a different approach and allows for adjusting encodings on a per-message
> basis.  Both are potentially viable.
> 
> [1] https://arrow.apache.org/docs/format/Columnar.html#extension-types
> 
> On Tue, Mar 30, 2021 at 5:09 PM bobtins  wrote:
> 
> > From your response, I'm inferring that in order to introduce this kind of
> > compression, support in the spec is needed, similar to how compression
> > types and parameters are enumerated in
> > https://github.com/apache/arrow/blob/master/format/Message.fbs. Any
> > change in the spec is a Big Deal (and it should be).
> >
> > I can observe that in a project like Arrow, there is always a tension
> > between compatibility and extensibility, and it makes me wonder if it would
> > be helpful to add capabilities without changing the spec. An extension type
> > can be defined in terms of one of the built-in layouts, but it would define
> > semantics (such as compression) that would be used to interpret that
> > layout.
> >
> > > > > On Thu, Mar 25, 2021 at 2:17 AM Jorge Cardoso Leitão <
> > > > > jorgecarlei...@gmail.com> wrote:
> > > > >
> > > > > > Would it be an option to use a StructArray for that? One array with
> > > > > > the values, and one with the repetitions:
> > > > > >
> > > > > > Int32([1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1, 2]) ->
> > > > > >
> > > > > > StructArray([
> > > > > > "values": Int32([1, 2, 3, 1, 2]),
> > > > > > "repetitions": UInt32([1, 3, 6, 1, 1]),
> > > > > > ])
> > > > > >
> > > > > > It does not have the same API, but I think that the physical
> > > > > > operations would be different, anyways: ("array + 2" would only
> > > > > > operate on "values").
> > > > > > I think that a small struct / object with some operator overloading
> > > > > > would address this, and writing something on the metadata would
> > > > > > allow others to consume it, a-la extension type?
> > > > > >
> > > > > > On a related note, such encoding would address DataFusion's issue of
> > > > > > representing scalars / constant arrays: a constant array would be
> > > > > > represented as a repetition. Currently we just unpack (i.e. allocate)
> > > > > > a constant array when we want to transfer through a RecordBatch.
> > > > > >
> >
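
The values/repetitions idea quoted above can be tried out in a few lines of
plain Python. This is a sketch only: the function names are invented for the
example, and ordinary lists stand in for the Int32 and UInt32 child arrays of
the struct.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of equal values into parallel (values, repetitions) lists."""
    run_values, repetitions = [], []
    for value, run in groupby(values):
        run_values.append(value)
        repetitions.append(sum(1 for _ in run))
    return run_values, repetitions


def rle_decode(run_values, repetitions):
    """Expand the two lists back into the original sequence."""
    out = []
    for value, count in zip(run_values, repetitions):
        out.extend([value] * count)
    return out


def rle_add_scalar(run_values, repetitions, scalar):
    """'array + scalar' touches only the values; the repetitions are unchanged."""
    return [v + scalar for v in run_values], repetitions


data = [1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1, 2]
vals, reps = rle_encode(data)
print(vals, reps)
```

On this input the run of six 3s gives a repetition count of 6, so the counts
sum to the 12 slots of the original array.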
> > I just reread the whole thread, and realized Jorge was saying a similar
> > thing, that this new type could be built on an existing layout. But I guess
> > I'm also imagining some more general capabilities:
> >
> > * Define an extension type in terms of an existing layout
> > * Register an extension type implementation
> > * Enumerate available extension types
> >
> > It would make life more complicated, for sure, but it would allow things
> > like compression to evolve more quickly. I'm thinking about the in-memory
> > implementation that I built, where I did various things to save memory.
> >
> > For example, integers are stored in blocks of 4096 values, with each block
> > the minimum size to hold all the values. You acce

Re: Arrow sync call March 31 at 12:00 US/Eastern, 16:00 UTC

2021-03-31 Thread Jonathan Keane
Thank you everyone who attended, here are the notes.

Attendees:

Jonathan Keane

Colin Alworth

David Sanders

Micah Kornfield

Rok Mihevc

Projjal Chanda

Eduardo Ponce

Kirill Lykov


Discussion:

   - 4.0 release
      - zstd compression for the Java library (has a PR that is approved but
      still needs to be merged)
      - One issue with Parquet that might be good to get resolved before
      4.0 (see
      https://issues.apache.org/jira/browse/ARROW-11629)
   - Formalize change for minor PRs
      - Will be merged Friday if there are no objections
   - Regex kernel - is someone working on this? (yes:
   https://issues.apache.org/jira/browse/ARROW-12134)
   - Discussion of Jira search and how to locate where work is planned


On Wed, Mar 31, 2021 at 11:11 AM Antoine Pitrou wrote:

>
> I'm fine with Zoom.  But doesn't it need a host as well?
>
>
> On 31/03/2021 at 18:09, Wes McKinney wrote:
> > The Google Meet link is on dremio.com, so there must not be someone
> > from the org to let people in. What do folks think about moving to
> > Zoom for future meetings (which shouldn't have this problem)?
> >
> > On Wed, Mar 31, 2021 at 11:07 AM Jonathan Keane wrote:
> >>
> >> I'm experiencing the same here.
> >>
> >> On Wed, Mar 31, 2021 at 11:06 AM Kirill Lykov wrote:
> >>
> >>> Hi,
> >>>
> >>> I don't know about the others but I cannot join because someone needs
> >>> to let me in.
> >>> Might it be a problem for other people as well?
> >>>
> >>> On Tue, Mar 30, 2021 at 5:53 PM Neal Richardson <
> >>> neal.p.richard...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>> Our biweekly call is coming up tomorrow at
> >>>> https://meet.google.com/vtm-teks-phx. All are welcome to join. I won't
> >>>> be able to attend this week, but hopefully someone else will share notes
> >>>> with the mailing list afterward.
> >>>>
> >>>> Neal
> >>>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Kirill Lykov
> >>>
>


Re: Arrow sync call March 31 at 12:00 US/Eastern, 16:00 UTC

2021-03-31 Thread Wes McKinney
It does, but I would suggest that someone volunteer to host the call
each week and send out a Zoom link for that week's call.

On Wed, Mar 31, 2021 at 11:11 AM Antoine Pitrou wrote:
>
>
> I'm fine with Zoom.  But doesn't it need a host as well?
>
>
> On 31/03/2021 at 18:09, Wes McKinney wrote:
> > The Google Meet link is on dremio.com, so there must not be someone
> > from the org to let people in. What do folks think about moving to
> > Zoom for future meetings (which shouldn't have this problem)?
> >
> > On Wed, Mar 31, 2021 at 11:07 AM Jonathan Keane wrote:
> >>
> >> I'm experiencing the same here.
> >>
> >> On Wed, Mar 31, 2021 at 11:06 AM Kirill Lykov wrote:
> >>
> >>> Hi,
> >>>
> >>> I don't know about the others but I cannot join because someone needs to
> >>> let me in.
> >>> Might it be a problem for other people as well?
> >>>
> >>> On Tue, Mar 30, 2021 at 5:53 PM Neal Richardson <
> >>> neal.p.richard...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>> Our biweekly call is coming up tomorrow at
> >>>> https://meet.google.com/vtm-teks-phx. All are welcome to join. I won't
> >>>> be able to attend this week, but hopefully someone else will share notes
> >>>> with the mailing list afterward.
> >>>>
> >>>> Neal
> >>>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Kirill Lykov
> >>>


Re: Arrow sync call March 31 at 12:00 US/Eastern, 16:00 UTC

2021-03-31 Thread Antoine Pitrou



I'm fine with Zoom.  But doesn't it need a host as well?


On 31/03/2021 at 18:09, Wes McKinney wrote:
> The Google Meet link is on dremio.com, so there must not be someone
> from the org to let people in. What do folks think about moving to
> Zoom for future meetings (which shouldn't have this problem)?
>
> On Wed, Mar 31, 2021 at 11:07 AM Jonathan Keane wrote:
>>
>> I'm experiencing the same here.
>>
>> On Wed, Mar 31, 2021 at 11:06 AM Kirill Lykov wrote:
>>
>>> Hi,
>>>
>>> I don't know about the others but I cannot join because someone needs to
>>> let me in.
>>> Might it be a problem for other people as well?
>>>
>>> On Tue, Mar 30, 2021 at 5:53 PM Neal Richardson <
>>> neal.p.richard...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>> Our biweekly call is coming up tomorrow at
>>>> https://meet.google.com/vtm-teks-phx. All are welcome to join. I won't
>>>> be able to attend this week, but hopefully someone else will share notes
>>>> with the mailing list afterward.
>>>>
>>>> Neal
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Kirill Lykov



Re: Arrow sync call March 31 at 12:00 US/Eastern, 16:00 UTC

2021-03-31 Thread Wes McKinney
The Google Meet link is on dremio.com, so there must not be someone
from the org to let people in. What do folks think about moving to
Zoom for future meetings (which shouldn't have this problem)?

On Wed, Mar 31, 2021 at 11:07 AM Jonathan Keane wrote:
>
> I'm experiencing the same here.
>
> On Wed, Mar 31, 2021 at 11:06 AM Kirill Lykov wrote:
>
> > Hi,
> >
> > I don't know about the others but I cannot join because someone needs to
> > let me in.
> > Might it be a problem for other people as well?
> >
> > On Tue, Mar 30, 2021 at 5:53 PM Neal Richardson <
> > neal.p.richard...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > > Our biweekly call is coming up tomorrow at
> > > https://meet.google.com/vtm-teks-phx. All are welcome to join. I won't
> > > be able to attend this week, but hopefully someone else will share notes
> > > with the mailing list afterward.
> > >
> > > Neal
> > >
> >
> >
> > --
> > Best regards,
> > Kirill Lykov
> >


Re: Arrow sync call March 31 at 12:00 US/Eastern, 16:00 UTC

2021-03-31 Thread Jonathan Keane
I'm experiencing the same here.

On Wed, Mar 31, 2021 at 11:06 AM Kirill Lykov wrote:

> Hi,
>
> I don't know about the others but I cannot join because someone needs to
> let me in.
> Might it be a problem for other people as well?
>
> On Tue, Mar 30, 2021 at 5:53 PM Neal Richardson <
> neal.p.richard...@gmail.com>
> wrote:
>
> > Hi all,
> > Our biweekly call is coming up tomorrow at
> > https://meet.google.com/vtm-teks-phx. All are welcome to join. I won't
> > be able to attend this week, but hopefully someone else will share notes
> > with the mailing list afterward.
> >
> > Neal
> >
>
>
> --
> Best regards,
> Kirill Lykov
>


Re: Arrow sync call March 31 at 12:00 US/Eastern, 16:00 UTC

2021-03-31 Thread Kirill Lykov
Hi,

I don't know about the others but I cannot join because someone needs to
let me in.
Might it be a problem for other people as well?

On Tue, Mar 30, 2021 at 5:53 PM Neal Richardson wrote:

> Hi all,
> Our biweekly call is coming up tomorrow at
> https://meet.google.com/vtm-teks-phx. All are welcome to join. I won't be
> able to attend this week, but hopefully someone else will share notes with
> the mailing list afterward.
>
> Neal
>


-- 
Best regards,
Kirill Lykov


Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-03-31 Thread Antoine Pitrou



On 31/03/2021 at 17:55, Micah Kornfield wrote:
> Thanks for the feedback.  A couple of points here and some responses below.
>
> * One other question is whether the Nanoseconds should actually be
> configurable (i.e. use milliseconds or microseconds).  I would lean towards
> no.

Same for me.

> * I'm also still not 100% convinced we need this as a first class type in
> arrow or if we should be looking more closely at the Struct (in the Arrow
> sense) based implementation.  In the future where alternative encodings are
> supported, this could allow for much smaller footprints for this type.

Having a "packed" first class type allows for better locality when
accessing data.  It doesn't sound very likely that you'd access only one
component of the interval.

But I have no idea how important this is, and temporal datatypes are
generally cumbersome to add support for (conversions, arithmetic, etc.),
so it would be nice to avoid adding too many of them :-)

Regards

Antoine.


>> The 3 field implementation doesn't seem to have any way to represent integral
>> days, so I am also not sure about that one.
>
> Sorry this was an email gaffe.  I intended Month (32 bit int), Day (32 bit
> int), Nanosecond (64 bit int).
>
>> OTOH I don't really understand the point of supporting "the most
>> reasonable ranges for Year, Month and Nanoseconds independently".  What
>> does it bring to encode more than one month in the nanoseconds field?
>
> I'm happy with simplicity.   In the past there has been some reference to
> people wanting to store very large timestamps (fall out of Nanoseconds max
> representable value) but we've concluded that this wasn't something that we
> wanted to really support.







Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-03-31 Thread Micah Kornfield
Thanks for the feedback.  A couple of points here and some responses below.

* One other question is whether the Nanoseconds should actually be
configurable (i.e. use milliseconds or microseconds).  I would lean towards
no.

* I'm also still not 100% convinced we need this as a first class type in
arrow or if we should be looking more closely at the Struct (in the Arrow
sense) based implementation.  In the future where alternative encodings are
supported, this could allow for much smaller footprints for this type.

> The 3 field implementation doesn't seem to have any way to represent integral
> days, so I am also not sure about that one.


Sorry this was an email gaffe.  I intended Month (32 bit int), Day (32 bit
int), Nanosecond (64 bit int).
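
To make the corrected layout concrete, here is a small sketch of one interval
value packed into 16 bytes with Python's struct module. The field order
follows the sentence above; the little-endian byte order and the helper names
are assumptions for illustration and are not part of the proposal.

```python
import struct

# Little-endian: int32 months, int32 days, int64 nanoseconds (16 bytes total).
INTERVAL = struct.Struct("<iiq")


def pack_interval(months, days, nanoseconds):
    """Serialize one interval value into its fixed-width byte form."""
    return INTERVAL.pack(months, days, nanoseconds)


def unpack_interval(buf):
    """Return (months, days, nanoseconds) from a 16-byte buffer."""
    return INTERVAL.unpack(buf)


# 14 months, 15 days and 1.5 seconds.
buf = pack_interval(14, 15, 1_500_000_000)
print(INTERVAL.size, unpack_interval(buf))
```

Each field keeps its own sign and range, so days are never folded into
months or nanoseconds.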

> OTOH I don't really understand the point of supporting "the most
> reasonable ranges for Year, Month and Nanoseconds independently".  What
> does it bring to encode more than one month in the nanoseconds field?


I'm happy with simplicity.  In the past there has been some reference to
people wanting to store very large timestamps (ones that fall outside the
maximum value representable in nanoseconds), but we concluded that this
wasn't something we really wanted to support.






On Wed, Mar 31, 2021 at 4:49 AM Antoine Pitrou wrote:

>
> I would favour the following characteristics:
> - support for nanoseconds (especially as other Arrow temporal types
> support it)
> - easy to handle (which excludes the ZetaSQL representation IMHO)
>
> OTOH I don't really understand the point of supporting "the most
> reasonable ranges for Year, Month and Nanoseconds independently".  What
> does it bring to encode more than one month in the nanoseconds field?
> You can already use the Duration type for that.
>
> Regards
>
> Antoine.
>
>
> On 31/03/2021 at 05:48, Micah Kornfield wrote:
> > To follow up on this conversation I did some analysis on interval types:
> >
> > https://docs.google.com/document/d/1i1E_fdQ_xODZcAhsV11Pfq27O50k679OYHXFJpm9NS0/edit
> > Please feel free to add more details/systems I missed.
> >
> > Given the disparate requirements of different systems I think the
> > following might make sense for official types (if there isn't consensus, I
> > might try to contribute extension Array implementations for them to
> > Java and C++/Python separately).
> >
> > 1.  3 fields: Year (32 bit), Month (32 bit), Nanoseconds (64 bit) all
> > signed.
> > 2.  Postgres representation (Downside is it doesn't support Nanoseconds,
> > only microseconds).
> > 3.  ZetaSQL implementation (Requires some bit manipulation) but supports
> > the most reasonable ranges for Year, Month and Nanoseconds independently.
> >
> > Thoughts?
> >
> > Micah
> >
> > On 2021/02/18 04:30:55 Micah Kornfield wrote:
> >>>
> >>> I didn’t find any page/documentation on how to do RFC in Arrow protocol,
> >>> so can anyone point me to it or PR with email will be enough?
> >>
> >> That is enough to start discussion.  Before formal acceptance and merging
> >> of the PR there need to be Java and C++ implementations for the type
> >> that pass integration tests.  At the time this guideline was instituted
> >> Java and C++ were considered the "reference" implementations (I think they
> >> still have the most complete integration test coverage).
> >>
> >> My understanding is that the current modelling of intervals mimics SQL
> >> standards (e.g. SQL Server [1]).  So it would also be good to step back and
> >> understand what problem DF is trying to solve and how it differs from other
> >> SQL implementations.  I'd be hesitant to accept COMPLEX as a new type
> >> without a much deeper analysis into calendar representations within Arrow
> >> and how they relate to other existing systems (e.g. Hive and some
> >> assortment of existing SQL databases).  For instance the current modelling
> >> of timestamps does not lend itself to constructing a COMPLEX interval type
> >> particularly well. (Duration was introduced for this reason).
> >>
> >> I think both Wes's suggestion of FixedSizeBinary and Andrew's of composing
> >> the with a struct are good stop-gaps.  These obviously have different
> >> trade-offs.  Ultimately, it would be good to define common extension types
> >> that can represent this use-case if there really is demand for it (if it
> >> doesn't become a top level type).
> >>
> >> [1]
> >> https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-data-types?view=sql-server-ver15
> >>
> >> -Micah
> >>
> >> On Wed, Feb 17, 2021 at 2:05 PM Andrew Lamb wrote:
> >>
> >>> That is a great suggestion Wes, thank you.
> >>>
> >>> I wonder if we could get away with a 128 bit representation that is the
> >>> concatenation of the two existing interval types (YearMonth)(DayTime). Or
> >>> maybe even define a `struct` type with those fields that is used by
> >>> DataFusion.
> >>>
> >>> Basically, given our reading of the Arrow spec[1], it is currently not
> >>> possible to precisely represent an interv

Re: [RESULT] [VOTE] Accept donation of Rust Ballista project

2021-03-31 Thread Wes McKinney
Hi Andy, before you start an IP clearance vote, you need to add an
entry on https://incubator.apache.org/ip-clearance/ and run through
the clearance checklist. Let me know if you have trouble and I can
help you.

Thanks

On Wed, Mar 31, 2021 at 8:47 AM Andy Grove  wrote:
>
> CLAs have been submitted or are already on file for just over half the
> contributors at this point (some are still waiting for confirmation from
> secretary@) and based on feedback from the incubator folks, it looks like
> we can move ahead with a vote since the project has been Apache-licensed
> since its inception and nobody is objecting to the donation. If anyone does
> object in the future then we can remove their code. I am going to wait a
> couple more days so that the pending ICLA submissions get processed by the
> ASF and then will start the vote on the IP clearance.
>
> Thanks for everyone's patience with this.
>
> On Thu, Mar 25, 2021 at 8:03 AM Andy Grove  wrote:
>
> > Re-sending with result subject line.
> >
> > On Thu, Mar 25, 2021 at 7:28 AM Andy Grove  wrote:
> >
> >> Thank you all for voting.
> >>
> >> The vote passes with 8 binding votes from PMC members and 8 non-binding
> >> votes.
> >>
> >> I will begin the process of contacting contributors and asking them to
> > >> submit CLAs and I will also reach out to the Apache Incubator team about
> >> the process in case we cannot obtain CLAs from all contributors.
> >>
> >> Thanks,
> >>
> >> Andy.
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Mar 23, 2021 at 10:29 AM Rok Mihevc  wrote:
> >>
> >>> +1 (non-binding)
> >>>
> >>> On Tue, Mar 23, 2021 at 4:40 PM Eric Burden 
> >>> wrote:
> >>>
> >>> > +1
> >>> >
> >>> > On Tue, Mar 23, 2021 at 7:18 AM Francois Saint-Jacques <
> >>> > fsaintjacq...@gmail.com> wrote:
> >>> >
> >>> > > +1
> >>> > >
> >>> > > On Mon, Mar 22, 2021 at 8:33 AM Andrew Lamb 
> >>> > wrote:
> >>> > > >
> >>> > > > +1
> >>> > > >
> >>> > > > On Sun, Mar 21, 2021 at 7:08 PM paddy horan <
> >>> paddyho...@hotmail.com>
> >>> > > wrote:
> >>> > > >
> >>> > > > > +1 (non-binding)
> >>> > > > >
> >>> > > > >
> >>> > > > > 
> >>> > > > > From: Sutou Kouhei 
> >>> > > > > Sent: Sunday, March 21, 2021 4:34:43 PM
> >>> > > > > To: dev@arrow.apache.org 
> >>> > > > > Subject: Re: [VOTE] Accept donation of Rust Ballista project
> >>> > > > >
> >>> > > > > +1 (binding)
> >>> > > > >
> >>> > > > > In g9unbudmdyg7wlornhehz99sgtkmo...@mail.gmail.com
> >>> > > > >   "[VOTE] Accept donation of Rust Ballista project" on Sun, 21 Mar 2021
> >>> > > > >   09:56:32 -0600,
> >>> > > > >   Andy Grove wrote:
> >>> > > > >
> >>> > > > > > Dear all,
> >>> > > > > >
> >>> > > > > > On behalf of the Ballista community, I would like to propose that we
> >>> > > > > > donate Ballista to the Apache Arrow project.
> >>> > > > > >
> >>> > > > > > Ballista is a distributed scheduler based on Arrow standards (memory
> >>> > > > > > format, IPC, Flight) and supports distributed query execution with the
> >>> > > > > > DataFusion query engine.
> >>> > > > > >
> >>> > > > > > The community has had an opportunity to discuss this [1] and there do
> >>> > > > > > not seem to be objections to this.
> >>> > > > > >
> >>> > > > > > The code donation in the form of a pull request:
> >>> > > > > >
> >>> > > > > > https://github.com/apache/arrow/pull/9723
> >>> > > > > >
> >>> > > > > > This vote is to determine if the Arrow PMC is in favor of accepting
> >>> > > > > > this donation. If the vote passes, the PMC and the authors of the
> >>> > > > > > code will work together to complete the ASF IP Clearance process
> >>> > > > > > (http://incubator.apache.org/ip-clearance/) and import this Rust
> >>> > > > > > codebase implementation into Apache Arrow.
> >>> > > > > >
> >>> > > > > > [ ] +1 : Accept contribution of Ballista
> >>> > > > > > [ ] 0 : No opinion
> >>> > > > > > [ ] -1 : Reject contribution because...
> >>> > > > > >
> >>> > > > > > Here is my vote: +1
> >>> > > > > >
> >>> > > > > > The vote will be open for at least 72 hours.
> >>> > > >

Re: [RESULT] [VOTE] Accept donation of Rust Ballista project

2021-03-31 Thread Andy Grove
CLAs have been submitted or are already on file for just over half the
contributors at this point (some are still waiting for confirmation from
secretary@) and based on feedback from the incubator folks, it looks like
we can move ahead with a vote since the project has been Apache-licensed
since its inception and nobody is objecting to the donation. If anyone does
object in the future then we can remove their code. I am going to wait a
couple more days so that the pending ICLA submissions get processed by the
ASF and then will start the vote on the IP clearance.

Thanks for everyone's patience with this.

On Thu, Mar 25, 2021 at 8:03 AM Andy Grove  wrote:

> Re-sending with result subject line.
>
> On Thu, Mar 25, 2021 at 7:28 AM Andy Grove  wrote:
>
>> Thank you all for voting.
>>
>> The vote passes with 8 binding votes from PMC members and 8 non-binding
>> votes.
>>
>> I will begin the process of contacting contributors and asking them to
>> submit CLAs and I will also reach out to the Apache Incubator team about
>> the process in case we cannot obtain CLAs from all contributors.
>>
>> Thanks,
>>
>> Andy.
>>
>>
>>
>>
>>
>> On Tue, Mar 23, 2021 at 10:29 AM Rok Mihevc  wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Tue, Mar 23, 2021 at 4:40 PM Eric Burden 
>>> wrote:
>>>
>>> > +1
>>> >
>>> > On Tue, Mar 23, 2021 at 7:18 AM Francois Saint-Jacques <
>>> > fsaintjacq...@gmail.com> wrote:
>>> >
>>> > > +1
>>> > >
>>> > > On Mon, Mar 22, 2021 at 8:33 AM Andrew Lamb 
>>> > wrote:
>>> > > >
>>> > > > +1
>>> > > >
>>> > > > On Sun, Mar 21, 2021 at 7:08 PM paddy horan <
>>> paddyho...@hotmail.com>
>>> > > wrote:
>>> > > >
>>> > > > > +1 (non-binding)
>>> > > > >
>>> > > > >
>>> > > > > 
>>> > > > > From: Sutou Kouhei 
>>> > > > > Sent: Sunday, March 21, 2021 4:34:43 PM
>>> > > > > To: dev@arrow.apache.org 
>>> > > > > Subject: Re: [VOTE] Accept donation of Rust Ballista project
>>> > > > >
>>> > > > > +1 (binding)
>>> > > > >
>>> > > > > In >> > g9unbudmdyg7wlornhehz99sgtkmo...@mail.gmail.com
>>> > > >
>>> > > > >   "[VOTE] Accept donation of Rust Ballista project" on Sun, 21
>>> Mar
>>> > 2021
>>> > > > > 09:56:32 -0600,
>>> > > > >   Andy Grove  wrote:
>>> > > > >
>>> > > > > > Dear all,
>>> > > > > >
>>> > > > > > On behalf of the Ballista community, I would like to propose
>>> that
>>> > we
>>> > > > > donate
>>> > > > > > Ballista to the Apache Arrow project.
>>> > > > > >
>>> > > > > > Ballista is a distributed scheduler based on Arrow standards
>>> > (memory
>>> > > > > > format, IPC, Flight) and supports distributed query execution
>>> with
>>> > > the
>>> > > > > > DataFusion query engine.
>>> > > > > >
>>> > > > > > The community has had an opportunity to discuss this [1] and
>>> there
>>> > > do not
>>> > > > > > seem to be objections to this.
>>> > > > > >
>>> > > > > > The code donation in the form of a pull request:
>>> > > > > >
>>> > > > > >
>>> > > > >
>>> > >
>>> >
>>> https://github.com/apache/arrow/pull/9723
>>> > > > > >
>>> > > > > > This vote is to determine if the Arrow PMC is in favor of
>>> accepting
>>> > > this
>>> > > > > > donation. If the vote passes, the PMC and the authors of the
>>> code
>>> > > will
>>> > > > > work
>>> > > > > > together to complete the ASF IP Clearance process (
>>> > > > > >
>>> > > > >
>>> > >
>>> >
>>> http://incubator.apache.org/ip-clearance/
>>> > > )
>>> > > > > and import this Rust codebase
>>> > > > > > implementation into Apache Arrow.
>>> > > > > >
>>> > > > > > [ ] +1 : Accept contribution of Ballista [ ] 0 : No opinion [
>>> ] -1
>>> > :
>>> > > > > Reject
>>> > > > > > contribution because...
>>> > > > > >
>>> > > > > > Here is my vote: +1
>>> > > > > >
>>> > > > > > The vote will be open for at least 72 hours.
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > >
>>> > > > > > Andy.
>>> > > > > >
>>> > > > > > [1]
>>> > > > > >
>>> > > > >
>>> > >
>>> >
>>> https://lists.apache.org/x/thread.html/r09556898c9c94259c00e35c04ea051040931bbe9ce577cba60c148c8@%3Cdev.arrow.apache.org%3E

Re: Call for Presentations for ApacheCon 2021 now open

2021-03-31 Thread Zoran Regvart
Hi all,
just dropping in with a gentle reminder of the CFP and notifying you
that this year's ApacheCon Asia also features the Integration track,
so you might want to consider submitting your presentation there as
well.

I hope that we can have more software integration projects at ASF
presented/promoted besides Camel on the Integration track.

You can find links to the CFPs on Camel's website here:

https://camel.apache.org/blog/2021/03/ApacheCons2021/

zoran

On Tue, Mar 9, 2021 at 11:08 AM Zoran Regvart  wrote:
>
> Hi folk,
> On the back of last year's resounding success,
> ApacheCon@Home will be online, and again, we'll be featuring the
> (Software) Integration track. I'm the chair of the Integration track
> at ApacheCon@Home. And I'd like to invite folk from the Apache Arrow
> community to participate by submitting a presentation for ApacheCon.
>
> Software integration is a broad topic and one I think Apache Arrow
> folk can contribute considerably to the discussion. For the
> Integration track, we wish to focus on showcasing the projects at ASF
> that deal with software integration, and along with that, we also have
> a preference for use case presentations.
>
> You can submit now till 1st of May at:
>
> https://acah2021.jamhosted.net/
>
> Feel free to reach out to me if you have any questions,
>
> zoran
>
> -- Forwarded message -
> From: Rich Bowen 
> Date: Mon, Mar 8, 2021 at 2:52 PM
> Subject: Call for Presentations for ApacheCon 2021 now open
> To: 
>
>
> The ApacheCon Planners and the Apache Software Foundation are pleased to
> announce that ApacheCon@Home will be held online, September 21-23, 2021.
> Once again, we’ll be featuring content from dozens of our projects, as
> well as content about our community, how Apache works, business models
> around Apache software, the legal aspects of open source, and many other
> topics.
>
> Last year’s virtual ApacheCon@Home event was a big success, with 5,745
> registrants from more than 150 countries, spanning every time zone, with
> the virtual format delivering content to attendees who would never have
> attended an in-person ApacheCon (83% of post-event poll responders in
> 2020 indicated this was their first ApacheCon ever)!
>
> Given the great participation and excitement for last year’s event, we
> are announcing the Call for Presentations is now open to presenters from
> around the world until May 1st. Talks can be focused on the topics
> above, as well as any of our amazing projects. Submit your talks today!
>
> https://acah2021.jamhosted.net/
>
> We look forward to reviewing your contribution to one of the most
> popular open source software events in the world!
>
>
> --
> Rich Bowen, VP Conferences
> The Apache Software Foundation
> https://apachecon.com/
> @apachecon
>
>
> --
> Zoran Regvart



-- 
Zoran Regvart


Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-03-31 Thread Antoine Pitrou



I would favour the following characteristics :
- support for nanoseconds (especially as other Arrow temporal types 
support it)

- easy to handle (which excludes the ZetaSQL representation IMHO)

OTOH I don't really understand the point of supporting "the most 
reasonable ranges for Year, Month and Nanoseconds independently".  What 
does it bring to encode more than one month in the nanoseconds field? 
You can already use the Duration type for that.


Regards

Antoine.


Le 31/03/2021 à 05:48, Micah Kornfield a écrit :

To follow-up on this conversation I did some analysis on interval types:

https://docs.google.com/document/d/1i1E_fdQ_xODZcAhsV11Pfq27O50k679OYHXFJpm9NS0/edit
  Please feel free to add more details/systems I missed.

Given the disparate requirements of different systems I think the following 
might make sense for official types (if there isn't consensus, I might try to 
contribute extension Array implementations for them to Java and C++/Python 
separately).

1.  3 fields: Year (32 bit), Month (32 bit), Nanoseconds (64 bit) all signed.
2.  Postgres representation (Downside is it doesn't support Nanoseconds, only 
microseconds).
3.  ZetaSQL implementation (Requires some bit manipulation) but supports the 
most reasonable ranges for Year, Month and Nanoseconds independently.

Thoughts?

Micah

On 2021/02/18 04:30:55 Micah Kornfield wrote:


I didn’t find any page/documentation on how to do RFC in Arrow protocol,
so can anyone point me to it or PR with email will be enough?


That is enough to start discussion.  Before formal acceptance and merging
of the PR there need to be Java and C++ implementations for the type
that pass integration tests.  At the time this guideline was instituted
Java and C++ were considered the "reference" implementations (I think they
still have the most complete integration test coverage).

My understanding is that the current modelling of intervals mimics SQL
standards (e.g. SQL Server [1]).  So it would also be good to step back and
understand what problem DF is trying to solve and how it differs from other
SQL implementations.  I'd be hesitant to accept COMPLEX as a new type
without a much deeper analysis into calendar representations within Arrow
and how they relate to other existing systems (e.g. Hive and some
assortment of existing SQL databases).  For instance the current modelling
of timestamps does not lend itself to constructing a COMPLEX interval type
particularly well. (Duration was introduced for this reason).

I think both Wes's suggestion of FixedSizeBinary and Andrew's of composing
them with a struct are good stop-gaps.  These obviously have different
trade-offs.  Ultimately, it would be good to define common extension types
that can represent this use-case if there really is demand for it (if it
doesn't become a top level type).

[1]
https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-data-types?view=sql-server-ver15

-Micah

On Wed, Feb 17, 2021 at 2:05 PM Andrew Lamb  wrote:


That is a great suggestion Wes, thank you.

I wonder if we could get away with a 128 bit representation that is the
concatenation of the two existing interval types (YearMonth)(DayTime). Or
maybe even define a `struct` type with those fields that is used by
DataFusion.

Basically, given our reading of the Arrow spec[1], it is currently not
possible to precisely represent an interval that has both monthly and
sub-monthly granularity.

As Dmtry says, consider a seemingly simple interval like 1 month, 1 day:

Using IntervalUnit(YEAR_MONTH) can't represent the 1 day
Using IntervalUnit(DAY_TIME) can't represent the month as different months
have different numbers of days
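
To illustrate the point above, here is a minimal sketch using only the
Python standard library. The helper names add_months and one_month_one_day
are hypothetical, and clamping the day to the end of the target month is an
assumption about calendar arithmetic, not something the Arrow spec defines.
It shows that "1 month, 1 day" covers a different number of elapsed days
depending on the starting date, so it cannot be reduced to a fixed day count.

```python
import calendar
from datetime import date, timedelta


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return start.replace(year=year, month=month0 + 1, day=min(start.day, last_day))


def one_month_one_day(start: date) -> date:
    """Apply the interval '1 month, 1 day' to a starting date."""
    return add_months(start, 1) + timedelta(days=1)


# The same interval spans 32, 29 and 31 elapsed days for these start dates.
for start in (date(2021, 1, 15), date(2021, 2, 15), date(2021, 4, 15)):
    end = one_month_one_day(start)
    print(start, "->", end, "elapsed days:", (end - start).days)
```

Because the elapsed day count depends on the start date, the month part and
the day part have to be stored as separate fields.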

[1]
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L249-L260


On Wed, Feb 17, 2021 at 5:01 PM Wes McKinney  wrote:


On Wed, Feb 17, 2021 at 3:46 PM  wrote:



It's unclear to me that this needs to be introduced into the top-level

Similar thing to columnar format, How to store interval like 1 month 1 day
1 hour? It’s not possible to do it without converting 1 month to 30 days,
which is a bad way.




Presumably you can represent a complex interval in a fixed number of
bytes, and then embed the data in a FixedSizeBinary type. You can
adorn this type with extension type metadata so that DataFusion can
then apply Interval semantics to it. This could also serve as an
interim strategy for you to proceed with implementation while
proposing a top-level type to the Arrow format (which may or may not
be accepted) so you aren't blocked on acceptance of changes into
Schema.fbs.
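
As a rough sketch of the fixed-width idea, the snippet below packs an
interval into 16 bytes with Python's struct module. The layout (int32
months, int32 days, int64 nanoseconds, little-endian) and the function names
are assumptions made for illustration; they are not a layout proposed in
this thread or defined by Arrow. Each 16-byte value could then be stored as
one element of a FixedSizeBinary column of byte width 16, with extension
metadata telling the consumer how to interpret it.

```python
import struct

# Hypothetical layout: int32 months, int32 days, int64 nanoseconds,
# little-endian with no padding, for a total of 16 bytes.
INTERVAL_LAYOUT = struct.Struct("<iiq")

NANOS_PER_HOUR = 3600 * 10**9


def encode_interval(months: int, days: int, nanos: int) -> bytes:
    """Pack the three signed fields into a fixed 16-byte value."""
    return INTERVAL_LAYOUT.pack(months, days, nanos)


def decode_interval(buf: bytes) -> tuple:
    """Unpack a 16-byte value back into (months, days, nanos)."""
    return INTERVAL_LAYOUT.unpack(buf)


# "1 month 1 day 1 hour" without converting the month to 30 days.
encoded = encode_interval(1, 1, NANOS_PER_HOUR)
print(len(encoded), decode_interval(encoded))
```

Keeping the three fields separate is what avoids the lossy "1 month = 30
days" conversion; the cost is that consumers unaware of the extension
metadata only see opaque bytes.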


On 17 Feb 2021, at 21:02, Wes McKinney  wrote:

It's unclear to me that this needs to be introduced into the top-level
columnar format without more analysis — have you considered
implementing this for DataFusion as an extension type for the time
being?

On Wed, Feb 17, 2021 at 11:59 AM t...@dmtry.me > wrote:


Hi,

For now, There are only two types of

Re: Re: [Format][RFC] Introduce COMPLEX type for IntervalUnit

2021-03-31 Thread Andrew Lamb
I agree with you that having fixed precision (e.g. postgres / zetasql) is
reasonable. Variable precision fields (ala SQL Server/Oracle) seem less
valuable to me.

I think support for nanosecond precision for intervals is important as
there are nanosecond precision timestamps.

I don't think the postgres representation adds anything of practical use
(it still requires 16 bytes but has only microsecond precision). The 3
field implementation doesn't seem to have any way to represent integral
days, so I am also not sure about that one.

Adding just the ZetaSQL Implementation would be good enough for my
use cases.

Andrew

On Tue, Mar 30, 2021 at 11:48 PM Micah Kornfield 
wrote:

> To follow-up on this conversation I did some analysis on interval types:
>
>
> https://docs.google.com/document/d/1i1E_fdQ_xODZcAhsV11Pfq27O50k679OYHXFJpm9NS0/edit
> Please feel free to add more details/systems I missed.
>
> Given the disparate requirements of different systems I think the
> following might make sense for official types (if there isn't consensus, I
> might try to contribute extension Array implementations for them to
> Java and C++/Python separately).
>
> 1.  3 fields: Year (32 bit), Month (32 bit), Nanoseconds (64 bit) all
> signed.
> 2.  Postgres representation (Downside is it doesn't support Nanoseconds,
> only microseconds).
> 3.  ZetaSQL implementation (Requires some bit manipulation) but supports
> the most reasonable ranges for Year, Month and Nanoseconds independently.
>
> Thoughts?
>
> Micah
>
> On 2021/02/18 04:30:55 Micah Kornfield wrote:
> > >
> > > I didn’t find any page/documentation on how to do RFC in Arrow
> protocol,
> > > so can anyone point me to it or PR with email will be enough?
> >
> > That is enough to start discussion.  Before formal acceptance and merging
> > of the PR there need to be Java and C++ implementations for the type
> > that pass integration tests.  At the time this guideline was instituted
> > Java and C++ were considered the "reference" implementations (I think
> they
> > still have the most complete integration test coverage).
> >
> > My understanding is that the current modelling of intervals mimics SQL
> > standards (e.g. SQL Server [1]).  So it would also be good to step back
> and
> > understand what problem DF is trying to solve and how it differs from
> other
> > SQL implementations.  I'd be hesitant to accept COMPLEX as a new type
> > without a much deeper analysis into calendar representations within Arrow
> > and how they relate to other existing systems (e.g. Hive and some
> > assortment of existing SQL databases).  For instance the current
> modelling
> > of timestamps does not lend itself to constructing a COMPLEX interval
> type
> > particularly well. (Duration was introduced for this reason).
> >
> > I think both Wes's suggestion of FixedSizeBinary and Andrew's of
> composing
> > them with a struct are good stop-gaps.  These obviously have different
> > trade-offs.  Ultimately, it would be good to define common extension
> types
> > that can represent this use-case if there really is demand for it (if it
> > doesn't become a top level type).
> >
> > [1]
> >
> https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-data-types?view=sql-server-ver15
> >
> > -Micah
> >
> > On Wed, Feb 17, 2021 at 2:05 PM Andrew Lamb 
> wrote:
> >
> > > That is a great suggestion Wes, thank you.
> > >
> > > I wonder if we could get away with a 128 bit representation that is the
> > > concatenation of the two existing interval types (YearMonth)(DayTime).
> Or
> > > maybe even define a `struct` type with those fields that is used by
> > > DataFusion.
> > >
> > > Basically, given our reading of the Arrow spec[1], it is currently not
> > > possible to precisely represent an interval that has both monthly and
> > > sub-monthly granularity.
> > >
> > > As Dmtry says, if you have an interval seemingly simple like  1 month,
> 1
> > > day
> > >
> > > Using IntervalUnit(YEAR_MONTH) can't represent the 1 day
> > > Using IntervalUnit(DAY_TIME) can't represent the month as different
> months
> > > have different numbers of days
> > >
> > > [1]
> > >
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L249-L260
> > >
> > >
> > > On Wed, Feb 17, 2021 at 5:01 PM Wes McKinney 
> wrote:
> > >
> > > > On Wed, Feb 17, 2021 at 3:46 PM  wrote:
> > > > >
> > > > > > It's unclear to me that this needs to be introduced into the
> > > top-level
> > > > >
> > > > > Similar thing to columnar format, How to store interval like 1
> month 1
> > > > day 1 hour? It’s not possible to do it without converting 1 month to
> 30
> > > > days, which is a bad way.
> > > > >
> > > >
> > > > Presumably you can represent a complex interval in a fixed number of
> > > > bytes, and then embed the data in a FixedSizeBinary type. You can
> > > > adorn this type with extension type metadata so that DataFusion can
> > > > then apply Interval semantics to it. This could also serve as an
> > > > interim st

[NIGHTLY] Arrow Build Report for Job nightly-2021-03-31-0

2021-03-31 Thread Crossbow


Arrow Build Report for Job nightly-2021-03-31-0

All tasks: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0

Failed Tasks:
- conda-linux-gcc-py38-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-drone-conda-linux-gcc-py38-aarch64
- gandiva-jar-ubuntu:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-gandiva-jar-ubuntu
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-test-conda-cpp-valgrind
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.8-jpype:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-test-conda-python-3.8-jpype
- test-fedora-33-cpp:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-test-fedora-33-cpp
- test-ubuntu-18.04-docs:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-test-ubuntu-18.04-docs
- test-ubuntu-20.04-cpp-thread-sanitizer:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-test-ubuntu-20.04-cpp-thread-sanitizer

Pending Tasks:
- conda-win-vs2017-py36-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-conda-win-vs2017-py36-r36
- debian-buster-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-travis-debian-buster-arm64
- test-r-install-local:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-test-r-install-local
- test-r-linux-as-cran:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-test-r-linux-as-cran
- test-r-rhub-ubuntu-gcc-release:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-test-r-rhub-ubuntu-gcc-release
- test-r-rocker-r-base-latest:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-test-r-rocker-r-base-latest
- test-r-rstudio-r-base-3.6-bionic:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-test-r-rstudio-r-base-3.6-bionic
- test-r-rstudio-r-base-3.6-centos7-devtoolset-8:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-test-r-rstudio-r-base-3.6-centos7-devtoolset-8
- test-r-rstudio-r-base-3.6-centos8:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-test-r-rstudio-r-base-3.6-centos8
- test-r-rstudio-r-base-3.6-opensuse15:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-test-r-rstudio-r-base-3.6-opensuse15
- test-r-rstudio-r-base-3.6-opensuse42:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-test-r-rstudio-r-base-3.6-opensuse42
- test-r-versions:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-test-r-versions
- test-ubuntu-18.04-r-sanitizer:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-test-ubuntu-18.04-r-sanitizer
- ubuntu-bionic-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-travis-ubuntu-bionic-arm64
- ubuntu-groovy-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-travis-ubuntu-groovy-arm64
- wheel-manylinux2014-cp38-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-travis-wheel-manylinux2014-cp38-arm64

Succeeded Tasks:
- centos-7-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-github-centos-8-amd64
- conda-clean:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-conda-clean
- conda-linux-gcc-py36-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-drone-conda-linux-gcc-py36-aarch64
- conda-linux-gcc-py36-cpu-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-31-0-azure-conda-linux-gcc-py36-cpu-r36
- conda-linux-gcc-py36-cuda:
  URL: 
https://github.com/ursacomputi