Re: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

2021-05-05 Thread Weston Pace
Congratulations Ben!

On Wed, May 5, 2021 at 6:48 PM Micah Kornfield 
wrote:

> Congrats!
>
> On Wed, May 5, 2021 at 4:33 PM David Li  wrote:
>
> > Congrats Ben! Well deserved.
> >
> > Best,
> > David
> >
> > On Wed, May 5, 2021, at 19:22, Neal Richardson wrote:
> > > Congrats Ben!
> > >
> > > Neal
> > >
> > > On Wed, May 5, 2021 at 4:16 PM Eduardo Ponce wrote:
> > >
> > > > Great news! Congratulations Ben.
> > > >
> > > > ~Eduardo
> > > >
> > > > 
> > > > From: Wes McKinney
> > > > Sent: Wednesday, May 5, 2021, 7:10 PM
> > > > To: dev
> > > > Subject: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman
> > > >
> > > > The Project Management Committee (PMC) for Apache Arrow has invited
> > > > Benjamin Kietzman to become a PMC member and we are pleased to announce
> > > > that Benjamin has accepted.
> > > >
> > > > Congratulations and welcome!
> > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

2021-05-05 Thread Micah Kornfield
Congrats!

On Wed, May 5, 2021 at 4:33 PM David Li  wrote:

> Congrats Ben! Well deserved.
>
> Best,
> David
>
> On Wed, May 5, 2021, at 19:22, Neal Richardson wrote:
> > Congrats Ben!
> >
> > Neal
> >
> > On Wed, May 5, 2021 at 4:16 PM Eduardo Ponce wrote:
> >
> > > Great news! Congratulations Ben.
> > >
> > > ~Eduardo
> > >
> > > 
> > > From: Wes McKinney
> > > Sent: Wednesday, May 5, 2021, 7:10 PM
> > > To: dev
> > > Subject: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman
> > >
> > > The Project Management Committee (PMC) for Apache Arrow has invited
> > > Benjamin Kietzman to become a PMC member and we are pleased to announce
> > > that Benjamin has accepted.
> > >
> > > Congratulations and welcome!
> > >
> > >
> >
>


Re: New style in documentation on the website looks great

2021-05-05 Thread Aldrin
I very much enjoy the new theme

Aldrin Montana
Computer Science PhD Student
UC Santa Cruz


On Tue, May 4, 2021 at 11:47 PM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> Thanks, I am happy that people like it!
> It's a slightly customized version of the pydata-sphinx-theme, to feature a single
> sidebar and some custom colors. Concrete feedback is certainly welcome (I
> am no design expert ;)).
>
> Joris
>
> On Sun, 2 May 2021 at 06:55, Micah Kornfield 
> wrote:
>
> > +1
> >
> > On Sat, May 1, 2021 at 5:09 PM David Li  wrote:
> >
> > > I agree, the new theme is very pleasant! Thanks to Joris, who did the work
> > > in ARROW-12188 [1].
> > >
> > > Best,
> > > David
> > >
> > > [1]: https://github.com/apache/arrow/commit/926452bcbebe9e952420688ad9a046bc16aa2ad8
> > >
> > > On Sat, May 1, 2021, at 19:23, Jorge Cardoso Leitão wrote:
> > > > Hi,
> > > >
> > > > I am not sure who was behind it, but it looks great! It really brings
> > > > harmony to the website and offers a much cleaner UI to everyone using it.
> > > >
> > > > E.g. here: https://arrow.apache.org/docs/python/data.html
> > > >
> > > > Thanks a lot for this,
> > > > Jorge
> > > >
> > >
> >
>


Re: [DISCUSS][C++] Refactoring of Expression simplification passes

2021-05-05 Thread Wes McKinney
This seems like it could be a premature optimization. Do we know what
fraction of important workloads is taken up by this operation?

On Wed, May 5, 2021 at 12:35 PM Benjamin Kietzman  wrote:
>
> Sorry, yes: I meant 4 microseconds and not 4 milliseconds.
>
> On Wed, May 5, 2021 at 1:27 PM Antoine Pitrou  wrote:
>
> > On Wed, 5 May 2021 13:23:36 -0400
> > Benjamin Kietzman  wrote:
> > > Currently, Expressions (used to specify dataset filters and projections)
> > > are simplified by direct rewriting: a filter such as `alpha == 2 and beta > 3`
> > > on a partition where we are guaranteed that `beta == 5` will be rewritten
> > > to `alpha == 2` before evaluation against scanned batches. This can
> > > potentially occur for each scanned batch: for example, Parquet's row group
> > > statistics are used in the same way to simplify filters.
> > >
> > > Rewriting is not extremely expensive (a microbenchmark estimate on
> > > my machine shows that a simple case such as the above takes 4ms).
> >
> > 4ms for a single rewriting actually sounds quite large to me.
> > (or did you mean 4µs?)
> >
> >
> >
> >


Re: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

2021-05-05 Thread David Li
Congrats Ben! Well deserved.

Best,
David

On Wed, May 5, 2021, at 19:22, Neal Richardson wrote:
> Congrats Ben!
> 
> Neal
> 
> On Wed, May 5, 2021 at 4:16 PM Eduardo Ponce wrote:
> 
> > Great news! Congratulations Ben.
> >
> > ~Eduardo
> >
> > 
> > From: Wes McKinney
> > Sent: Wednesday, May 5, 2021, 7:10 PM
> > To: dev
> > Subject: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman
> >
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > Benjamin Kietzman to become a PMC member and we are pleased to announce
> > that Benjamin has accepted.
> >
> > Congratulations and welcome!
> >
> >
> 


Re: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

2021-05-05 Thread Neal Richardson
Congrats Ben!

Neal

On Wed, May 5, 2021 at 4:16 PM Eduardo Ponce  wrote:

> Great news! Congratulations Ben.
>
> ~Eduardo
>
> 
> From: Wes McKinney 
> Sent: Wednesday, May 5, 2021, 7:10 PM
> To: dev
> Subject: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman
>
> The Project Management Committee (PMC) for Apache Arrow has invited
> Benjamin Kietzman to become a PMC member and we are pleased to announce
> that Benjamin has accepted.
>
> Congratulations and welcome!
>
>


Re: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

2021-05-05 Thread Eduardo Ponce
Great news! Congratulations Ben.

~Eduardo


From: Wes McKinney 
Sent: Wednesday, May 5, 2021, 7:10 PM
To: dev
Subject: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

The Project Management Committee (PMC) for Apache Arrow has invited
Benjamin Kietzman to become a PMC member and we are pleased to announce
that Benjamin has accepted.

Congratulations and welcome!



[ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

2021-05-05 Thread Wes McKinney
The Project Management Committee (PMC) for Apache Arrow has invited
Benjamin Kietzman to become a PMC member and we are pleased to announce
that Benjamin has accepted.

Congratulations and welcome!


Re: [DISCUSS][C++] Refactoring of Expression simplification passes

2021-05-05 Thread Benjamin Kietzman
Sorry, yes: I meant 4 microseconds and not 4 milliseconds.

On Wed, May 5, 2021 at 1:27 PM Antoine Pitrou  wrote:

> On Wed, 5 May 2021 13:23:36 -0400
> Benjamin Kietzman  wrote:
> > Currently, Expressions (used to specify dataset filters and projections)
> > are simplified by direct rewriting: a filter such as `alpha == 2 and beta > 3`
> > on a partition where we are guaranteed that `beta == 5` will be rewritten
> > to `alpha == 2` before evaluation against scanned batches. This can
> > potentially occur for each scanned batch: for example, Parquet's row group
> > statistics are used in the same way to simplify filters.
> >
> > Rewriting is not extremely expensive (a microbenchmark estimate on
> > my machine shows that a simple case such as the above takes 4ms).
>
> 4ms for a single rewriting actually sounds quite large to me.
> (or did you mean 4µs?)
>
>
>
>


Re: [DISCUSS][C++] Refactoring of Expression simplification passes

2021-05-05 Thread Antoine Pitrou
On Wed, 5 May 2021 13:23:36 -0400
Benjamin Kietzman  wrote:
> Currently, Expressions (used to specify dataset filters and projections)
> are simplified by direct rewriting: a filter such as `alpha == 2 and beta > 3`
> on a partition where we are guaranteed that `beta == 5` will be rewritten
> to `alpha == 2` before evaluation against scanned batches. This can
> potentially occur for each scanned batch: for example, Parquet's row group
> statistics are used in the same way to simplify filters.
> 
> Rewriting is not extremely expensive (a microbenchmark estimate on
> my machine shows that a simple case such as the above takes 4ms).

4ms for a single rewriting actually sounds quite large to me.
(or did you mean 4µs?)





[DISCUSS][C++] Refactoring of Expression simplification passes

2021-05-05 Thread Benjamin Kietzman
Currently, Expressions (used to specify dataset filters and projections)
are simplified by direct rewriting: a filter such as `alpha == 2 and beta > 3`
on a partition where we are guaranteed that `beta == 5` will be rewritten
to `alpha == 2` before evaluation against scanned batches. This can
potentially occur for each scanned batch: for example, Parquet's row group
statistics are used in the same way to simplify filters.

Rewriting is not extremely expensive (a microbenchmark estimate on
my machine shows that a simple case such as the above takes 4ms).
However, it does complicate execution of a logical plan, since the
expressions being evaluated are no longer identical to the expressions
with which the plan was constructed.

It seems it might be preferable to avoid mutating expressions and instead
build a mapping from sub expressions to known values which can be used
by subsequent simplification passes and during execution. I'd have to
benchmark it, but it also seems like we might speed up expression
simplification this way. Any thoughts?
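
To make the idea concrete, here is a minimal sketch in Python rather than
C++. It is not Arrow's Expression API: the `Field`, `Literal`, and `Call`
classes and the `infer` and `evaluate` helpers are hypothetical names made
up for illustration. The filter is never rewritten. Instead, a pass records
which sub-expressions have a fixed value under the partition guarantee, and
evaluation consults that mapping.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical toy expression tree (not Arrow's C++ Expression class).
# Frozen dataclasses are immutable and hashable, so they can be dict keys.
@dataclass(frozen=True)
class Field:
    name: str

@dataclass(frozen=True)
class Literal:
    value: object

@dataclass(frozen=True)
class Call:
    op: str  # one of "==", ">", "and"
    left: "Expr"
    right: "Expr"

Expr = Union[Field, Literal, Call]

UNKNOWN = object()  # sentinel: value not determined by the guarantees
_OPS = {
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "and": lambda a, b: a and b,
}

def infer(expr, known):
    """Record in `known` each sub-expression whose value is fixed.

    Returns the fixed value, or UNKNOWN. `expr` itself is never modified.
    """
    if expr in known:
        return known[expr]
    if isinstance(expr, Field):
        return UNKNOWN
    if isinstance(expr, Literal):
        value = expr.value
    else:
        left = infer(expr.left, known)
        right = infer(expr.right, known)
        if expr.op == "and" and (left is False or right is False):
            value = False  # one false operand decides a conjunction
        elif left is UNKNOWN or right is UNKNOWN:
            return UNKNOWN
        else:
            value = _OPS[expr.op](left, right)
    known[expr] = value
    return value

def evaluate(expr, row, known):
    """Evaluate `expr` against one row, reusing values found by `infer`."""
    if expr in known:
        return known[expr]
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Field):
        return row[expr.name]
    return _OPS[expr.op](
        evaluate(expr.left, row, known),
        evaluate(expr.right, row, known),
    )

# The filter from the example above: alpha == 2 and beta > 3
filter_expr = Call(
    "and",
    Call("==", Field("alpha"), Literal(2)),
    Call(">", Field("beta"), Literal(3)),
)
known = {Field("beta"): 5}  # partition guarantee: beta == 5
infer(filter_expr, known)   # records that `beta > 3` is always true here
print(evaluate(filter_expr, {"alpha": 2}, known))
```

In this sketch the plan keeps the expression it was constructed with, and
the row passed to `evaluate` does not need a `beta` column at all, because
`beta > 3` is looked up in the mapping. Whether this would actually be
faster than rewriting would still need the benchmark mentioned above.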


Apache Arrow Rust Sync Call 5/5/2021

2021-05-05 Thread Andy Grove
Attendees


   - Andy Grove
   - Andrew Lamb
   - Jorge Leitao
   - Danniel Heres
   - Fernando Herrera
   - Andrew Lamb
   - Ruan Pearce-Authers
   - Benjamin Blodgett
   - Jorn Horstmann
   - Michael Edwards


Topics Discussed


   - Andrew has an updated proposal on the mailing list [1] for bi-weekly
     releases of arrow-rs and would like feedback
   - Now that the Python bindings are merged, we need to decide what to do
     about documentation and release process. Jorge will create a GitHub issue
     and linked Google document with a proposal.
   - There was a discussion about whether we should support 32-bit or not.
     Ruan is going to file GitHub issues against both arrow-rs and
     arrow-datafusion about adding 32-bit testing to CI and we can continue the
     discussion with the community there.
   - Jorge has a PR open against the Arrow repo for adding guidelines for
     experimental repos [2] and would like feedback
   - Jorge plans on writing a proposal to donate arrow2/parquet2 to new
     experimental repos once we have the guidelines in place. There were
     questions about whether experimental repo / branch / feature flag would be
     the best approach and the proposal will address these questions.


[1]
https://mail-archives.apache.org/mod_mbox/arrow-dev/202105.mbox/%3CCAFhtnRyKA4OvEeeeRe5ux46AvVxs74j0dDoY5EtGyRNBo8%2BhTQ%40mail.gmail.com%3E

[2] https://github.com/apache/arrow/pull/10239


[Rust] V2 Proposal for bi-weekly Rust Arrow Releases

2021-05-05 Thread Andrew Lamb
First of all, thank you for all the comments so far on the proposal for
releasing the Rust Arrow implementation more frequently. I have
incorporated the feedback from the initial proposal into an updated
proposal for bi-weekly Rust Arrow releases [1].

The largest change, in my opinion, is the proposal to use two branches
rather than restrict merge windows, and an expanded FAQ section.

Please do let me know what you think. Depending on how this round of
feedback goes, I am hoping for a formal vote sometime next week.

Andrew


[1]
https://docs.google.com/document/d/1tMQ67iu8XyGGZuj--h9WQYB9inCk6c2sL_4xMTwENGc/edit


Re: [VOTE] Register media types (MIME types) for Apache Arrow formats to IANA

2021-05-05 Thread Kazuaki Ishizaki
+1, great

Weston Pace  wrote on 2021/05/04 20:41:34:

> From: Weston Pace 
> To: dev@arrow.apache.org
> Date: 2021/05/04 20:41
> Subject: [EXTERNAL] [VOTE] Register media types (MIME types) for 
> Apache Arrow formats to IANA
> 
> Per ARROW-7396 I would like to propose an application to the IANA to
> register media types for the Arrow IPC formats (both file and
> streaming).
> 
> The proposed application is available as [1].  It is based on previous
> discussion in a draft [2] as well as two ML threads [3][4].
> 
> For reference, the IANA application form is an interactive form found
> at [5].  I am proposing we fill out this form twice, once for the IPC
> file format and once for the IPC streaming format.  The resulting
> media types would be application/vnd.apache.arrow.stream and
> application/vnd.apache.arrow.file
> 
> [ ] +1 Submit these applications to the IANA
> [ ] +0
> [ ] -1 Do not submit these applications to the IANA because...
> 
> [1]: https://docs.google.com/document/d/17DfpE33_lSZCp6HTsvHDRUrL6LafD02erfbDhnEqZ9Q/edit?usp=sharing
> [2]: https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
> [3]: https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E
> [4]: https://lists.apache.org/thread.html/re80b8c87d472ceaa2a38a26f853bb278cff3ea01a3a119696a540eee%40%3Cdev.arrow.apache.org%3E
> [5]: https://www.iana.org/form/media-types
> 




Re: [DISCUSS] [Rust] Python-datafusion

2021-05-05 Thread Andy Grove
Wes, thanks for following up on this and making sure that we are following
the process here. I have merged a PR to revert the previous revert, so the
Python bindings are now back in the repo.

On Tue, May 4, 2021 at 4:14 PM Wes McKinney  wrote:

> Based on the general@incubator thread, there isn't a 100% consensus
> but I think we can accept the PR as is and move forward. I appreciate
> everyone's patience
>
> On Tue, May 4, 2021 at 10:24 AM Wes McKinney  wrote:
> >
> > See thread on general@incubator
> >
> >
> > https://lists.apache.org/thread.html/r3108dd293240967cab4d75a8003895b247b3b3b726a7e1e54f3d9b65%40%3Cgeneral.incubator.apache.org%3E
> >
> > On Tue, May 4, 2021 at 9:35 AM Wes McKinney  wrote:
> > >
> > > I admit it's an unusual situation to have a single-author codebase
> > > where the developer is on the PMC, let's determine what is the
> > > protocol for this kind of thing in the future so we don't create
> > > unnecessary work for ourselves.
> > >
> > > > On Tue, May 4, 2021 at 9:15 AM Andy Grove wrote:
> > > >
> > > > > I apologize. For some reason, I had thought that because Jorge was the only
> > > > > contributor (except for one contribution fixing a typo in the README) that
> > > > > the IP clearance process did not apply in this case.
> > > >
> > > > I will create a PR to revert.
> > > >
> > > > > On Tue, May 4, 2021 at 8:06 AM Wes McKinney wrote:
> > > >
> > > > > > Just to circle back on this. Since this was an independent codebase
> > > > > > previously developed over a 10 month period, I had assumed we would be
> > > > > > looking at an IP clearance vote, but instead it was just merged into
> > > > > > arrow-datafusion.
> > > > >
> > > > > > On Tue, Apr 27, 2021 at 10:50 AM Micah Kornfield <emkornfi...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > Hi Jorge,
> > > > > > > This all sounds good to me.  It might be nice to test against both the
> > > > > > > pinned released version of pyarrow and at head if possible.
> > > > > > >
> > > > > > > I like the idea of not causing release churn as long as all the underlying
> > > > > > > libraries are compatible.
> > > > > >
> > > > > > Thanks for the write up.
> > > > > >
> > > > > > -Micah
> > > > > >
> > > > > > On Mon, Apr 26, 2021 at 10:30 AM Jorge Cardoso Leitão <
> > > > > > jorgecarlei...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Micah,
> > > > > > >
> > > > > > > > All testing is actually done from Python: create a record batch in
> > > > > > > > pyarrow, push it to datafusion,
> > > > > > > > consume it back in Python, and compare the result using pyarrow's
> > > > > > > > equality. Sometimes parquet is used instead.
> > > > > > > > The library is tested against pyarrow==1 from pypi: we can bump that, but
> > > > > > > > if it works in pyarrow==1,
> > > > > > > > chances are things will improve with higher versions :)
> > > > > > >
> > > > > > > > Releases: I thought to have it released as a separate wheel for two
> > > > > > > > reasons:
> > > > > > > >
> > > > > > > > * not force people that want pyarrow to download datafusion binaries with
> > > > > > > > it
> > > > > > > > * have independent versioning from pyarrow
> > > > > > > >
> > > > > > > > and "bracket" the pyarrow versions that we ensure compatibility with.
> > > > > > >
> > > > > > > > Another alternative is to release with the same versioning as datafusion,
> > > > > > > > like arrow c++ / pyarrow and spark / pyspark.
> > > > > > > > The upside is that the versions are aligned. The downside is that we will
> > > > > > > > be releasing a lot of majors for no reason: so far, all backward
> > > > > > > > incompatible changes in datafusion were not backward incompatible in
> > > > > > > > python-datafusion: it is easier to break backward compat. in a Rust library
> > > > > > > > than it is in a Python wrapper to a Rust library.
> > > > > > >
> > > > > > > What are your thoughts, Micah?
> > > > > > >
> > > > > > > Best,
> > > > > > > Jorge
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Sun, Apr 25, 2021 at 10:32 PM Micah Kornfield <emkornfi...@gmail.com>
> > > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Jorge,
> > > > > > > >> I think this would certainly be a valuable contribution.  How were you
> > > > > > > >> thinking of hosting (which repo)/publishing it (maintaining a separate
> > > > > > > >> wheel)?  Also did you have thoughts on integration testing with pyarrow?
> > > > > > >>
> > > > > > >> Cheers,
> > > > > > >> Micah
> > > > > > >>
> > > > > > >> On Sun, Apr 25, 2021 at 9:13 AM Jorge Cardoso Leitão <
> > > > > > >> jorgecarlei...@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > Hi,
> > > > > > >> >
> > > > > > > >> > I filed a PR [1] to open up a discussion to incorporate python-datafusion
> > > > > > > >> > [2] into the Apache Arrow project.
> > > > > > >> >
> > > > > > >> > Python-datafusion is a Python library [3] built on top of
> > > > > DataFusions
> > > > 

Re: [VOTE] Register media types (MIME types) for Apache Arrow formats to IANA

2021-05-05 Thread Andrew Lamb
+1

On Wed, May 5, 2021 at 8:09 AM Sutou Kouhei  wrote:

> +1
>
> In
>   "[VOTE] Register media types (MIME types) for Apache Arrow formats to IANA" on Tue, 4 May 2021 01:41:34 -1000,
>   Weston Pace  wrote:
>
> > Per ARROW-7396 I would like to propose an application to the IANA to
> > register media types for the Arrow IPC formats (both file and
> > streaming).
> >
> > The proposed application is available as [1].  It is based on previous
> > discussion in a draft [2] as well as two ML threads [3][4].
> >
> > For reference, the IANA application form is an interactive form found
> > at [5].  I am proposing we fill out this form twice, once for the IPC
> > file format and once for the IPC streaming format.  The resulting
> > media types would be application/vnd.apache.arrow.stream and
> > application/vnd.apache.arrow.file
> >
> > [ ] +1 Submit these applications to the IANA
> > [ ] +0
> > [ ] -1 Do not submit these applications to the IANA because...
> >
> > [1]: https://docs.google.com/document/d/17DfpE33_lSZCp6HTsvHDRUrL6LafD02erfbDhnEqZ9Q/edit?usp=sharing
> > [2]: https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
> > [3]: https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E
> > [4]: https://lists.apache.org/thread.html/re80b8c87d472ceaa2a38a26f853bb278cff3ea01a3a119696a540eee%40%3Cdev.arrow.apache.org%3E
> > [5]: https://www.iana.org/form/media-types
>


Re: [VOTE] Register media types (MIME types) for Apache Arrow formats to IANA

2021-05-05 Thread Sutou Kouhei
+1

In 
  "[VOTE] Register media types (MIME types) for Apache Arrow formats to IANA" 
on Tue, 4 May 2021 01:41:34 -1000,
  Weston Pace  wrote:

> Per ARROW-7396 I would like to propose an application to the IANA to
> register media types for the Arrow IPC formats (both file and
> streaming).
> 
> The proposed application is available as [1].  It is based on previous
> discussion in a draft [2] as well as two ML threads [3][4].
> 
> For reference, the IANA application form is an interactive form found
> at [5].  I am proposing we fill out this form twice, once for the IPC
> file format and once for the IPC streaming format.  The resulting
> media types would be application/vnd.apache.arrow.stream and
> application/vnd.apache.arrow.file
> 
> [ ] +1 Submit these applications to the IANA
> [ ] +0
> [ ] -1 Do not submit these applications to the IANA because...
> 
> [1]: 
> https://docs.google.com/document/d/17DfpE33_lSZCp6HTsvHDRUrL6LafD02erfbDhnEqZ9Q/edit?usp=sharing
> [2]: 
> https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
> [3]: 
> https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E
> [4]: 
> https://lists.apache.org/thread.html/re80b8c87d472ceaa2a38a26f853bb278cff3ea01a3a119696a540eee%40%3Cdev.arrow.apache.org%3E
> [5]: https://www.iana.org/form/media-types


[DataFusion] [Discuss] Output Schema for queries with multiple relations

2021-05-05 Thread Andrew Lamb
I wanted to bring some additional attention to some discussion occurring on
a PR [1], specifically the proposal of how to construct output field names
from queries that have multiple relations (that may have the same input
field).

The documents are:
* Document for output schema field name semantics with examples: [2]
* Proposed change to @jorgecarleitao 's invariant doc [3]
* Updated invariant doc with proposed changes applied [4]

Please comment on the PR / in the docs if you are interested.

Andrew

[1]
https://github.com/apache/arrow-datafusion/pull/55#issuecomment-831405269
[2]
https://docs.google.com/document/d/1uviWavwEGD3qxwMk2AGkOgp6ENrvKGiMWQhHNbqPwhg/edit?usp=sharing
[3]
https://docs.google.com/document/d/158gbfDp8pcakfriT2l7dHChwJB43_RV7lcWfxEC73ng/edit?usp=sharing
[4]
https://docs.google.com/document/d/1dbK-3eaTHlzZcHzpTk1h-LA3b7dcxsVBcoZeVKYIPwI/edit?usp=sharing


[NIGHTLY] Arrow Build Report for Job nightly-2021-05-05-0

2021-05-05 Thread Crossbow


Arrow Build Report for Job nightly-2021-05-05-0

All tasks: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0

Failed Tasks:
- conda-linux-gcc-py36-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-drone-conda-linux-gcc-py36-arm64
- conda-linux-gcc-py37-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-drone-conda-linux-gcc-py37-arm64
- conda-linux-gcc-py38-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-drone-conda-linux-gcc-py38-arm64
- conda-linux-gcc-py39-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-drone-conda-linux-gcc-py39-arm64
- conda-osx-clang-py38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-osx-clang-py38
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-github-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-github-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.8-jpype:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-github-test-conda-python-3.8-jpype

Succeeded Tasks:
- centos-7-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-github-centos-8-amd64
- centos-8-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-travis-centos-8-arm64
- conda-clean:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-clean
- conda-linux-gcc-py36-cpu-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-linux-gcc-py36-cpu-r36
- conda-linux-gcc-py36-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-linux-gcc-py36-cuda
- conda-linux-gcc-py37-cpu-r40:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-linux-gcc-py37-cpu-r40
- conda-linux-gcc-py37-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-linux-gcc-py37-cuda
- conda-linux-gcc-py38-cpu:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-linux-gcc-py38-cpu
- conda-linux-gcc-py38-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-linux-gcc-py38-cuda
- conda-linux-gcc-py39-cpu:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-linux-gcc-py39-cpu
- conda-linux-gcc-py39-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-linux-gcc-py39-cuda
- conda-osx-arm64-clang-py38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-osx-arm64-clang-py38
- conda-osx-arm64-clang-py39:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-osx-arm64-clang-py39
- conda-osx-clang-py36-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-osx-clang-py36-r36
- conda-osx-clang-py37-r40:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-osx-clang-py37-r40
- conda-osx-clang-py39:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-osx-clang-py39
- conda-win-vs2017-py36-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-win-vs2017-py36-r36
- conda-win-vs2017-py37-r40:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-win-vs2017-py37-r40
- conda-win-vs2017-py38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-win-vs2017-py38
- conda-win-vs2017-py39:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-azure-conda-win-vs2017-py39
- debian-bullseye-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-github-debian-bullseye-amd64
- debian-bullseye-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-travis-debian-bullseye-arm64
- debian-buster-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-05-0-github-debian-buster-amd64
- debian-buster-arm64:
  URL: 

Re: New style in documentation on the website looks great

2021-05-05 Thread Joris Van den Bossche
Thanks, I am happy that people like it!
It's a slightly customized version of the pydata-sphinx-theme, to feature a single
sidebar and some custom colors. Concrete feedback is certainly welcome (I
am no design expert ;)).

Joris

On Sun, 2 May 2021 at 06:55, Micah Kornfield  wrote:

> +1
>
> On Sat, May 1, 2021 at 5:09 PM David Li  wrote:
>
> > I agree, the new theme is very pleasant! Thanks to Joris, who did the work
> > in ARROW-12188 [1].
> >
> > Best,
> > David
> >
> > [1]: https://github.com/apache/arrow/commit/926452bcbebe9e952420688ad9a046bc16aa2ad8
> >
> > On Sat, May 1, 2021, at 19:23, Jorge Cardoso Leitão wrote:
> > > Hi,
> > >
> > > I am not sure who was behind it, but it looks great! It really brings
> > > harmony to the website and offers a much cleaner UI to everyone using it.
> > >
> > > E.g. here: https://arrow.apache.org/docs/python/data.html
> > >
> > > Thanks a lot for this,
> > > Jorge
> > >
> >
>


Re: [VOTE] Register media types (MIME types) for Apache Arrow formats to IANA

2021-05-05 Thread Joris Van den Bossche
+1

On Tue, 4 May 2021 at 13:41, Weston Pace  wrote:

> Per ARROW-7396 I would like to propose an application to the IANA to
> register media types for the Arrow IPC formats (both file and
> streaming).
>
> The proposed application is available as [1].  It is based on previous
> discussion in a draft [2] as well as two ML threads [3][4].
>
> For reference, the IANA application form is an interactive form found
> at [5].  I am proposing we fill out this form twice, once for the IPC
> file format and once for the IPC streaming format.  The resulting
> media types would be application/vnd.apache.arrow.stream and
> application/vnd.apache.arrow.file
>
> [ ] +1 Submit these applications to the IANA
> [ ] +0
> [ ] -1 Do not submit these applications to the IANA because...
>
> [1]:
> https://docs.google.com/document/d/17DfpE33_lSZCp6HTsvHDRUrL6LafD02erfbDhnEqZ9Q/edit?usp=sharing
> [2]:
> https://docs.google.com/document/d/1PmZFoSifV_TX4vXnv775WiOtqCgz5zLF5ryFRWio3HQ/edit?usp=sharing
> [3]:
> https://lists.apache.org/thread.html/b15726d0c0da2223ba1b45a226ef86263f688b20532a30535cd5e267%40%3Cdev.arrow.apache.org%3E
> [4]:
> https://lists.apache.org/thread.html/re80b8c87d472ceaa2a38a26f853bb278cff3ea01a3a119696a540eee%40%3Cdev.arrow.apache.org%3E
> [5]: https://www.iana.org/form/media-types
>
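
For anyone wondering how the two proposed media types might be used once
registered, here is a small sketch using only Python's standard `mimetypes`
module. The media type strings are the ones from the proposal above. The
`.arrow` and `.arrows` file extensions and the helper names are assumptions
made for this example and are not stated in this thread.

```python
import mimetypes

# Media types as proposed in the vote above.
ARROW_FILE = "application/vnd.apache.arrow.file"
ARROW_STREAM = "application/vnd.apache.arrow.stream"

def arrow_mime_db():
    """Build a private MIME database that knows the two Arrow IPC types."""
    db = mimetypes.MimeTypes()
    # Assumed extensions, for illustration only.
    db.add_type(ARROW_FILE, ".arrow")
    db.add_type(ARROW_STREAM, ".arrows")
    return db

def content_type_for(path, db=None):
    """Return the media type guessed from the extension, or None."""
    db = db or arrow_mime_db()
    media_type, _encoding = db.guess_type(path)
    return media_type

print(content_type_for("table.arrow"))
print(content_type_for("batches.arrows"))
```

A server could put the returned string in a `Content-Type` header so that
clients can tell the IPC file format from the IPC streaming format without
inspecting the payload.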