Re: [ANNOUNCE] New Arrow committer: Will Ayd

2024-10-01 Thread Alenka Frim
Welcome Will!!

On Tue, 1 Oct 2024 at 20:25, Ruoxi Sun wrote:

> Welcome and congrats!
>
>
> *Regards,*
> *Rossi SUN*
>
>
> On Wed, 2 Oct 2024 at 01:26, Raúl Cumplido wrote:
>
> > Welcome Will!
> >
> > On Tue, 1 Oct 2024 at 19:21, Will Ayd wrote:
> >
> > > Thanks everyone - I am very happy to be a part of the team. I've learned
> > > a lot through my interactions with everyone so far and have really had
> > > fun working together, so I am excited to continue along that path with
> > > you all.
> > >
> > > - Will
> > >
> > > On 10/1/24 1:08 PM, Kevin Gurney wrote:
> > > > Congratulations, Will!
> > > >
> > > > Best Regards,
> > > >
> > > > Kevin Gurney
> > > > 
> > > > From: Ian Cook 
> > > > Sent: Tuesday, October 1, 2024 1:02 PM
> > > > To: dev@arrow.apache.org 
> > > > Subject: Re: [ANNOUNCE] New Arrow committer: Will Ayd
> > > >
> > > > Congratulations Will!
> > > >
> > > > On Tue, Oct 1, 2024 at 1:01 PM Rok Mihevc 
> > wrote:
> > > >
> > > >> Welcome Will! Great to hear!
> > > >>
> > > >> Rok
> > > >>
> > > >> On Tue, Oct 1, 2024 at 6:58 PM Antoine Pitrou 
> > > wrote:
> > > >>
> > > >>> Hello Will, and thanks a lot for your involvement!
> > > >>>
> > > >>>
> > > >>> On 01/10/2024 at 18:55, Dewey Dunnington wrote:
> > > >>>> On behalf of the Arrow PMC, I'm happy to announce that Will Ayd has
> > > >>>> accepted an invitation to become a committer on Apache Arrow. Welcome,
> > > >>>> and thank you for your contributions!
> > > >>>>
> > > >>>> -dewey
> > >
> >
>


PyArrow development community meeting

2024-09-25 Thread Alenka Frim
Hi all,

Starting next week, we plan to organise a PyArrow development community
meeting that will take place every 4 weeks. The first meeting will be held
on Wednesday, 2nd of October at 3PM UTC.

To add future meetings to your Google calendar visit:

   - Google Calendar
   


In the meeting we plan to talk about the latest developments in the PyArrow
library, triaging, open issues and PRs.

*Everyone is welcome to attend and to edit the document with the topics to
discuss.*
Meeting notes will be captured in this Google Doc:

   - Google Document
   


This Google Doc is globally readable, but edit access is limited for
safety. Use the "Request edit access" button to ask for edit access, or add
content via the comments option.

All well,
Alenka


Re: [DISCUSS] Monorepo GitHub workflow: allow one issue with multiple PRs

2024-09-13 Thread Alenka Frim
Hi all,

Thank you for raising this topic Joris. I do agree with what you propose as I
frequently see separate but very much connected issues being opened in PyArrow
just for the sake of having a 1-1 relationship (and I do the same). I feel it
adds to a huge number of issues unnecessarily and adds to the noise, making it
harder to have a good overview.

So yes, +1 from me if this doesn't add too much work to update our current merge
workflow (I am also +1 to revisit if we do need the merge script).

Best,
Alenka

On Thu, 12 Sep 2024 at 19:56, Rok Mihevc wrote:

> Perhaps adding a count tag to the PR titles would be useful for such cases?
> e.g.: GH-: [/] tags> 
>
> Rok
>
> On Thu, Sep 12, 2024 at 10:37 AM Antoine Pitrou 
> wrote:
>
> >
> > Hi,
> >
> > I don't have a specific opinion on this, but as a data point, this
> > already happens from time to time (though rarely).
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 11/09/2024 at 17:32, Joris Van den Bossche wrote:
> > > Hi all,
> > >
> > > This is a discussion specifically for the GitHub development workflow
> > > we use in the monorepo, i.e. https://github.com/apache/arrow/
> > >
> > > We have the unwritten(?) (but implicitly implied by our tooling) rule
> > > that we always should have one issue for one PR to close that issue.
> > > I would like to discuss expanding that to explicitly allow making
> > > multiple PRs that link to the same issue.
> > >
> > > For clarity, I don't want to discuss the usefulness of actually having
> > > an issue linked to a PR (we could discuss expanding the scope of our
> > > "minor" PRs, but that's for a separate discussion I would say).
> > > But in practice, you regularly want to split up the work related to
> > > the same topic into multiple PRs (to have smaller PRs, to ease
> > > reviewing, etc). At the moment, to follow our workflow, that requires
> > > creating a bunch of dummy child issues just to have a unique issue
> > > number to reference in each PR. While in practice they could all
> > > reference the same issue number. This keeps the relevant information
> > > more centralized in that one issue, and avoids the noise of a flood of
> > > dummy issues in our issue list.
> > >
> > > Practical example: currently I am planning to work on adding type
> > > annotations to the pyarrow library. I will probably split up that work
> > > in a PR per module, but they can all reference a single parent issue
> > > instead of also creating an issue about "adding type annotations in
> > > module xxx" for each PR.
> > >
> > > ---
> > >
> > > I think this is perfectly possible with our current tooling, if we
> > > want, with the following notes:
> > >
> > > - The current merge script will ask you to update (i.e. close) the
> > > issue, and at that point if you know this is a parent issue you should
> > > say "no" (or afterwards reopen the issue).
> > > (we could also discuss whether we actually need this merge script, but
> > > let's keep that for another thread? ;))
> > >
> > > - The release notes generation currently relies on listing issues, and
> > > not PRs. That means if you want the issue listed, it should be closed
> > > (and tagged with that milestone) by the time of the release (if it is
> > > ongoing work, you can at that point create a new issue for all PRs
> > > going into the next release).
> > >
> > > - If a PR needs to be backported, that also depends on its connection
> > > to and the milestone of the issue. Thus, for PRs that need to be
> > > backported, you should always open a unique issue and it should not
> > > reference an issue tracking multiple PRs.
> > >
> > > Thoughts? Concerns with allowing this?
> > >
> > > Best,
> > > Joris
> >
>
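
Editor's sketch of the bookkeeping discussed in this thread: several PRs
reference one parent issue, and the issue is treated as ready to close only
once every PR that references it has been merged. This is not the project's
merge script. The title pattern (GH-<issue number>: [Component] summary), the
helper names and the sample titles are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed title shape: "GH-<issue number>: [Component] summary".
TITLE_RE = re.compile(r"^GH-(?P<issue>\d+): \[(?P<component>[^\]]+)\] (?P<summary>.+)$")


def group_prs_by_issue(prs):
    """Map issue number -> list of (title, merged) for PRs that reference it."""
    grouped = defaultdict(list)
    for title, merged in prs:
        match = TITLE_RE.match(title)
        if match is None:
            continue  # e.g. a "minor" PR that has no linked issue
        grouped[int(match.group("issue"))].append((title, merged))
    return dict(grouped)


def issues_safe_to_close(prs):
    """Return issue numbers whose referencing PRs have all been merged."""
    grouped = group_prs_by_issue(prs)
    return sorted(
        issue
        for issue, items in grouped.items()
        if all(merged for _, merged in items)
    )


# Hypothetical data: parent issue 100 is split across per-module PRs.
sample_prs = [
    ("GH-100: [Python] Add type annotations to module a", True),
    ("GH-100: [Python] Add type annotations to module b", False),
    ("GH-200: [Python] Fix a docstring", True),
    ("MINOR: [Docs] Fix typo", True),
]

print(issues_safe_to_close(sample_prs))
```

With this sample data, issue 100 stays open because one of its PRs is still
unmerged, which corresponds to answering "no" when asked to close a parent
issue.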


Re: Arrow Maintainer Dashboard

2024-08-29 Thread Alenka Frim
Hello Vibhatha!

Thank you for taking a look and sharing your observations.
It is much appreciated!

The reason for the extra column with the issue or PR link is
in case one would want to use the copy/csv/excel button to
export the list of issues. In that case the link information is
lost and you only get the title in the output. Extra column with
the link helps the user to still have this information when
using the export functionality.

I will continue to look for a better solution as I am a bit
annoyed by the redundancy in the dashboard myself.

Vibhatha, would you be interested in having a separate
page in the dashboard for Java? In case you feel you might use it,
feel free to open an issue or respond here. We can add a
separate Java page very quickly.

All well,
Alenka

On Fri, 30 Aug 2024 at 04:14, Vibhatha Abeykoon <
vibha...@gmail.com> wrote:

> Hi Alenka and Nic,
>
> This is a wonderful effort and neat.
>
> Just a thought from my end:
>
> Once clicked on the Issue tab, the table contains a column for the GitHub
> link, since the title already has a hyperlink, it seems like that column is
> redundant. I guess the same goes for the table in the PR tab. This doesn't
> need to change but is merely an observation.
>
>
> With Regards,
> Vibhatha Abeykoon
>
>
> On Wed, Aug 7, 2024 at 6:51 PM Raúl Cumplido  wrote:
>
> > Hi,
> >
> > If I want to filter and only see the issues with no replies for Python,
> > for example.
> >
> > Kind regards,
> >
> > Raúl
> >
> >
> >
> >
> > On Wed, 7 Aug 2024 at 15:09, Aldrin wrote:
> >
> > > Hi Raúl,
> > >
> > > Clickable in what way? I can click on the legend and I can zoom in on the
> > > graphs, just wondering what interaction you're thinking of?
> > >
> > > -Aldrin
> > >
> > > Sent from Proton Mail <https://proton.me/mail/home> for iOS
> > >
> > >
> > > On Wed, Aug 7, 2024 at 02:14, Alenka Frim wrote:
> > >
> > > Hi Raúl,
> > >
> > > Thank you for your feedback! I love the idea and have created an
> > > issue for it: https://github.com/arrow-maintenance/arrowdash/issues/25
> > >
> > > I am not sure it is doable, but I will do some research.
> > >
> > > Best,
> > > Alenka
> > >
> > > On Wed, 7 Aug 2024 at 10:47, Raúl Cumplido <rau...@apache.org>
> > > wrote:
> > >
> > > > Hi Alenka and Nic,
> > > >
> > > > This is great! Thanks for the effort.
> > > >
> > > > I would love for the top sections `Issues from New Contributors` or
> > > > `Issues with No Replies` to be clickable so it filters the data
> > > > displayed on the table. Same as for the PRs.
> > > >
> > > > Thank you very much for working on this!
> > > >
> > > > Regards,
> > > > Raúl
> > > >
> > > > On Wed, 7 Aug 2024 at 7:50, Alenka Frim
> > > > wrote:
> > > > >
> > > > > Hello everyone!
> > > > >
> > > > > Nic Crane and I have been working on a Quarto dashboard [1] that would help
> > > > > organise different sources of information when doing maintenance. The
> > > > > dashboard currently pulls in the data from
> > > > >
> > > > > * GitHub (for the last 3 months)
> > > > > * Stack Overflow (for the last 3 months)
> > > > > * Mailing list (for the last month)
> > > > >
> > > > > and supports data from Python and R but can be easily extended
> > > > > to include data from other implementations [2].
> > > > >
> > > > > We are sharing this through the mailing list in case others find the tool
> > > > > useful and also to gather any thoughts, feedback or suggestions.
> > > > >
> > > > > All comments are most welcome!
> > > > >
> > > > > We also plan to do a short demo in the upcoming Arrow Meetup in Seattle
> > > > > on the 13th of August.
> > > > >
> > > > > All well,
> > > > > Alenka
> > > > >
> > > > > [1]: https://arrow-maintenance.github.io/arrowdash
> > > > > [2]: https://github.com/arrow-maintenance/arrowdash
> > > >
> > >
> > >
> >
>
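
Editor's sketch of the kind of filter requested in this thread: keep only the
issues for one component that have no replies and were opened within the time
window the dashboard covers. The record layout, the field names and the
function name are hypothetical; the dashboard's actual data model is not shown
in the thread.

```python
from datetime import datetime, timedelta, timezone


def issues_without_replies(issues, component, now, window_days=90):
    """Return issues for `component` opened within the window that have no comments."""
    cutoff = now - timedelta(days=window_days)
    return [
        issue
        for issue in issues
        if issue["component"] == component
        and issue["comments"] == 0
        and issue["created_at"] >= cutoff
    ]


def utc(year, month, day):
    """Small helper to build timezone-aware dates for the sample records."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical records; a real dashboard would load these from its data sources.
now = utc(2024, 8, 7)
sample_issues = [
    {"title": "A", "component": "Python", "comments": 0, "created_at": utc(2024, 7, 30)},
    {"title": "B", "component": "Python", "comments": 3, "created_at": utc(2024, 7, 1)},
    {"title": "C", "component": "R", "comments": 0, "created_at": utc(2024, 7, 15)},
    {"title": "D", "component": "Python", "comments": 0, "created_at": utc(2024, 1, 5)},
]

unanswered = issues_without_replies(sample_issues, "Python", now)
print([issue["title"] for issue in unanswered])
```

Only issue A passes all three conditions: B has replies, C belongs to another
component, and D was opened before the 90-day cutoff.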


Re: [VOTE][Format] Bool8 Canonical Extension Type

2024-08-07 Thread Alenka Frim
+1 (non-binding)

On Tue, 6 Aug 2024 at 19:40, Dane Pitkin wrote:

> +1 (non-binding)
>
> Nice work!
>
> On Tue, Aug 6, 2024 at 12:22 PM Joris Van den Bossche <
> jorisvandenboss...@gmail.com> wrote:
>
> > +1 (binding)
> >
> > On Tue, 6 Aug 2024 at 17:41, Matt Topol  wrote:
> > >
> > > +1 (binding)
> > >
> > > On Tue, Aug 6, 2024 at 11:40 AM Felipe Oliveira Carvalho <
> > > felipe...@gmail.com> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > --
> > > > Felipe
> > > >
> > > > On Tue, Aug 6, 2024 at 6:24 AM Gang Wu  wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Looked through the spec and C++ impl.
> > > > >
> > > > > Best,
> > > > > Gang
> > > > >
> > > > > On Tue, Aug 6, 2024 at 11:55 AM wish maple wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Best,
> > > > > > Xuwei Fu
> > > > > >
> > > > > > On Tue, 6 Aug 2024 at 10:20, David Li wrote:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > On Tue, Aug 6, 2024, at 10:17, Sutou Kouhei wrote:
> > > > > > > > +1 (binding)
> > > > > > > >
> > > > > > > > In  > > > > xbr9m_tfzz4-...@mail.gmail.com
> > > > > > >
> > > > > > > >   "[VOTE][Format] Bool8 Canonical Extension Type" on Mon, 5
> Aug
> > > > 2024
> > > > > > > > 08:59:42 -0400,
> > > > > > > >   Joel Lubinitsky  wrote:
> > > > > > > >
> > > > > > > >> Hello Devs,
> > > > > > > >>
> > > > > > > >> I would like to propose a new canonical extension type: Bool8
> > > > > > > >>
> > > > > > > >> The prior mailing list discussion thread can be found at [1].
> > > > > > > >> The format documentation change can be found at [2]. A copy of the text is
> > > > > > > >> included in this email.
> > > > > > > >> A Go implementation can be found at [3].
> > > > > > > >> A C++/Python implementation can be found at [4].
> > > > > > > >>
> > > > > > > >> Thank you for your time and attention in reviewing this proposal.
> > > > > > > >>
> > > > > > > >> The vote will be open for at least 72 hours.
> > > > > > > >>
> > > > > > > >> [ ] +1 Accept this proposal
> > > > > > > >> [ ] +0
> > > > > > > >> [ ] -1 Do not accept this proposal because...
> > > > > > > >>
> > > > > > > >> [1]: https://lists.apache.org/thread/nz44qllq53h6kjl3rhy0531n2n2tpfr0
> > > > > > > >> [2]: https://github.com/apache/arrow/pull/43234
> > > > > > > >> [3]: https://github.com/apache/arrow/pull/43323
> > > > > > > >> [4]: https://github.com/apache/arrow/pull/43488
> > > > > > > >>
> > > > > > > >> ---
> > > > > > > >>
> > > > > > > >> 8-bit Boolean
> > > > > > > >> =============
> > > > > > > >>
> > > > > > > >> Bool8 represents a boolean value using 1 byte (8 bits) to store each value
> > > > > > > >> instead of only 1 bit as in the original Arrow Boolean type. Although less
> > > > > > > >> compact than the original representation, Bool8 may have better zero-copy
> > > > > > > >> compatibility with various systems that also store booleans using 1 byte.
> > > > > > > >>
> > > > > > > >> * Extension name: ``arrow.bool8``.
> > > > > > > >>
> > > > > > > >> * The storage type of this extension is ``Int8`` where:
> > > > > > > >>
> > > > > > > >>   * **false** is denoted by the value ``0``.
> > > > > > > >>   * **true** can be specified using any non-zero value. Preferably ``1``.
> > > > > > > >>
> > > > > > > >> * Extension type parameters:
> > > > > > > >>
> > > > > > > >>   This type does not have any parameters.
> > > > > > > >>
> > > > > > > >> * Description of the serialization:
> > > > > > > >>
> > > > > > > >>   No metadata is required to interpret the type. Any metadata present
> > > > > > > >>   should be ignored.
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>
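
Editor's sketch of the semantics proposed in this thread, written in plain
Python rather than against any Arrow library API. It interprets a buffer of
Int8 storage bytes as booleans (0 is false, any non-zero value is true) and,
for comparison, packs the same values at 1 bit per value. The function names
are hypothetical, and the least-significant-bit-first packing order is an
assumption taken from the standard Arrow Boolean layout.

```python
def bool8_to_bools(storage):
    """Interpret Int8 storage bytes: 0 is false, any non-zero value is true."""
    return [byte != 0 for byte in storage]


def pack_bits(values):
    """Pack booleans at 1 bit per value, least-significant bit first."""
    packed = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


# Int8 storage shown as raw bytes; 0xFF is -1 as a signed int8, still non-zero.
storage = bytes([0, 1, 0, 2, 0xFF, 0, 0, 0, 1])
values = bool8_to_bools(storage)
packed = pack_bits(values)

print(values)
print(len(storage), "bytes as bool8 vs", len(packed), "bytes bit-packed")
```

The byte-per-value buffer can be read as-is by a consumer that shares these
semantics, while producing the bit-packed form requires touching every value,
which is the conversion cost the extension type is meant to avoid.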


Re: Arrow Maintainer Dashboard

2024-08-07 Thread Alenka Frim
Hi Raúl,

Thank you for your feedback! I love the idea and have created an
issue for it: https://github.com/arrow-maintenance/arrowdash/issues/25

I am not sure it is doable, but I will do some research.

Best,
Alenka

On Wed, 7 Aug 2024 at 10:47, Raúl Cumplido wrote:

> Hi Alenka and Nic,
>
> This is great! Thanks for the effort.
>
> I would love for the top sections `Issues from New Contributors` or
> `Issues with No Replies` to be clickable so it filters the data
> displayed on the table. Same as for the PRs.
>
> Thank you very much for working on this!
>
> Regards,
> Raúl
>
> On Wed, 7 Aug 2024 at 7:50, Alenka Frim
> wrote:
> >
> > Hello everyone!
> >
> > Nic Crane and I have been working on a Quarto dashboard [1] that would
> help
> > organise different sources of information when doing maintenance. The
> > dashboard currently pulls in the data from
> >
> > * GitHub (for the last 3 months)
> > * Stack Overflow (for the last 3 months)
> > * Mailing list (for the last month)
> >
> > and supports data from Python and R but can be easily extended
> > to include data from other implementations [2].
> >
> > We are sharing this through the mailing list in case others find the tool
> > useful
> > and also to gather any thoughts, feedback or suggestions.
> >
> > All comments are most welcome!
> >
> > We also plan to do a short demo in the upcoming Arrow Meetup in Seattle
> > on the 13th of August.
> >
> > All well,
> > Alenka
> >
> > [1]: https://arrow-maintenance.github.io/arrowdash
> > [2]: https://github.com/arrow-maintenance/arrowdash
>


Arrow Maintainer Dashboard

2024-08-06 Thread Alenka Frim
Hello everyone!

Nic Crane and I have been working on a Quarto dashboard [1] that would help
organise different sources of information when doing maintenance. The
dashboard currently pulls in the data from

* GitHub (for the last 3 months)
* Stack Overflow (for the last 3 months)
* Mailing list (for the last month)

and supports data from Python and R but can be easily extended
to include data from other implementations [2].

We are sharing this through the mailing list in case others find the tool
useful
and also to gather any thoughts, feedback or suggestions.

All comments are most welcome!

We also plan to do a short demo in the upcoming Arrow Meetup in Seattle
on the 13th of August.

All well,
Alenka

[1]: https://arrow-maintenance.github.io/arrowdash
[2]: https://github.com/arrow-maintenance/arrowdash


Re: [DISCUSS] 8-bit Boolean Canonical Extension Type

2024-07-17 Thread Alenka Frim
Thank you Joel for working on this! I have also come across
the need for byte-packed boolean support when implementing the
Python dataframe interchange protocol and also DLPack, which
is implemented in Arrow C++. The extension type is a great solution.

I will comment on the PR if I have any questions.

Alenka

On Wed, 17 Jul 2024 at 23:32, Ian Cook wrote:

> Thanks Joel and Matt. This looks good to me.
>
> I think it's worth saying here that Arrow-producing components should still
> by default emit Booleans in the standard bit-packed Arrow layout. This
> proposed bool8 canonical extension type is intended to be used in
> applications where the producer knows that the consumer can correctly
> interpret the bool8 extension type and where using it is more efficient
> than converting the data to the standard bit-packed layout.
>
> Ian
>
> On Wed, Jul 17, 2024 at 5:19 PM Matt Topol  wrote:
>
> > Just chiming in that the libcudf documentation[1] states that this
> proposal
> > should work just fine. Bool8 type is described as "0 == false, else
> true".
> >
> > --Matt
> >
> > [1]:
> >
> >
> https://docs.rapids.ai/api/libcudf/stable/group__utility__types#gadf077607da617d1dadcc5417e2783539
> >
> > On Wed, Jul 17, 2024, 3:18 PM Joel Lubinitsky 
> wrote:
> >
> > > Thank you for your comments.
> > >
> > > I spent some time trying to confirm definitively that this proposal
> would
> > > enable zero copy sharing both ways between pyarrow and numpy. I put
> > > together the following gist [1] with my experiment.
> > >
> > > To summarize the results:
> > > - I was able to share the underlying value buffer both ways and have it
> > be
> > > interpreted correctly in each case.
> > > - Numpy will write 0 or 1 to the value buffer to indicate False or
> True.
> > > Importantly, numpy will also understand values outside this range to
> mean
> > > True without requiring a copy. This tracks closely with the proposed
> > > semantics.
> > >
> > > [1]: https://gist.github.com/joellubi/2ddf626633b57839cfd5f32cd94a7f3b
> > >
> > > On Wed, Jul 17, 2024 at 10:16 AM Ian Cook  wrote:
> > >
> > > > >> Before the vote, I would like to see verification that this truly
> > > > enables
> > > > >> zero-copy to/from NumPy bool arrays in Python.
> > > >
> > > > > I think this is an implementation issue more than a specification
> > > > issue...I am not personally worried about any provisions on the
> > > > specification that might make this impossible.
> > > >
> > > > To clarify, what I am looking for here is definite confirmation that
> > > > the proposed representation (in which a signed int8 zero value
> > indicates
> > > > False and any non-zero signed int8 value indicates True) corresponds
> to
> > > the
> > > > representation used by NumPy such that bidirectional zero-copy is
> made
> > > > possible. This seems to me like a specification issue.
> > > >
> > > > Ian
> > > >
> > > > On Wed, Jul 17, 2024 at 9:39 AM Dewey Dunnington
> > > >  wrote:
> > > >
> > > > > Thank you for this! I have definitely run across the
> > one-byte-per-item
> > > > > bool in numpy, DuckDB, and cudf. I haven't heard any discussion
> about
> > > > > DuckDB here but I am fairly sure that they represent their boolean
> > > > > type as an int8 as well [1].
> > > > >
> > > > > > Before the vote, I would like to see verification that this truly
> > > > enables
> > > > > > zero-copy to/from NumPy bool arrays in Python.
> > > > >
> > > > > I think this is an implementation issue more than a specification
> > > > > issue...I am not personally worried about any provisions on the
> > > > > specification that might make this impossible.
> > > > >
> > > > > -dewey
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://github.com/duckdb/duckdb/blob/85a82d86aa11a2695fc045deaf4f88fc63dd4fec/src/common/arrow/appender/bool_data.cpp#L28-L37
> > > > >
> > > > > On Tue, Jul 16, 2024 at 11:25 AM Antoine Pitrou <
> anto...@python.org>
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > Hi Joel,
> > > > > >
> > > > > > This looks good to me on the principle. Can you split the spec
> and
> > > the
> > > > > > implementation(s) into separate PRs?
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.
> > > > > >
> > > > > >
> > > > > > On 16/07/2024 at 13:18, Joel Lubinitsky wrote:
> > > > > > > Hi Arrow devs,
> > > > > > >
> > > > > > > I'm working on adding an extension type for 8-bit booleans, and
> > > > wanted
> > > > > to
> > > > > > > start a discussion about it here because it could be valuable
> to
> > > > > others if
> > > > > > > adopted as a canonical extension type.
> > > > > > >
> > > > > > > The native implementation of the Boolean type uses 1 bit to
> > encode
> > > > each
> > > > > > > value, enabling a very compact representation. This is
> favorable
> > > for
> > > > > many
> > > > > > > workloads, but lots of systems that want to produce/consume
> > Boolean
> > > > > arrays
> > > > > > > use an 8-bit representation internally and are

Re: [VOTE] Migration of parquet-cpp issues to Arrow's issue tracker

2024-05-29 Thread Alenka Frim
+1 (non-binding)

Thank you Rok!

On Wed, May 29, 2024 at 4:57 PM Gang Wu  wrote:

> +1 (binding for Parquet)
>
> Thanks!
> Gang
>
> On Wed, May 29, 2024 at 10:47 PM Fokko Driesprong 
> wrote:
>
> > +1 (non-binding)
> >
> > On Wed, 29 May 2024 at 16:46, Felipe Oliveira Carvalho <
> > felipe...@gmail.com> wrote:
> >
> > > +1 (non-binding)
> > >
> > > On Wed, 29 May 2024 at 11:30 Micah Kornfield 
> > > wrote:
> > >
> > > > +1 (non-binding for Parquet, Binding for Arrow if that makes a
> > > difference)
> > > >
> > > >
> > > >
> > > > On Wed, May 29, 2024 at 7:15 AM Rok Mihevc 
> > wrote:
> > > >
> > > > > # sending this to both dev@arrow and dev@parquet
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Following the ML discussion [1] I would like to propose a vote for
> > > > > parquet-cpp issues to be moved from Parquet Jira [2] to Arrow's
> issue
> > > > > tracker [3].
> > > > >
> > > > > [1]
> https://lists.apache.org/thread/zklp0lwcbcsdzgxoxy6wqjwrvt6y4s9p
> > > > > [2] https://issues.apache.org/jira/projects/PARQUET/issues/
> > > > > [3] https://github.com/apache/arrow/issues/
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Migrate parquet-cpp issues
> > > > > [ ] +0
> > > > > [ ] -1 Do not migrate parquet-cpp issues because...
> > > > >
> > > > >
> > > > > Rok
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: Dane Pitkin

2024-05-07 Thread Alenka Frim
Yay, congratulations Dane!!

On Tue, May 7, 2024 at 4:00 PM Rok Mihevc  wrote:

> Congrats Dane!
>
> Rok
>
> On Tue, May 7, 2024 at 3:57 PM wish maple  wrote:
>
> > Congrats!
> >
> > Best,
> > Xuwei Fu
> >
> > On Tue, 7 May 2024 at 21:53, Joris Van den Bossche wrote:
> >
> > > On behalf of the Arrow PMC, I'm happy to announce that Dane Pitkin has
> > > accepted an invitation to become a committer on Apache Arrow. Welcome,
> > > and thank you for your contributions!
> > >
> > > Joris
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: Sarah Gilmore

2024-04-11 Thread Alenka Frim
Congratulations Sarah!

On Thu, Apr 11, 2024 at 6:21 PM Ruoxi Sun  wrote:

> Congrats!
>
> *Regards,*
> *Rossi SUN*
>
>
> On Fri, 12 Apr 2024 at 00:13, Weston Pace wrote:
>
> > Congratulations!
> >
> > On Thu, Apr 11, 2024 at 9:12 AM wish maple 
> wrote:
> >
> > > Congrats!
> > >
> > > Best,
> > > Xuwei Fu
> > >
> > > On Thu, 11 Apr 2024 at 23:22, Kevin Gurney wrote:
> > >
> > > > Congratulations, Sarah!! Well deserved!
> > > > 
> > > > From: Jacob Wujciak 
> > > > Sent: Thursday, April 11, 2024 11:14 AM
> > > > To: dev@arrow.apache.org 
> > > > Subject: Re: [ANNOUNCE] New Arrow committer: Sarah Gilmore
> > > >
> > > > Congratulations and welcome!
> > > >
> > > > On Thu, 11 Apr 2024 at 17:11, Raúl Cumplido <rau...@apache.org>
> > > > wrote:
> > > >
> > > > > Congratulations Sarah!
> > > > >
> > > > > On Thu, 11 Apr 2024 at 13:13, Sutou Kouhei
> > > > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On behalf of the Arrow PMC, I'm happy to announce that Sarah
> > > > > > Gilmore has accepted an invitation to become a committer on
> > > > > > Apache Arrow. Welcome, and thank you for your contributions!
> > > > > >
> > > > > > Thanks,
> > > > > > --
> > > > > > kou
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: Bryce Mecum

2024-03-18 Thread Alenka Frim
Congratulations Bryce and thank you for all your contributions!!

On Mon, Mar 18, 2024 at 6:43 AM Raúl Cumplido 
wrote:

> Congratulations Bryce!!!
>
> On Mon, 18 Mar 2024 at 5:21, Anja wrote:
>
> > Congrats Bryce! =)
> >
> > On Sun, 17 Mar 2024 at 22:23, Nic Crane  wrote:
> >
> > > On behalf of the Arrow PMC, I'm happy to announce that Bryce Mecum has
> > > accepted an invitation to become a committer on Apache Arrow. Welcome,
> > and
> > > thank you for your contributions!
> > >
> > > Nic
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: Jeffrey Vo

2024-02-11 Thread Alenka Frim
Congratulations Jeffrey!

On Tue, Feb 6, 2024 at 7:30 PM Raphael Taylor-Davies
 wrote:

> On behalf of the Arrow PMC, I am happy to announce that Jeffrey Vo has
> accepted an invitation to become a committer on Apache Arrow. Welcome,
> and thank you for your contributions!
>
> Raphael Taylor-Davies
>
>


Re: [ANNOUNCE] New Arrow committer: Felipe Oliveira Carvalho

2023-12-10 Thread Alenka Frim
Congratulations Felipe!!

On Fri, Dec 8, 2023 at 12:25 PM Felipe Oliveira Carvalho <
felipe...@gmail.com> wrote:

> Thank you everyone!
>
> --
> Felipe
> github.com/felipecrv
>
> On Thu, Dec 7, 2023 at 11:08 PM Rok Mihevc  wrote:
>
> > Congrats Felipe!
> >
> > On Fri, Dec 8, 2023 at 3:00 AM Gang Wu  wrote:
> >
> > > Congrats!
> > >
> > > On Fri, Dec 8, 2023 at 8:37 AM Dewey Dunnington
> > >  wrote:
> > >
> > > > Congrats!
> > > >
> > > > On Thu, Dec 7, 2023 at 4:28 PM Andrew Lamb 
> > wrote:
> > > > >
> > > > > Congratulations!
> > > > >
> > > > > On Thu, Dec 7, 2023 at 3:09 PM Kevin Gurney
> > > > 
> > > > > wrote:
> > > > >
> > > > > > Congratulations, Felipe!
> > > > > > 
> > > > > > From: Daniël Heres 
> > > > > > Sent: Thursday, December 7, 2023 2:59 PM
> > > > > > To: dev@arrow.apache.org 
> > > > > > Subject: Re: [ANNOUNCE] New Arrow committer: Felipe Oliveira
> > Carvalho
> > > > > >
> > > > > > Congrats!
> > > > > >
> > > > > > On Thu, 7 Dec 2023 at 20:52, Ben Harkins wrote:
> > > > > >
> > > > > > > Congrats, Felipe!
> > > > > > >
> > > > > > > On Thu, Dec 7, 2023 at 2:00 PM Vibhatha Abeykoon <
> > > vibha...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Congratulations Felipe.
> > > > > > > >
> > > > > > > > Vibhatha Abeykoon
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Dec 8, 2023 at 12:25 AM David Li <
> lidav...@apache.org>
> > > > wrote:
> > > > > > > >
> > > > > > > > > Congrats Felipe!
> > > > > > > > >
> > > > > > > > > On Thu, Dec 7, 2023, at 13:02, Raúl Cumplido wrote:
> > > > > > > > > > Congratulations Felipe!
> > > > > > > > > >
> > > > > > > > > > On Thu, 7 Dec 2023 at 18:02, Dane Pitkin
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Congrats, Felipe!
> > > > > > > > > >>
> > > > > > > > > >> On Thu, Dec 7, 2023 at 11:41 AM hsseo0501 <
> > > > hsseo0...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Congrats. Felipe :)
> > > > > > > > > >> > Sent from my Galaxy
> > > > > > > > > >> >
> > > > > > > > > >> > Original message
> > > > > > > > > >> > From: Ian Cook <ianmc...@apache.org>
> > > > > > > > > >> > Date: 23/12/8 1:24 AM (GMT+09:00)
> > > > > > > > > >> > To: dev@arrow.apache.org
> > > > > > > > > >> > Subject: Re: [ANNOUNCE] New Arrow committer: Felipe Oliveira Carvalho
> > > > > > > > > >> >
> > > > > > > > > >> > Congratulations Felipe!!!
> > > > > > > > > >> >
> > > > > > > > > >> > On Thu, Dec 7, 2023 at 10:43 AM Benjamin Kietzman <bengil...@gmail.com> wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > On behalf of the Arrow PMC, I'm happy to announce that Felipe Oliveira
> > > > > > > > > >> > > Carvalho has accepted an invitation to become a committer on Apache
> > > > > > > > > >> > > Arrow. Welcome, and thank you for your contributions!
> > > > > > > > > >> > >
> > > > > > > > > >> > > Ben Kietzman
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Daniël Heres
> > > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: James Duong

2023-11-16 Thread Alenka Frim
Congratulations!

On Thu, Nov 16, 2023 at 8:46 AM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> Congrats!
>
> On Thu, 16 Nov 2023 at 08:44, Sutou Kouhei  wrote:
> >
> > On behalf of the Arrow PMC, I'm happy to announce that James Duong
> > has accepted an invitation to become a committer on Apache
> > Arrow. Welcome, and thank you for your contributions!
> >
> > --
> > kou
> >
> >
>


Re: [ANNOUNCE] New Arrow PMC member: Raúl Cumplido

2023-11-13 Thread Alenka Frim
Yay! Congratulations Raul!!!

On Tue, Nov 14, 2023 at 6:33 AM Vibhatha Abeykoon 
wrote:

> Congratulations Raúl !!!
>
> On Tue, Nov 14, 2023 at 10:54 AM wish maple 
> wrote:
>
> > Congrats Raul!
> >
> > Best,
> > Xuwei Fu
> >
> > On Tue, 14 Nov 2023 at 03:28, Andrew Lamb wrote:
> >
> > > The Project Management Committee (PMC) for Apache Arrow has invited
> > > Raúl Cumplido to become a PMC member and we are pleased to announce
> > > that Raúl Cumplido has accepted.
> > >
> > > Please join me in congratulating them.
> > >
> > > Andrew
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: Curt Hagenlocher

2023-10-17 Thread Alenka Frim
Congrats and welcome Curt!

On Tue, Oct 17, 2023 at 3:06 PM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> Welcome to the team, Curt!
>
> On Mon, 16 Oct 2023 at 23:17, Curt Hagenlocher 
> wrote:
> >
> > Thanks, all!
> >
> > On Mon, Oct 16, 2023 at 9:19 AM Dane Pitkin
> > wrote:
> >
> > > Congrats Curt!
> > >
> > > On Mon, Oct 16, 2023 at 12:00 PM Kevin Gurney
> > > 
> > > wrote:
> > >
> > > > Congratulations, Curt!
> > > > 
> > > > From: Weston Pace 
> > > > Sent: Sunday, October 15, 2023 5:32 PM
> > > > To: dev@arrow.apache.org 
> > > > Subject: Re: [ANNOUNCE] New Arrow committer: Curt Hagenlocher
> > > >
> > > > Congratulations!
> > > >
> > > > On Sun, Oct 15, 2023, 8:51 AM Gang Wu  wrote:
> > > >
> > > > > Congrats!
> > > > >
> > > > > On Sun, Oct 15, 2023 at 10:49 PM David Li 
> wrote:
> > > > >
> > > > > > Congrats & welcome Curt!
> > > > > >
> > > > > > On Sun, Oct 15, 2023, at 09:03, wish maple wrote:
> > > > > > > Congratulations!
> > > > > > >
> > > > > > > On Sun, 15 Oct 2023 at 20:48, Raúl Cumplido wrote:
> > > > > > >
> > > > > > >> Congratulations and welcome!
> > > > > > >>
> > > > > > >> On Sun, 15 Oct 2023 at 13:57, Ian Cook wrote:
> > > > > > >>
> > > > > > >> > Congratulations Curt!
> > > > > > >> >
> > > > > > >> > On Sun, Oct 15, 2023 at 05:32 Andrew Lamb <
> al...@influxdata.com
> > > >
> > > > > > wrote:
> > > > > > >> >
> > > > > > >> > > On behalf of the Arrow PMC, I'm happy to announce that
> Curt
> > > > > > Hagenlocher
> > > > > > >> > > has accepted an invitation to become a committer on Apache
> > > > > > >> > > Arrow. Welcome, and thank you for your contributions!
> > > > > > >> > >
> > > > > > >> > > Andrew
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
>


Re: [ANNOUNCE] New Arrow PMC member: Jonathan Keane

2023-10-16 Thread Alenka Frim
Yay, congratulations Jon!!

On Mon, Oct 16, 2023 at 10:27 AM vin jake  wrote:

> Congrats Jon!
>
> On Sun, Oct 15, 2023 at 1:25 AM Andrew Lamb  wrote:
>
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > Jonathan Keane to become a PMC member and we are pleased to announce
> > that Jonathan Keane has accepted.
> >
> > Congratulations and welcome!
> >
> > Andrew
> >
>


Re: quivr - a new library built on pyarrow

2023-10-09 Thread Alenka Frim
Hi Spencer,

Thank you for sharing!
From a quick look, Quivr looks like an interesting project, and it is great
to see Arrow and Python bindings being used and extended in such a way.

You are definitely encouraged to work on PyArrow and on the features around
it. Any kind of contribution is very welcome. Do not hesitate to ping me on
Python-related issues in case you need a suggestion or a review.

Good luck,
Alenka


On Tue, Oct 3, 2023 at 11:44 PM Spencer Nelson  wrote:

> Hi all - I'd like to share a library I've been working on for a few months
> which is built on top of Arrow. It's called quivr
>  (like a bundle of arrows) and it could
> be thought of as tools to wrap up PyArrow Tables and extend their
> capabilities.
>
> I work on scientific software. A lot of the initial scientific work is done
> in Jupyter notebooks with dataframes. When it's time to build larger
> production systems on top of that work, the flexibility of dataframes
> becomes a liability. It's hard to write structured code because dataframes
> can be so variably typed and permissive.
>
> But if you try to use normal tools for this (Python objects, lists,
> dictionaries), you get crushed with performance issues. I wanted an
> array-oriented framework, but with a more structured model than any
> dataframe libraries out there.
>
> So, quivr fills that need. You write a *Table* definition, which
> corresponds closely to a pyarrow Table schema. You do that by writing a
> Python class, with class attributes signaling the types and names of your
> columns. And then you can attach methods to describe computation.
>
> By using Arrow's struct types, Tables can be composed. You might have a
> Table which defines a "Location" - and has sophisticated logic for that
> purpose - and reuse that Location within other, higher-order tables. The
> compositional approach has really been working extremely well so far in our
> work.
>
> I've written a little blog post
>  describing the
> motivations and showing it in use, and docs are up too
> . quivr is still in a pretty
> molten state, so I'm very interested in any feedback or broader interest in
> this from anyone who might find it useful. I'd love to work closer with the
> Arrow team as well - I have a growing wishlist of features around PyArrow
> which I'd be interested in working on.
>
> Thanks,
> Spencer
>


Re: [VOTE][Format] Variable shape tensor canonical extension type

2023-09-29 Thread Alenka Frim
+1
Thanks for pushing this through!

On Wed, Sep 27, 2023 at 2:44 PM Rok Mihevc  wrote:

> Hi all,
>
> Following the discussion [1][2] I would like to propose a vote to add
> variable shape tensor canonical extension type language to
> CanonicalExtensions.rst [3] as written below.
> A draft C++ implementation and a Python wrapper can be seen here [2].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Accept this proposal
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
>
> [1] https://lists.apache.org/thread/qc9qho0fg5ph1dns4hjq56hp4tj7rk1k
> [2] https://github.com/apache/arrow/pull/37166
> [3] https://github.com/apache/arrow/blob/main/docs/source/format/CanonicalExtensions.rst
>
>
> Variable shape tensor
> =====================
>
> * Extension name: `arrow.variable_shape_tensor`.
>
> * The storage type of the extension is: ``StructArray`` where struct
>   is composed of **data** and **shape** fields describing a single
>   tensor per row:
>
>   * **data** is a ``List`` holding tensor elements of a single tensor.
>     Data type of the list elements is uniform across the entire column.
>   * **shape** is a ``FixedSizeList[ndim]`` of the tensor shape where
>     the size of the list ``ndim`` is equal to the number of dimensions
>     of the tensor.
>
> * Extension type parameters:
>
>   * **value_type** = the Arrow data type of individual tensor elements.
>
>   Optional parameters describing the logical layout:
>
>   * **dim_names** = explicit names of tensor dimensions
> as an array. The length of it should be equal to the shape
> length and equal to the number of dimensions.
>
> ``dim_names`` can be used if the dimensions have well-known
> names and they map to the physical layout (row-major).
>
>   * **permutation**  = indices of the desired ordering of the
> original dimensions, defined as an array.
>
> The indices contain a permutation of the values [0, 1, .., N-1] where
> N is the number of dimensions. The permutation indicates which
> dimension of the logical layout corresponds to which dimension of the
> physical tensor (the i-th dimension of the logical view corresponds
> to the dimension with number ``permutations[i]`` of the physical
> tensor).
>
> Permutation can be useful in case the logical order of
> the tensor is a permutation of the physical order (row-major).
>
> When logical and physical layout are equal, the permutation will always
> be ([0, 1, .., N-1]) and can therefore be left out.
>
>   * **uniform_dimensions** = indices of dimensions whose sizes are
>     guaranteed to remain constant. Indices are a subset of all possible
>     dimension indices ([0, 1, .., N-1]).
>     The uniform dimensions must still be represented in the ``shape``
>     field, and must always be the same value for all tensors in the
>     array -- this allows code to interpret the tensor correctly without
>     accounting for uniform dimensions while still permitting optional
>     optimizations that take advantage of the uniformity.
>     ``uniform_dimensions`` can be left out, in which case it is assumed
>     that all dimensions might be variable.
>
>   * **uniform_shape** = shape of the dimensions that are guaranteed to stay
>     constant over all tensors in the array, with the shape of the ragged
>     dimensions set to 0.
>     An array containing a tensor with shape (2, 3, 4) and
>     ``uniform_dimensions`` (0, 2) would have ``uniform_shape`` (2, 0, 4).
>
> * Description of the serialization:
>
>   The metadata must be a valid JSON object, that optionally includes
>   dimension names with keys **"dim_names"**, ordering of
>   dimensions with key **"permutation"**, indices of dimensions whose sizes
>   are guaranteed to remain constant with key **"uniform_dimensions"** and
>   shape of those dimensions with key **"uniform_shape"**.
>   Minimal metadata is an empty JSON object.
>
>   - Example of minimal metadata is:
>
> ``{}``
>
>   - Example with ``dim_names`` metadata for NCHW ordered data:
>
> ``{ "dim_names": ["C", "H", "W"] }``
>
>   - Example with ``uniform_dimensions`` metadata for a set of color images
> with variable width:
>
> ``{ "dim_names": ["H", "W", "C"], "uniform_dimensions": [1] }``
>
>   - Example of permuted 3-dimensional tensor:
>
> ``{ "permutation": [2, 0, 1] }``
>
> Given an individual tensor whose physical layout shape is
> [100, 200, 500], the shape of the logical layout would be
> ``[500, 100, 200]``.
>
> .. note::
>
>   With the exception of permutation all other parameters and storage
>   of VariableShapeTensor define the *physical* storage of the tensor.
>
>   For example, consider a tensor with:
>     shape = [10, 20, 30]
>     dim_names = [x, y, z]
>     permutation = [2, 0, 1]
>
>   This means the logical tensor has names [z, x, y] and shape [30, 10, 20].
>
>   Elements in a variable shape tensor extension array are stored
>   in row-major/C
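
A minimal sketch of two rules from the specification quoted above, written in
plain Python with no Arrow dependency: the logical view implied by
``permutation`` and the ``uniform_shape`` derived from ``uniform_dimensions``.
The helper names `logical_view` and `uniform_shape` are hypothetical and are
not part of PyArrow; the inputs are the examples given in the proposal.

```python
def logical_view(shape, permutation, dim_names=None):
    """Return (logical_shape, logical_dim_names) for one tensor.

    The i-th logical dimension is physical dimension permutation[i].
    """
    if sorted(permutation) != list(range(len(shape))):
        raise ValueError("permutation must contain each of 0..N-1 exactly once")
    logical_shape = [shape[p] for p in permutation]
    logical_names = None
    if dim_names is not None:
        logical_names = [dim_names[p] for p in permutation]
    return logical_shape, logical_names


def uniform_shape(shape, uniform_dimensions):
    """Keep the sizes of uniform dimensions and set ragged dimensions to 0."""
    return [n if i in uniform_dimensions else 0 for i, n in enumerate(shape)]


print(logical_view([10, 20, 30], [2, 0, 1], ["x", "y", "z"]))
# ([30, 10, 20], ['z', 'x', 'y'])
print(uniform_shape([2, 3, 4], [0, 2]))
# [2, 0, 4]
```

Both results match the values stated in the proposal text, which is a quick
way to check one's reading of the permutation direction.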

Re: [DISCUSS] Proposal to add VariableShapeTensor Canonical Extension Type

2023-09-12 Thread Alenka Frim
Hi all,

Thank you Rok for all your valuable work on the Arrow tensors!
I think the proposed spec and implementation are good and I have no
comments on that.

In the PR you mention that "this [ragged dimensions] would be purely
metadata that would help converting arrow <-> jagged/ragged". Are there any
examples available to better understand this metadata and how it would be
used in the conversion you mention?

Thanks!
Alenka

On Wed, Sep 13, 2023 at 2:38 AM Rok Mihevc  wrote:

> After some discussion on the PR [https://github.com/apache/arrow/pull/37166]
> we've altered the proposed type by removing the ndim parameter and
> adding a ragged_dimensions one.
> If there is no further feedback I'd like to call for a vote early next
> week. Proposed language now reads:
>
> Variable shape tensor
> =====================
>
> * Extension name: `arrow.variable_shape_tensor`.
>
> * The storage type of the extension is: ``StructArray`` where struct
>   is composed of **data** and **shape** fields describing a single
>   tensor per row:
>
>   * **data** is a ``List`` holding tensor elements of a single tensor.
>     Data type of the list elements is uniform across the entire column
>     and also provided in metadata.
>   * **shape** is a ``FixedSizeList[ndim]`` of the tensor shape where
>     the size of the list ``ndim`` is equal to the number of dimensions
>     of the tensor.
>
> * Extension type parameters:
>
>   * **value_type** = the Arrow data type of individual tensor elements.
>
>   Optional parameters describing the logical layout:
>
>   * **dim_names** = explicit names to tensor dimensions
> as an array. The length of it should be equal to the shape
> length and equal to the number of dimensions.
>
> ``dim_names`` can be used if the dimensions have well-known
> names and they map to the physical layout (row-major).
>
>   * **permutation**  = indices of the desired ordering of the
> original dimensions, defined as an array.
>
> The indices contain a permutation of the values [0, 1, .., N-1] where
> N is the number of dimensions. The permutation indicates which
> dimension of the logical layout corresponds to which dimension of the
> physical tensor (the i-th dimension of the logical view corresponds
> to the dimension with number ``permutations[i]`` of the physical
> tensor).
>
> Permutation can be useful in case the logical order of
> the tensor is a permutation of the physical order (row-major).
>
> When logical and physical layout are equal, the permutation will always
> be ([0, 1, .., N-1]) and can therefore be left out.
>
>   * **ragged_dimensions** = indices of ragged dimensions whose sizes may
> differ. Dimensions where all elements have the same size are called
> uniform dimensions. Indices are a subset of all possible dimension
> indices ([0, 1, .., N-1]).
> Ragged dimensions list can be left out. In that case all dimensions
> are assumed ragged.
>
> * Description of the serialization:
>
>   The metadata must be a valid JSON object including number of
>   dimensions of the contained tensors as an integer with key **"ndim"**
>   plus optional dimension names with keys **"dim_names"** and ordering of
>   the dimensions with key **"permutation"**.
>
>   - Example with ``dim_names`` metadata for NCHW ordered data:
>
> ``{ "dim_names": ["C", "H", "W"] }``
>
>   - Example with ``ragged_dimensions`` metadata for a set of color images
> with variable width:
>
> ``{ "dim_names": ["H", "W", "C"], "ragged_dimensions": [1] }``
>
>   - Example of permuted 3-dimensional tensor:
>
> ``{ "permutation": [2, 0, 1] }``
>
> Given an individual tensor whose physical layout shape is
> [100, 200, 500], the shape of the logical layout would be
> ``[500, 100, 200]``.
>
> .. note::
>
>   Elements in a variable shape tensor extension array are stored
>   in row-major/C-contiguous order.
>
>
> Rok
>
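
To make the proposed storage layout concrete: each row holds a flat ``data``
list plus a ``shape`` list, and the elements are stored in row-major order.
The sketch below rebuilds nested (ragged) tensors from such rows in plain
Python. The helper name `unflatten` and the sample rows are assumptions made
for illustration; this is not the conversion code from the PR.

```python
from math import prod


def unflatten(data, shape):
    """Rebuild nested lists from a flat row-major ``data`` list."""
    if len(data) != prod(shape):
        raise ValueError("len(data) must equal the product of shape")
    if len(shape) == 0:
        return data[0]  # zero-dimensional tensor: a single scalar
    if len(shape) == 1:
        return list(data)
    step = prod(shape[1:])  # number of elements in one slice of the first dimension
    return [
        unflatten(data[i * step:(i + 1) * step], shape[1:])
        for i in range(shape[0])
    ]


# Two rows of a hypothetical variable shape tensor column.
rows = [
    {"data": [1, 2, 3, 4, 5, 6], "shape": [2, 3]},
    {"data": [7, 8], "shape": [1, 2]},
]
tensors = [unflatten(row["data"], row["shape"]) for row in rows]
print(tensors)
# [[[1, 2, 3], [4, 5, 6]], [[7, 8]]]
```

Both dimensions differ between the two rows here, so neither index would be
left out of a ``ragged_dimensions`` list for this column.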


Re: [ACCOUNCE] New Arrow Committer: Metehan Yildirim

2023-09-05 Thread Alenka Frim
Congratulations Metehan!

On Wed, Sep 6, 2023 at 2:21 AM Ian Joiner  wrote:

> Congratulations!
>
> On Tue, Sep 5, 2023 at 12:14 PM Andrew Lamb  wrote:
>
> > Belatedly,
> >
> > On behalf of the Arrow PMC, I'm happy to announce that Metehan Yildirim
> > (mete[1])
> > has accepted an invitation to become a committer on Apache
> > Arrow. Welcome, and thank you for your contributions!
> >
> > Andrew
> >
> > [1]: https://people.apache.org/phonebook.html?uid=mete
> >
>


Re: [ANNOUNCE] New Arrow committer: Kevin Gurney

2023-07-04 Thread Alenka Frim
Congratulations!

On Tue, Jul 4, 2023 at 9:41 PM Dewey Dunnington
 wrote:

> Congrats!
>
> On Tue, Jul 4, 2023 at 2:08 PM Matt Topol  wrote:
> >
> > Welcome!
> >
> > On Tue, Jul 4, 2023, 11:06 AM Joris Van den Bossche <
> > jorisvandenboss...@gmail.com> wrote:
> >
> > > Congrats Kevin!
> > >
> > > On Tue, 4 Jul 2023 at 13:47, David Li  wrote:
> > > >
> > > > Welcome Kevin!
> > > >
> > > > On Tue, Jul 4, 2023, at 05:55, Raúl Cumplido wrote:
> > > > > Congratulations Kevin!!!
> > > > >
> > > > > El mar, 4 jul 2023 a las 3:32, Weston Pace ( >)
> > > escribió:
> > > > >>
> > > > >> Congratulations Kevin!
> > > > >>
> > > > >> On Mon, Jul 3, 2023 at 5:18 PM Sutou Kouhei 
> > > wrote:
> > > > >>
> > > > >> > On behalf of the Arrow PMC, I'm happy to announce that Kevin
> Gurney
> > > > >> > has accepted an invitation to become a committer on Apache
> > > > >> > Arrow. Welcome, and thank you for your contributions!
> > > > >> >
> > > > >> > --
> > > > >> > kou
> > > > >> >
> > >
>


Re: Do we need CODEOWNERS ?

2023-07-04 Thread Alenka Frim
I agree with what has been said so far.

I did agree to be added as a codeowner for the Python directory, which didn't
turn out to be the best idea. As Joris mentioned, the number of notifications
is not small. There are lots of PRs that are not Python-related but just have
a test added in Python, and that I am therefore not able to review.
So, similarly to what Dewey mentioned, I simply ignore most of the PRs on
which I get assigned as a reviewer.

Not perfect, for sure. Hopefully, in practice, that didn't cause too much
confusion or a bad experience for contributors.

But what I do like about this approach is that I am aware of most of the
things that go on in the project and could be connected to pyarrow.

To give a "vote" on the proposed way forward, I think the second option
(de-assigning themselves, and if possible pinging another core developer)
could be a good way to go. If we were expected to review each PR we are
assigned to, it would be fair for me to remove myself from the
CODEOWNERS file.

Best,
Alenka

On Wed, Jul 5, 2023 at 12:05 AM Will Jones  wrote:

> I haven't had as much time to review the Parquet PRs, so I'll remove myself
> from the CODEOWNERS for that.
>
> I've found that I have a much easier time keeping up with PR reviews in
> projects that are smaller, even if there are proportionally fewer
> maintainers. I think that's the piece that appealed to me originally about
> CODEOWNERS: that we could start to make there be some more clarity on how
> reviewing responsibility can be divided up. But I agree it hasn't really
> lived up to that hope.
>
> On Tue, Jul 4, 2023 at 1:13 PM Joris Van den Bossche <
> jorisvandenboss...@gmail.com> wrote:
>
> > I think it can be useful in certain cases, where the selection is
> > specific enough (for example if all Go related PRs is not too much for
> > Matt, this features sounds useful for him. I can also imagine if you
> > are working on flight, just getting notifications for changes to the
> > flight-related files might be useful).
> >
> > Personally, for myself I didn't add my name to the CODEOWNERS, because
> > as someone doing general pyarrow maintenance, I was thinking that
> > adding my name as owner of "python" directory would lead to way too
> > many notifications for me, and there is no obvious more specific
> > selection.
> >
> > So if it's useful for some people, I wouldn't necessarily remove it,
> > as long as: 1) everyone individually evaluates for themselves whether
> > this is working or not (and it's fine to remove some entries again),
> > and 2) we know this is not a system to properly ping reviewers for all
> > PRs, and we still need to manually ping reviewers in other cases.
> >
> > On Tue, 4 Jul 2023 at 20:11, Matt Topol  wrote:
> > >
> > > I've found it useful for me so far since it auto adds me on any Go
> > related
> > > PRs so I don't need to sift through the notifications or active PRs,
> and
> > > instead can easily find them in my reviews on GitHub notifications.
> > >
> > > But if everyone else finds it more detrimental than helpful I can set
> up
> > a
> > > custom filter or something.
> > >
> > > On Tue, Jul 4, 2023, 12:30 PM Weston Pace 
> wrote:
> > >
> > > > I agree the experiment isn't working very well.  I've been meaning to
> > > > change my listing from `compute` to `acero` for a while.  I'd be +1
> for
> > > > just removing it though.
> > > >
> > > > On Tue, Jul 4, 2023, 6:44 AM Dewey Dunnington
> > > > 
> > > > wrote:
> > > >
> > > > > Just a note that for me, the main problem is that I get automatic
> > > > > review requests for PRs that have nothing to do with R (I think
> this
> > > > > happens when a rebase occurs that contained an R commit). Because
> > that
> > > > > happens a lot, it means I miss actual review requests and sometimes
> > > > > mentions because they blend in. I think CODEOWNERS results in me
> > > > > reviewing more PRs than if I had to set up some kind of custom
> > > > > notification filter but I agree that it's not perfect.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > -dewey
> > > > >
> > > > > On Tue, Jul 4, 2023 at 10:04 AM Antoine Pitrou  >
> > > > wrote:
> > > > > >
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Some time ago we added a `.github/CODEOWNERS` file in the main
> > Arrow
> > > > > > repo. The idea is that, when specific files or directories are
> > touched
> > > > > > by a PR, specific people are asked for review.
> > > > > >
> > > > > > Unfortunately, it seems that, most of the time, this produces the
> > > > > > following effects:
> > > > > >
> > > > > > 1) the people who are automatically queried for review don't show
> > up
> > > > > > (perhaps they simply ignore those automatic notifications)
> > > > > > 2) when several people are assigned for review, each designated
> > > > reviewer
> > > > > > seems to hope that the other ones will be doing the work, instead
> > of
> > > > > > doing it themselves
> > > > > > 3) contributors exp

Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington

2023-06-23 Thread Alenka Frim
Congratulations Dewey!! 🎉

On Fri, Jun 23, 2023 at 12:10 PM Raúl Cumplido 
wrote:

> Congratulations Dewey!
>
> El vie, 23 jun 2023, 11:55, Andrew Lamb  escribió:
>
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > Dewey Dunnington (paleolimbot) to become a PMC member and we are pleased
> > to announce that Dewey Dunnington has accepted.
> >
> > Congratulations and welcome!
> >
>


Re: [ANNOUNCE] New Arrow PMC member: Ben Baumgold,

2023-06-20 Thread Alenka Frim
Congratulations Ben!

On Tue, Jun 20, 2023 at 1:54 PM David Li  wrote:

> Welcome Ben!
>
> On Tue, Jun 20, 2023, at 06:14, Andrew Lamb wrote:
> > The Project Management Committee (PMC) for Apache Arrow has invited
> > Ben Baumgold, to become a PMC member and we are pleased to announce
> > that Ben Baumgold has accepted.
> >
> > Congratulations and welcome!
>


Re: Regarding issue with pyarrow

2023-06-04 Thread Alenka Frim
Hi Hari,

Could you provide more information (I am not familiar with awswrangler but
I guess pyarrow is required?) - where do you get this error, how did you
install pyarrow, etc.?

Best, Alenka

On Sat, Jun 3, 2023 at 11:45 PM Palla Harikrishna
 wrote:

> Hi,
>
> I am getting issues with Pyarrow when I am using the awswrangler package
> and the issue is:   "errorMessage": "Unable to import module "test": No
> module named 'pyarrow.lib'",
>
> Can you please help me with this?
>
> Thanks,
> Hari
>
>


Re: Arrow community meeting April 26 at 16:00 UTC

2023-05-04 Thread Alenka Frim
Hi all,

I just wanted to chime in on the tensor discussion from last week's
Arrow community meeting call.

> Questions about usage of the new fixed-shape tensor canonical extension
> type [6]
> - Can it be written to a Parquet file and read back in? If so, what
> Parquet logical and physical types does it use?
>

With Arrow (PyArrow in the example), it can be written to a Parquet file and
read back in. The type used in Parquet seems to be a List:

import pyarrow as pa

tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
storage = pa.array(arr, pa.list_(pa.int32(), 4))
tensor_array = pa.ExtensionArray.from_storage(tensor_type, storage)

data = [
    pa.array([1, 2, 3]),
    pa.array(['foo', 'bar', None]),
    pa.array([True, None, True]),
    tensor_array,
]

my_schema = pa.schema([('f0', pa.int8()),
                       ('f1', pa.string()),
                       ('f2', pa.bool_()),
                       ('tensors_int', tensor_type)])

table = pa.Table.from_arrays(data, schema=my_schema)

import pyarrow.parquet as pq
pq.write_table(table, 'example_tensor.parquet')

pq.read_table('example_tensor.parquet')
# pyarrow.Table
# f0: int8
# f1: string
# f2: bool
# tensors_int: extension
# 
# f0: [[1,2,3]]
# f1: [["foo","bar",null]]
# f2: [[true,null,true]]
# tensors_int: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]

pq.read_metadata('example_tensor.parquet')
# 
# created_by: parquet-cpp-arrow version 12.0.0-SNAPSHOT
# num_columns: 4
# num_rows: 3
# num_row_groups: 1
# format_version: 2.6
# serialized_size: 1164
pq.read_metadata('example_tensor.parquet').schema
# 
# required group field_id=-1 schema {
#   optional int32 field_id=-1 f0 (Int(bitWidth=8, isSigned=true));
#   optional binary field_id=-1 f1 (String);
#   optional boolean field_id=-1 f2;
#   optional group field_id=-1 tensors_int (List) {
#     repeated group field_id=-1 list {
#       optional int32 field_id=-1 item;
#     }
#   }
# }



> - Is it recommended for use with image data, or should we use byte
> arrays instead?


That depends on the use case. With a fixed-shape tensor you can access
individual image data (pixels). Byte arrays will probably perform better when
reading and writing to a Parquet file (avoiding repetitions, not tested
though) but will also need some custom logic to get individual image data
if needed.
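
As a rough illustration of that trade-off, here is a plain-Python sketch of
reading one pixel value from a single row-major storage row, first from a list
of values and then from raw bytes. The helper name `flat_index`, the tiny
(2, 2, 3) image, and the little-endian int32 encoding are all assumptions made
for the example; nothing here is PyArrow API.

```python
import struct
from math import prod


def flat_index(index, shape):
    """Map a multi-dimensional index to an offset in row-major storage."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same length")
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of range for shape")
        offset = offset * n + i
    return offset


shape = (2, 2, 3)                   # height, width, channels
pixels = list(range(prod(shape)))   # one storage row: 12 values
raw = struct.pack(f"<{len(pixels)}i", *pixels)  # the same row as bytes

idx = flat_index((1, 0, 2), shape)  # row 1, column 0, channel 2
value_from_list = pixels[idx]
(value_from_bytes,) = struct.unpack_from("<i", raw, idx * 4)
print(idx, value_from_list, value_from_bytes)
# 8 8 8
```

With the list form, the shape and element type travel with the data. With the
byte form, the reader has to know the element width and byte order separately,
which is the "custom logic" part.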

Hope this information helps.

Best,
Alenka

On Thu, Apr 27, 2023 at 10:47 PM Ian Cook  wrote:

> Below is a summary of the notes from yesterday's meeting:
>
> Attendees:
>
> - Ian Cook
> - Raúl Cumplido
> - Xuwei Fu
> - Will Jones
> - Bryce Mecum
> - Rok Mihevc
> - Sri Nadukudy
> - Matthew Topol
>
>
> Discussion:
>
> Arrow 12.0.0 release
> - RC0 has been proposed [1]
> - There were a lot of CI failures at the time of the code freeze so it
> took longer than usual to resolve these and generate RC0; thanks to
> everyone who helped
> - There is one outstanding question regarding an issue with pandas
> 2.0.1 [2] and there is a fix that skips the failing test [3]
> - It is unclear whether we should create a new RC that skips this
> test, or whether it is sufficient to release the current RC since
> pandas will fix the issue on their end
> - There are a couple of other minor issues that we don’t think are blockers
>
>
> Support for non-CPU memory in Arrow C data interface [4][5]
> - We are seeking input that addresses the questions posed and gives
> concrete recommendations
>
>
> Questions about usage of the new fixed-shape tensor canonical extension
> type [6]
> - Can it be written to a Parquet file and read back in? If so, what
> Parquet logical and physical types does it use?
> - Is it recommended for use with image data, or should we use byte
> arrays instead?
>
>
> Status of proposed integration tests for C data interface [7]
> - Has not yet been implemented
>
>
> Suggested topics for next meeting
> - Discuss priorities for Arrow 13.0.0 release
>
>
> [1] https://lists.apache.org/thread/2cnl1nbr8kfcxxq9s9br9b6f4xpmsqz1
> [2] https://github.com/pandas-dev/pandas/issues/52899
> [3] https://github.com/apache/arrow/pull/35324
> [4] https://github.com/apache/arrow/pull/34972
> [5] https://lists.apache.org/thread/sntc3pp6msdvb94zhq2lvy70s1p6d1qg
> [6]
> https://arrow.apache.org/docs/dev/format/CanonicalExtensions.html#official-list
> [7] https://lists.apache.org/thread/nr05xwls713xpsxkobpln2f6wsdntrky
>
>
> On Tue, Apr 25, 2023 at 3:54 PM Ian Cook  wrote:
> >
> > Hi all,
> >
> > Our biweekly Arrow community meeting is tomorrow at 16:00 UTC / 12:00
> EDT.
> >
> > Zoom meeting URL:
> > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
> > Meeting ID: 876 4903 3008
> > Passcode: 958092
> >
> > The notes for this and future instances of this meeting will be
> > captured in this Google Doc:
> >
> https://docs.google.com/document/d/1xrji8fc6_24TVmKiHJB4ECX1Zy2sy2eRbBjpVJMnPmk/
> > If you plan to attend this meeting, you are welcome to edit the
> > document to add the topics that you would like to discuss.
> >
> > Thanks,
> > Ian
>


Re: [ANNOUNCE] New Arrow PMC member: Matt Topol

2023-05-04 Thread Alenka Frim
Congratulations Matt!!

On Thu, May 4, 2023 at 9:22 AM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> Congrats Matt!
>
> On Thu, 4 May 2023 at 06:31, Nic Crane  wrote:
> >
> > Congratulations!
> >
> > On Thu, 4 May 2023, 05:24 Vibhatha Abeykoon,  wrote:
> >
> > > Congratulations Matt!
> > >
> > > On Thu, May 4, 2023 at 7:35 AM Ian Cook  wrote:
> > >
> > > > Congratulations Matt!!!
> > > >
> > > > On Wed, May 3, 2023 at 9:55 PM Yibo Cai  wrote:
> > > > >
> > > > > Congrats Matt!
> > > > >
> > > > > On 5/4/23 07:07, Krisztián Szűcs wrote:
> > > > > > Congrats Matt!
> > > > > >
> > > > > > On Wed, May 3, 2023 at 11:44 PM Rok Mihevc  >
> > > > wrote:
> > > > > >>
> > > > > >> Congrats Matt. Well deserved!
> > > > > >>
> > > > > >> Rok
> > > > > >>
> > > > > >> On Wed, May 3, 2023 at 11:03 PM David Li 
> > > wrote:
> > > > > >>
> > > > > >>> Congrats Matt!
> > > > > >>>
> > > > > >>> On Wed, May 3, 2023, at 16:06, Neal Richardson wrote:
> > > > >  Congratulations!
> > > > > 
> > > > >  On Wed, May 3, 2023 at 1:58 PM Jacob Wujciak
> > > > > >>> 
> > > > >  wrote:
> > > > > 
> > > > > > Congratulations, well deserved!
> > > > > >
> > > > > > On Wed, May 3, 2023 at 7:48 PM Weston Pace <
> > > weston.p...@gmail.com>
> > > > > >>> wrote:
> > > > > >
> > > > > >> Congratulations!
> > > > > >>
> > > > > >> On Wed, May 3, 2023 at 10:47 AM Raúl Cumplido <
> > > > raulcumpl...@gmail.com
> > > > > 
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Congratulations Matt!
> > > > > >>>
> > > > > >>> El mié, 3 may 2023, 19:44, vin jake 
> > > > > >>> escribió:
> > > > > >>>
> > > > >  Congratulations, Matt!
> > > > > 
> > > > >  Felipe Oliveira Carvalho  于
> 2023年5月4日周四
> > > > > > 01:42写道:
> > > > > 
> > > > > > Congratulations, Matt!
> > > > > >
> > > > > > On Wed, 3 May 2023 at 14:37 Andrew Lamb <
> > > al...@influxdata.com>
> > > > > >> wrote:
> > > > > >
> > > > > >> The Project Management Committee (PMC) for Apache Arrow
> has
> > > > > > invited
> > > > > >> Matt Topol (zeroshade) to become a PMC member and we are
> > > > > >>> pleased
> > > > > > to
> > > > > >> announce
> > > > > >> that Matt has accepted.
> > > > > >>
> > > > > >> Congratulations and welcome!
> > > > > >>
> > > > > >
> > > > > 
> > > > > >>>
> > > > > >>
> > > > > >
> > > > > >>>
> > > >
> > >
>


Re: [Python] Casting struct to map

2023-05-03 Thread Alenka Frim
Hi Alex,

Passing the schema to the from_pylist() method on the Table should work for
your example (though I am not sure whether it solves your initial problem):

import pyarrow as pa

table_schema = pa.schema([pa.field("id", pa.int32()),
                          pa.field("names", pa.map_(pa.string(), pa.string()))])

table_data = [{"id": 1, "names": {"first_name": "Tyler", "last_name": "Brady"}},
              {"id": 2, "names": {"first_name": "Walsh", "last_name": "Weaver"}}]

pa.Table.from_pylist(table_data, schema=table_schema)
# pyarrow.Table
# id: int32
# names: map
# child 0, entries: struct not null
# child 0, key: string not null
# child 1, value: string
# 
# id: [[1,2]]
# names: [[keys:["first_name","last_name"]values:["Tyler","Brady"],keys:["first_name","last_name"]values:["Walsh","Weaver"]]]
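
The difference between the two types can be sketched without Arrow: a struct
row has a fixed set of named fields, while a map row is a list of key/value
entries. The hypothetical helper below (not a PyArrow function) reshapes
struct-like Python rows into explicit entry lists. As far as I know, PyArrow
also accepts lists of (key, value) tuples for map columns, but treat that as
an assumption to check against your PyArrow version.

```python
def struct_rows_to_map_entries(rows, column):
    """Return copies of ``rows`` where ``column`` is a list of (key, value) tuples.

    Rows where the column is missing or None are copied unchanged.
    """
    converted = []
    for row in rows:
        new_row = dict(row)
        value = row.get(column)
        if value is not None:
            new_row[column] = list(value.items())
        converted.append(new_row)
    return converted


table_data = [
    {"id": 1, "names": {"first_name": "Tyler", "last_name": "Brady"}},
    {"id": 2, "names": {"first_name": "Walsh", "last_name": "Weaver"}},
]
print(struct_rows_to_map_entries(table_data, "names")[0])
# {'id': 1, 'names': [('first_name', 'Tyler'), ('last_name', 'Brady')]}
```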


Best, Alenka

On Wed, May 3, 2023 at 9:13 AM Jerald Alex  wrote:

> Any inputs on this please?
>
> On Tue, May 2, 2023 at 10:03 AM Jerald Alex  wrote:
>
> > Hi Experts,
> >
> > Can anyone please highlight if it is possible to cast struct to map type?
> >
> > I tried the following but it seems to  be producing an error as below.
> >
> > pyarrow.lib.ArrowNotImplementedError: Unsupported cast from
> > struct to map using function
> cast_map
> >
> > Note: Snippet is just an example to show the problem.
> >
> > Code Snippet:
> >
> > table_schema = pa.schema([pa.field("id", pa.int32()), pa.field("names",
> > pa.map_(pa.string(), pa.string()))])
> >
> > table_data = [{"id": 1,"names": {"first_name": "Tyler", "last_name":
> > "Brady"}},
> > {"id": 2,"names": {"first_name": "Walsh", "last_name": "Weaver"}}]
> >
> > tbl = pa.Table.from_pylist(table_data)
> > print(tbl)
> > print(tbl.cast(table_schema))
> > print(tbl)
> >
> > Error :
> >
> > id: int64
> > names: struct
> >   child 0, first_name: string
> >   child 1, last_name: string
> > 
> > id: [[1,2]]
> > names: [
> >   -- is_valid: all not null
> >   -- child 0 type: string
> > ["Tyler","Walsh"]
> >   -- child 1 type: string
> > ["Brady","Weaver"]]
> > Traceback (most recent call last):
> >   File "/Users/
> >
> infant.a...@cognitedata.com/Documents/Github/HubOcean/demo/pyarrow_types.py
> ",
> > line 220, in 
> > print(tbl.cast(table_schema))
> >   File "pyarrow/table.pxi", line 3489, in pyarrow.lib.Table.cast
> >   File "pyarrow/table.pxi", line 523, in pyarrow.lib.ChunkedArray.cast
> >   File "/Users/
> >
> infant.a...@cognitedata.com/Library/Caches/pypoetry/virtualenvs/demo-LzMA3Hsd-py3.10/lib/python3.10/site-packages/pyarrow/compute.py
> ",
> > line 391, in cast
> > return call_function("cast", [arr], options)
> >   File "pyarrow/_compute.pyx", line 560, in
> pyarrow._compute.call_function
> >   File "pyarrow/_compute.pyx", line 355, in
> pyarrow._compute.Function.call
> >   File "pyarrow/error.pxi", line 144, in
> > pyarrow.lib.pyarrow_internal_check_status
> >   File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
> > pyarrow.lib.ArrowNotImplementedError: Unsupported cast from
> > struct to map using function
> cast_map
> >
> > Regards,
> > Alex Vincent
> >
>


Re: [ANNOUNCE] New Arrow committer: Mustafa Akur

2023-04-03 Thread Alenka Frim
Congrats Mustafa!

On Sat, Apr 1, 2023 at 2:59 AM Jacob Wujciak 
wrote:

> Congrats!
>
> On Sat, Apr 1, 2023 at 12:42 AM Raúl Cumplido 
> wrote:
>
> > Congratulations Mustafa!
> >
> > El vie, 31 mar 2023, 22:01, Rok Mihevc  escribió:
> >
> > > Congrats!
> > >
> > > Rok
> > >
> > > On Fri, Mar 31, 2023 at 9:42 PM Mehmet Ozan Kabak 
> > wrote:
> > >
> > > > Congrats Mustafa! You are a great team member at Synnada and I’m sure
> > you
> > > > will be a valued member of the Apache Arrow community too.
> > > >
> > > > > On Mar 31, 2023, at 10:54 AM, Matthew Topol
> > > 
> > > > wrote:
> > > > >
> > > > > Congrats Mustafa! Welcome!
> > > > >
> > > > > On Fri, Mar 31, 2023 at 9:24 AM David Li 
> > wrote:
> > > > >
> > > > >> Congrats & welcome Mustafa!
> > > > >>
> > > > >> On Fri, Mar 31, 2023, at 06:21, Andrew Lamb wrote:
> > > > >>> Hello Arrow Community
> > > > >>>
> > > > >>> On behalf of the Arrow PMC, I'm happy to announce that Mustafa
> Akur
> > > > >>> has accepted an invitation to become a committer on Apache
> > > > >>> Arrow. Welcome, and thank you for your contributions!
> > > > >>>
> > > > >>> Andrew
> > > > >>
> > > >
> > > >
> > >
> >
>


Re: [VOTE][Format] Fixed shape tensor Canonical Extension Type

2023-03-15 Thread Alenka Frim
Hi all,

Thanks Rok for your view on the Parquet topic.

Thank you all for joining in on the discussion and the designing of
the spec for the support of fixed shape tensors in Apache Arrow!

With 3 binding +1 votes, 2 non-binding +1 votes, and no -1 vote, the
vote has passed.

The PR with the specification is ready [1] so I will merge it
later today.

The next step is the C++ implementation. The PR [2] is already
in the final stages of the review process.

[1]: https://github.com/apache/arrow/pull/33925
[2]: https://github.com/apache/arrow/pull/8510/files

All well,
Alenka

On Mon, Mar 6, 2023 at 1:41 PM Alenka Frim  wrote:

> Hi all,
>
> I am starting a new voting thread with this email as the first voting
> thread [1] opened up new
> comments and suggestions and we wanted to take time to see how
> that evolves.
>
> *I would like to propose we vote on adding the fixed shape tensor
> canonical extension type*
> *with the following specification:*
>
> Fixed shape tensor
> ==================
>
> * Extension name: `arrow.fixed_shape_tensor`.
>
> * The storage type of the extension: ``FixedSizeList`` where:
>
>   * **value_type** is the data type of individual tensor elements.
>   * **list_size** is the product of all the elements in tensor shape.
>
> * Extension type parameters:
>
>   * **value_type** = the Arrow data type of individual tensor elements.
>   * **shape** = the physical shape of the contained tensors
> as an array.
>
>   Optional parameters describing the logical layout:
>
>   * **dim_names** = explicit names to tensor dimensions
> as an array. The length of it should be equal to the shape
> length and equal to the number of dimensions.
>
> ``dim_names`` can be used if the dimensions have well-known
> names and they map to the physical layout (row-major).
>
>   * **permutation**  = indices of the desired ordering of the
> original dimensions, defined as an array.
>
> The indices contain a permutation of the values [0, 1, .., N-1] where
> N is the number of dimensions. The permutation indicates which
> dimension of the logical layout corresponds to which dimension of the
> physical tensor (the i-th dimension of the logical view corresponds
> to the dimension with number ``permutations[i]`` of the physical tensor).
>
> Permutation can be useful in case the logical order of
> the tensor is a permutation of the physical order (row-major).
>
> When logical and physical layout are equal, the permutation will always
> be ([0, 1, .., N-1]) and can therefore be left out.
>
> * Description of the serialization:
>
>   The metadata must be a valid JSON object including shape of
>   the contained tensors as an array with key **"shape"** plus optional
>   dimension names with keys **"dim_names"** and ordering of the
>   dimensions with key **"permutation"**.
>
>   - Example: ``{ "shape": [2, 5]}``
>   - Example with ``dim_names`` metadata for NCHW ordered data:
>
> ``{ "shape": [100, 200, 500], "dim_names": ["C", "H", "W"]}``
>
>   - Example of permuted 3-dimensional tensor:
>
> ``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``
>
> This is the physical layout shape and the shape of the logical
> layout would in this case be ``[500, 100, 200]``.
>
> .. note::
>
>   Elements in a fixed shape tensor extension array are stored
>   in row-major/C-contiguous order.
>
> * The specification is submitted as a PR [2] to Canonical Extension Types
> document under the
>format specifications directory [3].
>
> There are also two implementations submitted to Apache Arrow repository:
> * C++ implementation of the proposed specification [4]
> * Python example implementation of the proposed specification and usage
> (only illustrative) [5]
>
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Accept this proposal
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
>
> Regards, Alenka
>
> [1]: https://lists.apache.org/thread/3cj0cr44hg3t2rn0kxly8td82yfob1nd
> [2]: https://github.com/apache/arrow/pull/33925/files
> [3]:
> https://github.com/apache/arrow/blob/main/docs/source/format/CanonicalExtensions.rst
>
> [4]: https://github.com/apache/arrow/pull/8510/files
> [5]: https://github.com/apache/arrow/pull/33948/files
>


Re: [VOTE][Format] Fixed shape tensor Canonical Extension Type

2023-03-15 Thread Alenka Frim
Thank you for the clarification Adam.

All the observations and conclusions you have made are very valuable.
Certainly, the fact that Parquet reading is not as fast as it could be is a
consequence of choosing a FixedSizeList type as a storage type for
the fixed shape tensor extension.

Despite that, we (Joris and I) do not think that it is enough reason to
change the specification. Flat doubles, for example, are not an actual
alternative but something a user could potentially do on the side of the
application to get a faster write/read.

Also, we think there is no need to add restrictions on the presence of null
values; that should be up to the application (in many cases tensor data
will not have nulls, since in general array libraries don't support those).

I do hope future use of this extension type will not be annoying as you
have previously experienced with timestamp and timezone conversions.

Since you stated this is not a blocking issue, I will wait until later today
to finish this vote and proceed with the spec merge so we can get some
more attention on the C++ implementation.

Best,
Alenka

On Mon, Mar 13, 2023 at 2:28 PM Adam Lippai  wrote:

> Hi Alenka,
>
> We didn’t discuss or benchmark the alternative formats. My understanding is
> that the best should be similar to a primitive double Arrow column.
> Currently the parquet (de)serialization takes 3x longer than desired for
> the new Tensor type. That sounds more than “chasing the last 20% of
> performance”.
>
> The conversation can be continued separately, the most pressing questions
> or issues are:
> 1. We might want to specify that a tensor has to consist of fixed size,
> non-null and non-nested items to avoid confusion. This is a big constraint,
> however makes it easier to have consistent assumptions and optimize eg the
> parquet storage later. Alternatively we can define a DoubleTensor later (or
> just accept the behavior varies a lot depending on the stored data, even
> int and double tensor matmul is ridiculously confusing anyways)
> 2. Adding a fixed byte array based storage for FixedSizeList with
> primitives for Arrow<->Parquet conversion is desired to improve the
> performance. It was still slower than storing doubles, but much better than
> storing the list. We lose the parquet features eg delta encoding, list item
> statistics or bloom filter (this might be already missing for lists, I
> didn’t check yet)
> 3. The pandas numpy array is good news. I will confirm if the memory in the
> column is continuous and operations can be vectorized or is it more similar
> to an object storage with individual pointers
>
> I don’t think the above are blocking issues. I’ve raised this here only
> because I remember how annoying the timestamp and timezone conversions were
> (not round tripping with pandas, parquet storage change).
>
> P.S. I have almost zero experience with DNNs, but some reference how our
> layout compares to NCHW or what batch sizes are can be interesting in the
> docs:
>
> https://oneapi-src.github.io/oneDNN/dev_guide_understanding_memory_formats.html
> I guess it’s all doable with the proposed extension.
>
> Best regards,
> Adam Lippai
>
>
> On Mon, Mar 13, 2023 at 4:15 AM Alenka Frim wrote:
>
> > Hi Adam,
> >
> > you are referring to the issue you raised on the Arrow repo [1] that
> turned
> > into a good discussion about FixedSizeList and the current conversion
> > to Parquet.
> >
> > Please correct me if I am wrong, but the outcome of the discussion was
> that
> > the
> > conversion is still pretty fast (much faster than commonly used
> > serialization formats for
> > tensors) though not as fast compared to other primitives in Apache Arrow.
> >
> > My opinion is that the discussion on this topic can be opened up
> > separately in
> > connection to optimising conversion between FixedSizeList as an Arrow
> > format
> > to Parquet, if there is still a need to do so.
> >
> > For this canonical extension type I would say it is an implementation
> > detail
> > and you mention a way to handle that with Parquet in the issue mentioned
> > [2].
> >
> > I do not think there should be any issues in the conversion to Pandas.
> > The conversion to numpy is not expensive and I would think the conversion
> > to pandas should be the same. See PyArrow illustrative implementation
> [3].
> >
> > [1]: https://github.com/apache/arrow/issues/34510
> > [2]:
> https://github.com/apache/arrow/issues/34510#issuecomment-1464463384
> > [3]:
> >
> >
> https://github.com/apache/arrow/pull/33948/files#diff-efc1a41cdf04b6ec96d822dbec1f1993e0bbd17050b1b5f1275c8e3443a38828
> >
>

Re: [ANNOUNCE] New Arrow PMC member: Will Jones

2023-03-13 Thread Alenka Frim
Congratulations Will!!
Well deserved 👍

On Mon, Mar 13, 2023 at 7:18 PM Aldrin  wrote:

> Congrats Will!!
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
>
> On Mon, Mar 13, 2023 at 11:13 AM Dewey Dunnington
>  wrote:
>
> > Congrats, Will!
> >
> > On Mon, Mar 13, 2023 at 3:07 PM Matt Topol 
> wrote:
> > >
> > > Congrats Will!
> > >
> > > On Mon, Mar 13, 2023, 2:02 PM Jacob Wujciak
> > 
> > > wrote:
> > >
> > > > Congratulations Will, well deserved!
> > > >
> > > > On Mon, Mar 13, 2023 at 6:58 PM Andrew Lamb 
> > wrote:
> > > >
> > > > > The Project Management Committee (PMC) for Apache Arrow has invited
> > > > > Will Jones to become a PMC member and we are pleased to announce
> > > > > that Will Jones has accepted.
> > > > >
> > > > > Congratulations and welcome!
> > > > >
> > > >
> >
>


Re: [VOTE][Format] Fixed shape tensor Canonical Extension Type

2023-03-13 Thread Alenka Frim
Hi Adam,

you are referring to the issue you raised on the Arrow repo [1] that turned
into a good discussion about FixedSizeList and the current conversion
to Parquet.

Please correct me if I am wrong, but the outcome of the discussion was that
the conversion is still pretty fast (much faster than commonly used
serialization formats for tensors), though not as fast as for other
primitives in Apache Arrow.

My opinion is that the discussion on this topic can be opened up separately,
in connection to optimising the conversion from FixedSizeList as an Arrow
format to Parquet, if there is still a need to do so.

For this canonical extension type I would say it is an implementation detail,
and you mention a way to handle that with Parquet in the issue mentioned [2].

I do not think there should be any issues in the conversion to Pandas.
The conversion to numpy is not expensive and I would think the conversion
to pandas should be the same. See PyArrow illustrative implementation [3].

[1]: https://github.com/apache/arrow/issues/34510
[2]: https://github.com/apache/arrow/issues/34510#issuecomment-1464463384
[3]:
https://github.com/apache/arrow/pull/33948/files#diff-efc1a41cdf04b6ec96d822dbec1f1993e0bbd17050b1b5f1275c8e3443a38828

All well,
Alenka

On Fri, Mar 10, 2023 at 11:32 PM Adam Lippai  wrote:

> Since the specification explicitly mentions FixedSizeList, but the current
> conversion to/from parquet is expensive compared to doubles and other
> primitives (the nested type needs repetition and definition levels) should
> we discuss what’s the recommendation when integrating with other non-arrow
> systems or is that an implementation detail only? (Pandas, parquet)
>
> Best regards,
> Adam Lippai
>
> On Wed, Mar 8, 2023 at 1:13 AM Alenka Frim wrote:
>
> > >
> > > Just one comment, though: since we also define a separate "Tensor" IPC
> > > structure in Arrow, maybe we should state the relationship somewhere in
> > the
> > > documentation? (Even if the answer is "no relationship".)
> > >
> >
> > Agree David, thanks for bringing it up.
> >
> > I will add the information about "no relationship" to the Tensor IPC
> > structure into the spec and will also keep in mind to add it to the
> > documentation that follows the implementations.
> >
>


Re: [VOTE][Format] Fixed shape tensor Canonical Extension Type

2023-03-07 Thread Alenka Frim
>
> Just one comment, though: since we also define a separate "Tensor" IPC
> structure in Arrow, maybe we should state the relationship somewhere in the
> documentation? (Even if the answer is "no relationship".)
>

Agree David, thanks for bringing it up.

I will add the information about "no relationship" to the Tensor IPC
structure into the spec and will also keep in mind to add it to the
documentation that follows the implementations.


[VOTE][Format] Fixed shape tensor Canonical Extension Type

2023-03-06 Thread Alenka Frim
Hi all,

I am starting a new voting thread with this email as the first voting
thread [1] opened up new
comments and suggestions and we wanted to take time to see how that evolves.

*I would like to propose we vote on adding the fixed shape tensor canonical
extension type*
*with the following specification:*

Fixed shape tensor
==================

* Extension name: `arrow.fixed_shape_tensor`.

* The storage type of the extension: ``FixedSizeList`` where:

  * **value_type** is the data type of individual tensor elements.
  * **list_size** is the product of all the elements in tensor shape.

* Extension type parameters:

  * **value_type** = the Arrow data type of individual tensor elements.
  * **shape** = the physical shape of the contained tensors
as an array.

  Optional parameters describing the logical layout:

  * **dim_names** = explicit names to tensor dimensions
as an array. The length of it should be equal to the shape
length and equal to the number of dimensions.

``dim_names`` can be used if the dimensions have well-known
names and they map to the physical layout (row-major).

  * **permutation**  = indices of the desired ordering of the
original dimensions, defined as an array.

The indices contain a permutation of the values [0, 1, .., N-1] where
N is the number of dimensions. The permutation indicates which
dimension of the logical layout corresponds to which dimension of the
physical tensor (the i-th dimension of the logical view corresponds
to the dimension with number ``permutations[i]`` of the physical tensor).

Permutation can be useful in case the logical order of
the tensor is a permutation of the physical order (row-major).

When logical and physical layout are equal, the permutation will always
be ([0, 1, .., N-1]) and can therefore be left out.

* Description of the serialization:

  The metadata must be a valid JSON object including shape of
  the contained tensors as an array with key **"shape"** plus optional
  dimension names with keys **"dim_names"** and ordering of the
  dimensions with key **"permutation"**.

  - Example: ``{ "shape": [2, 5]}``
  - Example with ``dim_names`` metadata for NCHW ordered data:

``{ "shape": [100, 200, 500], "dim_names": ["C", "H", "W"]}``

  - Example of permuted 3-dimensional tensor:

``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``

This is the physical layout shape and the shape of the logical
layout would in this case be ``[500, 100, 200]``.

.. note::

  Elements in a fixed shape tensor extension array are stored
  in row-major/C-contiguous order.
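
As an illustration only (not part of the proposed text), the permutation rule
above can be sketched in plain Python. The helper name ``logical_shape`` is
hypothetical and is not an Arrow API; it only assumes that the i-th logical
dimension corresponds to physical dimension ``permutation[i]``, as the
specification text states:

```python
def logical_shape(shape, permutation=None):
    """Return the logical shape for a physical (row-major) shape.

    Hypothetical helper: the i-th logical dimension is taken from
    physical dimension permutation[i].
    """
    if permutation is None:
        # No permutation stored: logical and physical layouts are equal.
        return list(shape)
    if sorted(permutation) != list(range(len(shape))):
        raise ValueError("permutation must contain each of 0..N-1 exactly once")
    return [shape[p] for p in permutation]


print(logical_shape([100, 200, 500], [2, 0, 1]))  # [500, 100, 200]
print(logical_shape([100, 200, 500]))             # [100, 200, 500]
```

With the example metadata from the proposal this yields ``[500, 100, 200]``,
and leaving the permutation out returns the physical shape unchanged.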

* The specification is submitted as a PR [2] to Canonical Extension Types
document under the
   format specifications directory [3].

There are also two implementations submitted to Apache Arrow repository:
* C++ implementation of the proposed specification [4]
* Python example implementation of the proposed specification and usage
(only illustrative) [5]


The vote will be open for at least 72 hours.

[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...


Regards, Alenka

[1]: https://lists.apache.org/thread/3cj0cr44hg3t2rn0kxly8td82yfob1nd
[2]: https://github.com/apache/arrow/pull/33925/files
[3]:
https://github.com/apache/arrow/blob/main/docs/source/format/CanonicalExtensions.rst

[4]: https://github.com/apache/arrow/pull/8510/files
[5]: https://github.com/apache/arrow/pull/33948/files


Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type

2023-03-06 Thread Alenka Frim
No problem Kevin. Thank you for sharing the information with your
colleagues.
All comments are much appreciated.

As there were no additional comments/suggestions to the spec itself, I will
open up another voting thread today.

Thanks all!
Alenka

On Tue, Feb 28, 2023 at 11:11 AM Kevin Gurney  wrote:

> Hi Alenka,
>
> Thank you. I've informed my colleagues at MathWorks to add any further
> comments to the PR.
>
> My apologies for bringing this up on the voting thread.
>
> Best Regards,
>
> Kevin Gurney
>
> ____
> From: Alenka Frim 
> Sent: Tuesday, February 28, 2023 4:19 AM
> To: dev@arrow.apache.org 
> Subject: Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type
>
> This was actually already meant as the voting thread, but given it sparked
> some more discussion, let's give this a few more days, and then re-start
> with a new vote thread.
>
> *So if someone still has comments on the current text, please bring those
> up here or in the PR*: https://github.com/apache/arrow/pull/33925.
>
> Alenka
>
> On Fri, Feb 24, 2023 at 10:15 AM Kevin Gurney 
> wrote:
>
> > Hi All,
> >
> > Thank you very much for creating this proposal, Alenka!
> >
> > I noticed the following in the notes [1] shared from the February 15th
> > Arrow Community Meeting:
> >
> > "Members of Hugging Face, Ray, and PyTorch community have given input and
> > some of it was incorporated - It would be good to have input from some
> > other companies and project communities including Lance, NumPy, Posit,
> > MATLAB, DLPack, CUDA/RAPIDS, Arrow Rust, Xarray, Julia, Fortran,
> > TensorFlow, LinkedIn"
> >
> > Based on the inclusion of MATLAB in the list above, I've shared this
> > proposal with some colleagues at MathWorks who have expertise in the deep
> > learning area. They will respond here if they have any additional input
> to
> > add.
> >
> > That being said, I recognize that this proposal is already nearing the
> > voting phase.
> >
> > [1] https://lists.apache.org/thread/bblcwwq7gl1x2hsr1qsormv9f3vr23jn
> >
> > Best Regards,
> >
> > Kevin Gurney
> >
> > 
> > From: Rok Mihevc 
> > Sent: Thursday, February 23, 2023 8:12 AM
> > To: dev@arrow.apache.org 
> > Subject: Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type
> >
> > That makes sense indeed.
> > Do we have any more comments on the language of the proposal [1] or
> should
> > we proceed to vote?
> >
> > Rok
> >
> > [1] https://github.com/apache/arrow/pull/33925/files
> >
> > On Wed, Feb 22, 2023 at 2:13 PM Antoine Pitrou 
> wrote:
> >
> > >
> > > That's a good point.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > On 22/02/2023 at 14:11, Dewey Dunnington wrote:
> > > > I don't think having both dimension names and permutation is
> > > > redundant...dimension names can also serve as human-readable tags
> that
> > > help
> > > > a human interpret the values. If reading a NetCDF, for example, one
> > might
> > > > store the dimension variable names. When determining type equality it
> > may
> > > > be useful that {..., permutation = [2, 0, 1], dim_names = ["C", "H",
> > > "W"]}
> > > > is not equal to {..., permutation = [2, 0, 1], dim_names = ["x", "y",
> > > "z"]}.
> > > >
> > > > On Wed, Feb 22, 2023 at 4:56 AM Rok Mihevc 
> > wrote:
> > > >
> > > >>>
> > > >>>>>
> > > >>>>> Should we rule that `dim_names` and `permutation` are mutually
> > > >>> exclusive?
> > > >>>>>
> > > >>>>
> > > >>>> Since `dim_names` have to "map to the physical layout (row-major)"
> > > that
> > > >>>> means permutation will always be trivial which indeed makes it
> > > >>> unnecessary
> > > >>>> to store both.
> > > >>>
> > > >>> I don't think it is necessarily needed to explicitly make them
> > > > >>> mutually exclusive. I don't know how useful this would be in practice,
> > > >>> but you certainly *can* specify both in a meaningful way. Re-using
> > the
> > > >>> example of NHWC data, which is physically stored as NCHW, you can
> > keep
> > > >>> track of this by specifying a permutation of [2, 0, 1], but at the
> > > >>> same time you could also still save the dimension names as ["C",
> "H",
> > > >>> "W"].
> > > >>>
> > > >>
> > > >> I'll advocate for the original comment, but I'm ok either way.
> Having
> > > both
> > > >> `dim_names` and `permutation` is redundant - if the user knows their
> > > >> desired order of `dim_names` they can derive the permutation. If
> they
> > > don't
> > > >> use `dim_names` they probably don't want them.
> > > >>
> > > >
> > >
> >
>


Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type

2023-02-28 Thread Alenka Frim
This was actually already meant as the voting thread, but given it sparked
some more discussion, let's give this a few more days, and then re-start
with a new vote thread.

*So if someone still has comments on the current text, please bring those
up here or in the PR*: https://github.com/apache/arrow/pull/33925.

Alenka

On Fri, Feb 24, 2023 at 10:15 AM Kevin Gurney  wrote:

> Hi All,
>
> Thank you very much for creating this proposal, Alenka!
>
> I noticed the following in the notes [1] shared from the February 15th
> Arrow Community Meeting:
>
> "Members of Hugging Face, Ray, and PyTorch community have given input and
> some of it was incorporated - It would be good to have input from some
> other companies and project communities including Lance, NumPy, Posit,
> MATLAB, DLPack, CUDA/RAPIDS, Arrow Rust, Xarray, Julia, Fortran,
> TensorFlow, LinkedIn"
>
> Based on the inclusion of MATLAB in the list above, I've shared this
> proposal with some colleagues at MathWorks who have expertise in the deep
> learning area. They will respond here if they have any additional input to
> add.
>
> That being said, I recognize that this proposal is already nearing the
> voting phase.
>
> [1] https://lists.apache.org/thread/bblcwwq7gl1x2hsr1qsormv9f3vr23jn
>
> Best Regards,
>
> Kevin Gurney
>
> 
> From: Rok Mihevc 
> Sent: Thursday, February 23, 2023 8:12 AM
> To: dev@arrow.apache.org 
> Subject: Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type
>
> That makes sense indeed.
> Do we have any more comments on the language of the proposal [1] or should
> we proceed to vote?
>
> Rok
>
> [1] https://github.com/apache/arrow/pull/33925/files
>
> On Wed, Feb 22, 2023 at 2:13 PM Antoine Pitrou  wrote:
>
> >
> > That's a good point.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 22/02/2023 at 14:11, Dewey Dunnington wrote:
> > > I don't think having both dimension names and permutation is
> > > redundant...dimension names can also serve as human-readable tags that
> > help
> > > a human interpret the values. If reading a NetCDF, for example, one
> might
> > > store the dimension variable names. When determining type equality it
> may
> > > be useful that {..., permutation = [2, 0, 1], dim_names = ["C", "H",
> > "W"]}
> > > is not equal to {..., permutation = [2, 0, 1], dim_names = ["x", "y",
> > "z"]}.
> > >
> > > On Wed, Feb 22, 2023 at 4:56 AM Rok Mihevc 
> wrote:
> > >
> > >>>
> > >
> > > Should we rule that `dim_names` and `permutation` are mutually
> > >>> exclusive?
> > >
> > 
> >  Since `dim_names` have to "map to the physical layout (row-major)"
> > that
> >  means permutation will always be trivial which indeed makes it
> > >>> unnecessary
> >  to store both.
> > >>>
> > >>> I don't think it is necessarily needed to explicitly make them
> > >>> mutually exclusive. I don't know how useful this would be in practice,
> > >>> but you certainly *can* specify both in a meaningful way. Re-using
> the
> > >>> example of NHWC data, which is physically stored as NCHW, you can
> keep
> > >>> track of this by specifying a permutation of [2, 0, 1], but at the
> > >>> same time you could also still save the dimension names as ["C", "H",
> > >>> "W"].
> > >>>
> > >>
> > >> I'll advocate for the original comment, but I'm ok either way. Having
> > both
> > >> `dim_names` and `permutation` is redundant - if the user knows their
> > >> desired order of `dim_names` they can derive the permutation. If they
> > don't
> > >> use `dim_names` they probably don't want them.
> > >>
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: Wang Mingming

2023-02-22 Thread Alenka Frim
Congratulations!

On Wed, Feb 22, 2023 at 4:18 PM Raúl Cumplido 
wrote:

> Congratulations!
>
> On Wed, Feb 22, 2023 at 16:17, Patrick Horan wrote:
> >
> > Congrats!
> >
> > On Wed, Feb 22, 2023, at 7:26 AM, Andrew Lamb wrote:
> > > Hi,
> > >
> > > On behalf of the Arrow PMC, I'm happy to announce that mingmwang
> > > has accepted an invitation to become a committer on Apache
> > > Arrow. Welcome, and thank you for your contributions!
> > >
> > > Andrew
> > >
>


Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type

2023-02-21 Thread Alenka Frim
> I would say "the data type of individual tensor elements".
> (so that people don't try to make it e.g. List(float64)).

> Also, I don't think any reference to pyarrow should be made here.

Good catch! I have updated the text with:

  * **value_type** is the data type of individual tensor elements
    and is an instance of Arrow ``DataType`` or ``Field``.

> I would say "the physical shape" to make it clear it refers to how
> values are laid out in memory, while `dim_names` and `permutation` drive
> the logical interpretation.

I have updated the description of the shape and added logical layout to the
optional parameters text:

* Extension type parameters:

  * **value_type** = Arrow DataType or Field of the tensor elements.
  * **shape** = the physical shape of the contained tensors
    as an array.

  Optional parameters describing the logical layout:

> Perhaps explain in this example that the logical shape is [500, 100, 200]?
> (if I understand `permutation` correctly)

Updated the text with:

  - Example of permuted 3-dimensional tensor:

    ``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``

    This is the physical layout shape and the shape of the logical
    layout would in this case be ``[500, 100, 200]``.

> +1! I put together a quick R implementation as well to see how the
> permutation field fits with our native column-major storage [1]. It worked
> great! Thank you for all of your work assembling all of our collective
> opinions on this :-)
>

That is great to hear! Thank you so much for your input Dewey, it helped me
to understand the R side of things much better.

The updated version of the specification can be found here:
https://github.com/apache/arrow/pull/33925/files

All well,
Alenka


[VOTE] Format: Fixed shape tensor Canonical Extension Type

2023-02-21 Thread Alenka Frim
Hi all,

I would like to propose we vote on adding the fixed shape tensor canonical
extension type
with the following specification:

Fixed shape tensor
==================

* Extension name: `arrow.fixed_shape_tensor`.

* The storage type of the extension: ``FixedSizeList`` where:

  * **value_type** is the data type of individual tensors and
is an instance of ``pyarrow.DataType`` or ``pyarrow.Field``.
  * **list_size** is the product of all the elements in tensor shape.

* Extension type parameters:

  * **value_type** = Arrow DataType of the tensor elements
  * **shape** = shape of the contained tensors as an array

  Optional parameters:

  * **dim_names** = explicit names to tensor dimensions
as an array. The length of it should be equal to the shape
length and equal to the number of dimensions.

``dim_names`` can be used if the dimensions have well-known
names and they map to the physical layout (row-major).

  * **permutation**  = indices of the desired ordering of the
original dimensions, defined as an array.

The indices contain a permutation of the values [0, 1, .., N-1] where
N is the number of dimensions. The permutation indicates which
dimension of the logical layout corresponds to which dimension of the
physical tensor (the i-th dimension of the logical view corresponds
to the dimension with number ``permutations[i]`` of the physical tensor).

Permutation can be useful in case the logical order of
the tensor is a permutation of the physical order (row-major).

When logical and physical layout are equal, the permutation will always
be ([0, 1, .., N-1]) and can therefore be left out.

* Description of the serialization:

  The metadata must be a valid JSON object including shape of
  the contained tensors as an array with key **"shape"** plus optional
  dimension names with keys **"dim_names"** and ordering of the
  dimensions with key **"permutation"**.

  - Example: ``{ "shape": [2, 5]}``
  - Example with ``dim_names`` metadata for NCHW ordered data:

``{ "shape": [100, 200, 500], "dim_names": ["C", "H", "W"]}``

  - Example of permuted 3-dimensional tensor:

``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``

.. note::

  Elements in a fixed shape tensor extension array are stored
  in row-major/C-contiguous order.
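
As an illustration only (not part of the proposed text), the serialization
rules above could be checked with a few lines of standard-library Python. The
function names ``serialize_metadata`` and ``deserialize_metadata`` are
hypothetical and do not correspond to any Arrow API; the sketch only assumes
the three keys named in the proposal (``"shape"``, ``"dim_names"``,
``"permutation"``):

```python
import json


def serialize_metadata(shape, dim_names=None, permutation=None):
    """Build the JSON metadata string described in the proposal (sketch)."""
    ndim = len(shape)
    metadata = {"shape": list(shape)}
    if dim_names is not None:
        # dim_names must have one entry per dimension.
        if len(dim_names) != ndim:
            raise ValueError("dim_names length must equal the number of dimensions")
        metadata["dim_names"] = list(dim_names)
    if permutation is not None:
        # permutation must be a rearrangement of 0..N-1.
        if sorted(permutation) != list(range(ndim)):
            raise ValueError("permutation must contain each of 0..N-1 exactly once")
        metadata["permutation"] = list(permutation)
    return json.dumps(metadata)


def deserialize_metadata(serialized):
    """Parse the metadata string and check that the required key is present."""
    metadata = json.loads(serialized)
    if not isinstance(metadata, dict) or "shape" not in metadata:
        raise ValueError('metadata must be a JSON object with a "shape" key')
    return metadata


print(serialize_metadata([100, 200, 500], dim_names=["C", "H", "W"]))
print(deserialize_metadata('{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}'))
```

The optional keys are simply omitted when not given, which matches the
``{ "shape": [2, 5]}`` example in the proposal.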


* The specification is submitted as a PR [1] to Canonical Extension Types
document under the
   format specifications directory [2].

There are also two implementations submitted to Apache Arrow repository:
* C++ implementation of the proposed specification [3]
* Python example implementation of the proposed specification and usage
(only illustrative) [4]


The vote will be open for at least 72 hours.

[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...


Regards, Alenka

[1]: https://github.com/apache/arrow/pull/33925/files
[2]:
https://github.com/apache/arrow/blob/main/docs/source/format/CanonicalExtensions.rst

[3]: https://github.com/apache/arrow/pull/8510/files
[4]: https://github.com/apache/arrow/pull/33948/files


Re: [DISCUSS] Fixed shape tensor Canonical Extension Type

2023-02-14 Thread Alenka Frim
Hi all,

Thank you all for participating in the discussion. The feedback received
was very helpful!

I have updated the spec according to the discussion here and in the PR [1]
plus the talk we had with Rok and Joris. The change in the spec can be
found in the Description of the serialization section where dim_names and
permutation are now included as *optional* metadata.

Please have a look at the PR [1] and give comments/suggest changes.
Once that is ready I will send the new version to the ML for a vote.

Rok has also created a google document titled Memory representations of
tensors in different languages [2] where he summarizes how other projects
and languages represent tensors/n-dim arrays. It gives a nice broader
picture of the topic.

[1] https://github.com/apache/arrow/pull/33925#
[2]
https://docs.google.com/document/d/1BG10KyDr62e0_WZqVaHcz90SnnLYmiVryZaayoKpmIA/edit?usp=sharing

All well,
Alenka

On Tue, Feb 14, 2023 at 1:00 PM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> On Tue, 7 Feb 2023 at 19:32, Quentin Lhoest 
> wrote:
> >
> > Hi,
> >
> > If I remember correctly one can already pass `types_mapper`
> > to `pa.Table.to_pandas`, to allow Ray or HF Datasets to define
> > their own pandas extension types associated to the arrow
> > extension types. I guess this could also be used until there is a
> decision
> > to include those types in Arrow or not ?
> >
>
> Yes, that's correct (although we should verify this also works to
> override this for extension types, i.e. that types_mapper gets the
> priority in deciding the resulting pandas extension dtype).
> For packages like Ray or HF Datasets, that might be a good enough
> solution; for end-users this is less convenient because you need to
> specify this any time you do a conversion from arrow to pandas, while
> with `to_pandas_dtype` mechanism this gets used by default.
>
> Joris
>


[DISCUSS] Fixed shape tensor Canonical Extension Type

2023-02-02 Thread Alenka Frim
Hi all!

There have been quite a lot of discussions about tensor support in
Arrow Tables/RecordBatches. Issues have already been created in the Arrow
repository to add support for a column in an Arrow table whose value cells
each contain a tensor value, one for tensors that all have the same
shape/dimensions [1] and a separate one for varying shape [2].

Rok Mihevc, Joris Van den Bossche and I would like to start a discussion
about the specification for canonicalizing the fixed shape tensor type in
Arrow:

Fixed shape tensor
==================

* Extension name: `arrow.fixed_shape_tensor`.

* The storage type of the extension: ``FixedSizeList`` where:

  * **value_type** is the data type of individual tensors and
    is an instance of ``pyarrow.DataType`` or ``pyarrow.Field``.
  * **list_size** is the product of all the elements in tensor shape.

* Extension type parameters:

  * **value_type** = Arrow DataType of the tensor elements
  * **shape** = shape of the contained tensors as a tuple

* Description of the serialization:

  The metadata must be a valid JSON object including shape of
  the contained tensors as an array with key "shape".

  For example: `{ "shape": [2, 5]}`

.. note::

  Elements in a fixed shape tensor extension array are stored
  in row-major/C-contiguous order.
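
As an illustration of the proposal above, here is a minimal pure-Python
sketch of three pieces: the JSON metadata, the list_size of the
FixedSizeList storage, and row-major indexing into one stored tensor. The
helper names (serialize_metadata, list_size, flat_index) are hypothetical
and are not taken from the C++ or Python implementations linked below.

```python
import json
from math import prod


def serialize_metadata(shape):
    """Serialize the extension metadata as a JSON object with key "shape"."""
    return json.dumps({"shape": list(shape)})


def list_size(shape):
    """Storage list_size: the product of all elements of the tensor shape."""
    return prod(shape)


def flat_index(index, shape):
    """Offset of a multi-dimensional index in row-major (C-contiguous) order."""
    offset = 0
    for i, dim in zip(index, shape):
        offset = offset * dim + i
    return offset


shape = (2, 5)
metadata = serialize_metadata(shape)

# One tensor of shape (2, 5) is stored as a flat list of 10 values.
storage = list(range(list_size(shape)))

# Element at row 1, column 2 of that tensor.
value = storage[flat_index((1, 2), shape)]
```

With row-major order the last dimension varies fastest, so element (1, 2)
of a (2, 5) tensor sits at offset 1 * 5 + 2 in the flat storage.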

RFC umbrella issue [3] includes:

   - Specification for Tensor canonical type extension [4]
   - C++ implementation of the proposed specification [5]
   - Python example implementation of the proposed specification and usage
     (only illustrative) [6]

Open questions:

   - Should metadata include the "dim_names" key to pass dimension names when
     creating the Arrow FixedShapeTensorArray? Do we standardize how to specify
     those names and which names to use? Or should the names not be
     standardized, leaving it up to the application to understand them?

     An example for NCHW ordered data [7]: the application could pass
     "dim_names": ["C", "H", "W"] when creating the Arrow FixedShapeTensorArray.

   - Should the implementation of the tensor extension type be in Arrow C++
     or should it be implemented in the bindings separately?

In the future we would also like to canonicalize a variable shape tensor
type in Arrow.

Kind regards, Alenka

[1]: https://github.com/apache/arrow/issues/15483

[2]: https://github.com/apache/arrow/issues/24868

[3]: https://github.com/apache/arrow/issues/33924

[4]: https://github.com/apache/arrow/issues/33923

[5]: https://github.com/apache/arrow/issues/15483

[6]: https://github.com/apache/arrow/issues/33947
[7]: https://machinelearning.wtf/terms/nchw/


Re: [ANNOUNCE] New Arrow PMC chair: Andrew Lamb

2023-01-02 Thread Alenka Frim
Congratulations!
Thank you Andrew and Kou for all your hard work!

On Thu, Dec 29, 2022 at 6:21 AM Yang Jiang  wrote:

>
>
> On 2022/12/27 17:33:26 Micah Kornfield wrote:
> > Congrats!
> >
> > On Tue, Dec 27, 2022 at 8:51 AM Krisztián Szűcs <
> szucs.kriszt...@gmail.com>
> > wrote:
> >
> > > Congrats Andrew!
> > >
> > > On Tue, Dec 27, 2022 at 8:54 AM Ian Joiner 
> wrote:
> > > >
> > > > Congrats Andrew!
> > > >
> > > > Ian
> > > >
> > > > On Tuesday, December 27, 2022, Raúl Cumplido  >
> > > wrote:
> > > >
> > > > > Congratulations Andrew!
> > > > >
> > > > >
> > > > > > On Tue, 27 Dec 2022, 7:48, Benson Muite <benson_mu...@emailplus.org>
> > > > > > wrote:
> > > > >
> > > > > > Congratulations!
> > > > > > On 12/27/22 05:44, Yibo Cai wrote:
> > > > > > > Congratulations!
> > > > > > >
> > > > > > > -Original Message-
> > > > > > > From: Rok Mihevc 
> > > > > > > Sent: Tuesday, December 27, 2022 7:57 AM
> > > > > > > To: dev@arrow.apache.org
> > > > > > > Subject: Re: [ANNOUNCE] New Arrow PMC chair: Andrew Lamb
> > > > > > >
> > > > > > > Congratulations Andrew!
> > > > > > >
> > > > > > > Rok
> > > > > > >
> > > > > > > On Mon, Dec 26, 2022 at 11:26 PM Neal Richardson <
> > > > > > neal.p.richard...@gmail.com> wrote:
> > > > > > >
> > > > > > >> Congratulations!
> > > > > > >>
> > > > > > >> On Mon, Dec 26, 2022 at 4:38 PM Matt Topol <
> > > zotthewiz...@gmail.com>
> > > > > > wrote:
> > > > > > >>
> > > > > > >>> Congrats!!!
> > > > > > >>>
> > > > > > >>> On Mon, Dec 26, 2022, 12:47 PM Jacob Wujciak
> > > > > > >>  > > > > > 
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > >  Congratulations Andrew!
> > > > > > 
> > > > > >  Matthew Turner wrote on Mon, 26 Dec 2022, 16:44:
> > > > > > 
> > > > > > > Congratulations, Andrew!
> > > > > > >
> > > > > > > From: Yijie Shen 
> > > > > > > Date: Monday, December 26, 2022 at 8:14 AM
> > > > > > > To: dev@arrow.apache.org 
> > > > > > > Subject: Re: [ANNOUNCE] New Arrow PMC chair: Andrew Lamb
> > > > > > > Congrats Andrew!
> > > > > > >
> > > > > > > On Mon, Dec 26, 2022 at 20:37 Wang Xudong
> > > > > > > 
> > > > > >  wrote:
> > > > > > >
> > > > > > >> Congratulations Andrew!
> > > > > > >>
> > > > > > >> Thank you for your dedication to arrow rust ecosystem!
> > > > > > >>
> > > > > > >> On Mon, Dec 26, 2022 at 20:13, Willy Kuo wrote:
> > > > > > >>
> > > > > > >>> Congratulations Andrew!
> > > > > > >>>
> > > > > > >>> Sent from my iPhone
> > > > > > >>>
> > > > > >  On Dec 26, 2022, at 7:48 PM, Nic Crane
> > > > > >  
> > > > > >  wrote:
> > > > > > 
> > > > > >  Congratulations!
> > > > > > 
> > > > > > > On Mon, 26 Dec 2022, 11:01 Daniël Heres, <
> > > > > > >> danielhe...@gmail.com
> > > > > > 
> > > > > > >> wrote:
> > > > > > >
> > > > > > > Congrats Andrew!
> > > > > > >
> > > > > > >> On Mon, Dec 26, 2022, 09:00 L. C. Hsieh
> > > > > > >> 
> > > > > >  wrote:
> > > > > > >>
> > > > > > >> Congratulations!
> > > > > > >>
> > > > > > >> On Sun, Dec 25, 2022 at 10:36 PM Weston Pace <
> > > > > > > weston.p...@gmail.com>
> > > > > > >> wrote:
> > > > > > >>>
> > > > > > >>> Congratulations!
> > > > > > >>>
> > > > > > >>> On Sun, Dec 25, 2022, 9:44 PM Remzi Yang <
> > > > > >  1371656737...@gmail.com
> > > > > > >>
> > > > > > >> wrote:
> > > > > > >>>
> > > > > >  Congratulations Andrew!
> > > > > > 
> > > > > >  On Mon, 26 Dec 2022 at 13:40, David Li <
> > > > > > >> lidav...@apache.org>
> > > > > > >> wrote:
> > > > > > 
> > > > > > > Congrats Andrew!
> > > > > > >
> > > > > > > On Mon, Dec 26, 2022, at 00:26, vin jake wrote:
> > > > > > >> Congratulations!
> > > > > > >>
> > > > > > >> On Mon, Dec 26, 2022 at 12:54, Sutou Kouhei wrote:
> > > > > > >>
> > > > > > >>> I am pleased to announce that we have a new PMC
> > > > > > >>> chair
> > > > > > >> and
> > > > > > >>> VP
> > > > > > > as
> > > > > > >> per
> > > > > > >>> our newly started tradition of rotating the chair
> > > > > > >>> once a
> > > > > > > year. I
> > > > > > >> have
> > > > > > >>> resigned and Andrew Lamb was duly elected by the
> > > > > > >>> PMC and
> > > > > > > approved
> > > > > > >>> unanimously by the board. Please join me in
> > > > > > >> congratulating
> > > > > > > Andrew
> > > > > >  Lamb!
> > > > > > >>>
> > > > > > >>> Thanks,
> > > > > > >>> --
> > > > > > >>> kou
> > > > > > >>>
> > > > 

Re: [VOTE] Disable ASF Jira issue reporting

2022-12-15 Thread Alenka Frim
Thank you for working on this Rok 🙏

On Fri, 16 Dec 2022 at 01:21, Rok Mihevc  wrote:

> The vote is now 8 +1 votes, 1 +1 "when the merge scripts are ready" and 1
> -1 vote "until the labels are ready".
>
> Please correct me if I'm wrong, but I believe merge scripts and labels are
> now ready. If that is the case we can tally this vote as 10 +1 votes and
> proceed with disabling ASF Jira issue reporting. I'll wait 24 hours if
> there are objections and then ask Infra to disable creating new issues.
>
> Rok
>
> On Mon, Nov 28, 2022 at 4:58 PM Matthew Topol  >
> wrote:
>
> > +1
> >
> > On Fri, Nov 25, 2022 at 10:31 AM Alessandro Molina
> >  wrote:
> >
> > > +1 as long as by "now" we actually mean "as soon as the necessary
> > > scripts have been ported to github"
> > >
> > > I mean, I doubt the plan is to disable jira before we can actually ship
> > > PRs from github issues and thus block development.
> > >
> > >
> > >
> > > On Wed, 23 Nov 2022, 22:37, Todd Farmer wrote:
> > >
> > > > Hello,
> > > >
> > > > I would like to propose that issue reporting in ASF Jira for the Apache
> > > > Arrow project be disabled, and all users directed to use GitHub issues for
> > > > reporting going forward. GitHub issue reporting is now enabled [1] in
> > > > response to a recent Infra policy change eliminating self-service user
> > > > registration for ASF Jira accounts. The Apache Arrow project has already
> > > > voted in support of migrating issue tracking from ASF Jira to GitHub issues
> > > > [2], and migration work is ongoing [3].
> > > >
> > > > Disabling ASF Jira issue reporting will move all such work to GitHub
> > > > issues. I expect that usage of this new platform by all participants - not
> > > > just new community members lacking ASF Jira accounts - will expedite
> > > > further discovery and improvements to this platform. Furthermore, this will
> > > > prevent new users from being routed to a new, and potentially "lesser",
> > > > issue reporting experience.
> > > >
> > > > Please note that this proposal does NOT move work on existing ASF Jira
> > > > issues to GitHub - that work should continue in Jira until issues are
> > > > migrated and the Jira system set to read-only. There will be a separate
> > > > discussion when that activity is ready.
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 Disable issue reporting on ASF Jira for the Apache Arrow project
> > > > [ ] -1 Leave issue reporting enabled on ASF Jira for the Apache Arrow
> > > > project because...
> > > >
> > > > [1] https://github.com/apache/arrow/issues/new/choose
> > > > [2] https://lists.apache.org/thread/l545m95xmf3w47oxwqxvg811or7b93tb
> > > > [3] https://docs.google.com/document/d/1UaSJs-oyuq8QvlUPoQ9GeiwP19LK5ZzF_5-HLfHDCIg/edit?usp=sharing
> > > >
> > > > Todd Farmer
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: Jacob Wujciak

2022-12-15 Thread Alenka Frim
Congrats!! 🎉

On Fri, 16 Dec 2022 at 08:11, Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> Congrats!
>
> On Fri, 16 Dec 2022 at 03:22, Dewey Dunnington
>  wrote:
> >
> > Congrats, Jacob!
> >
> > On Thu, Dec 15, 2022 at 9:26 PM Matt Topol 
> wrote:
> >
> > > Congrats Jacob!!
> > >
> > > On Thu, Dec 15, 2022, 7:53 PM Neal Richardson <
> neal.p.richard...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Congrats!
> > > >
> > > > On Thu, Dec 15, 2022 at 7:00 PM Ian Cook 
> wrote:
> > > >
> > > > > Heartfelt congratulations, Jacob!
> > > > >
> > > > > On Thu, Dec 15, 2022 at 6:56 PM Rok Mihevc 
> > > wrote:
> > > > > >
> > > > > > Congrats Jacob!!
> > > > > >
> > > > > > Rok
> > > > > >
> > > > > > On Fri, Dec 16, 2022 at 12:52 AM Vibhatha Abeykoon <
> > > vibha...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Congratulations Jacob!!!
> > > > > > >
> > > > > > > On Fri, Dec 16, 2022 at 5:09 AM Raúl Cumplido <
> > > > raulcumpl...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Congratulations Jacob!
> > > > > > > >
> > > > > > > > On Fri, 16 Dec 2022 at 0:34, Weston Pace (<weston.p...@gmail.com>)
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Congratulations Jacob!
> > > > > > > > >
> > > > > > > > > On Thu, Dec 15, 2022 at 3:27 PM David Li <
> lidav...@apache.org>
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Congrats & welcome Jacob!
> > > > > > > > > >
> > > > > > > > > > On Thu, Dec 15, 2022, at 18:14, Nic Crane wrote:
> > > > > > > > > > > On behalf of the Arrow PMC, I'm happy to announce that
> > > Jacob
> > > > > > > Wujciak
> > > > > > > > > has
> > > > > > > > > > > accepted an invitation to become a committer on Apache
> > > Arrow.
> > > > > > > > Welcome,
> > > > > > > > > and
> > > > > > > > > > > thank you for your contributions!
> > > > > > > > > > >
> > > > > > > > > > > Nic
> > > > > > > > >
> > > > > > > >
> > > > > > > --
> > > > > > > Vibhatha Abeykoon
> > > > > > >
> > > > >
> > > >
> > >
>


Re: [ANNOUNCE] New Arrow committer: Raúl Cumplido

2022-12-05 Thread Alenka Frim
Congratulations Raul!! 🎉

On Tue, Dec 6, 2022 at 6:54 AM Benson Muite 
wrote:

> On 12/6/22 05:53, Sutou Kouhei wrote:
> > On behalf of the Arrow PMC, I'm happy to announce that Raúl
> > Cumplido has accepted an invitation to become a committer on
> > Apache Arrow. Welcome, and thank you for your contributions!
> >
> Congratulations Raúl
>


Re: [VOTE] Disable ASF Jira issue reporting

2022-11-24 Thread Alenka Frim
+1

On Thu, 24 Nov 2022 at 17:44, Anja  wrote:

> +1
>
> On Thu, 24 Nov 2022 at 11:23, Rok Mihevc  wrote:
>
> > +1
> >
> > I would propose to also add a note about using tags (e.g. [C++][Parquet]
> > before the issue name) when opening a new issue.
> >
> > Rok
> >
> > On Thu, Nov 24, 2022 at 5:03 PM Nic  wrote:
> >
> > > +1
> > >
> > > On Thu, 24 Nov 2022 at 15:57, Joris Van den Bossche <
> > > jorisvandenboss...@gmail.com> wrote:
> > >
> > > > +1
> > > >
> > > > On Wed, 23 Nov 2022 at 22:37, Todd Farmer wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > I would like to propose that issue reporting in ASF Jira for the Apache
> > > > > Arrow project be disabled, and all users directed to use GitHub issues for
> > > > > reporting going forward. GitHub issue reporting is now enabled [1] in
> > > > > response to a recent Infra policy change eliminating self-service user
> > > > > registration for ASF Jira accounts. The Apache Arrow project has already
> > > > > voted in support of migrating issue tracking from ASF Jira to GitHub issues
> > > > > [2], and migration work is ongoing [3].
> > > > >
> > > > > Disabling ASF Jira issue reporting will move all such work to GitHub
> > > > > issues. I expect that usage of this new platform by all participants - not
> > > > > just new community members lacking ASF Jira accounts - will expedite
> > > > > further discovery and improvements to this platform. Furthermore, this will
> > > > > prevent new users from being routed to a new, and potentially "lesser",
> > > > > issue reporting experience.
> > > > >
> > > > > Please note that this proposal does NOT move work on existing ASF Jira
> > > > > issues to GitHub - that work should continue in Jira until issues are
> > > > > migrated and the Jira system set to read-only. There will be a separate
> > > > > discussion when that activity is ready.
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Disable issue reporting on ASF Jira for the Apache Arrow project
> > > > > [ ] -1 Leave issue reporting enabled on ASF Jira for the Apache Arrow
> > > > > project because...
> > > > >
> > > > > [1] https://github.com/apache/arrow/issues/new/choose
> > > > > [2] https://lists.apache.org/thread/l545m95xmf3w47oxwqxvg811or7b93tb
> > > > > [3] https://docs.google.com/document/d/1UaSJs-oyuq8QvlUPoQ9GeiwP19LK5ZzF_5-HLfHDCIg/edit?usp=sharing
> > > > >
> > > > > Todd Farmer
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Arrow committer: Will Jones

2022-11-02 Thread Alenka Frim
Congratulations Will!! 🎉

On Fri, 28 Oct 2022 at 22:20, Andrew Lamb  wrote:

> Congratulations Will
>
> On Fri, Oct 28, 2022 at 10:21 AM Nic  wrote:
>
> > Fantastic, congratulations Will!
> >
> > On Fri, 28 Oct 2022 at 10:52, Raúl Cumplido 
> > wrote:
> >
> > > Congratulations Will!
> > >
> > > On Fri, 28 Oct 2022 at 11:46, Antoine Pitrou wrote:
> > >
> > > >
> > > > Welcome Will, and thanks for your contributions!
> > > >
> > > >
> > > > On 28/10/2022 at 01:56, Sutou Kouhei wrote:
> > > > > On behalf of the Arrow PMC, I'm happy to announce that Will Jones
> > > > > has accepted an invitation to become a committer on Apache
> > > > > Arrow. Welcome, and thank you for your contributions!
> > > > >
> > > > > kou
> > > >
> > >
> >
>


Re: [VOTE] Move issue tracking to GitHub Issues

2022-10-27 Thread Alenka Frim
+1

On Thu, Oct 27, 2022 at 2:36 PM prem sagar gali 
wrote:

> +1
>
> On Thu, Oct 27, 2022 at 7:13 AM Dewey Dunnington
>  wrote:
>
> > +1 (non-binding)!
> >
> > On Thu, Oct 27, 2022 at 8:54 AM Eric Hanson 
> > wrote:
> >
> > > +1
> > >
> > > On 2022/10/26 23:02:33 Neal Richardson wrote:
> > > > I propose that we move issue tracking from the ASF's Jira to GitHub
> > > Issues.
> > > > This has been discussed on [1] and [2] and there seems to be
> > consensus. A
> > > > number of Arrow subprojects already use GitHub Issues; this moves the
> > > issue
> > > > tracking for `apache/arrow` into GitHub along with the source code.
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 Leave ASF Jira and move to GitHub Issues
> > > > [ ] +0
> > > > [ ] -1 Remain in Jira because...
> > > >
> > > > My vote: +1
> > > >
> > > > Neal
> > > >
> > > >
> > > > [1]:
> https://lists.apache.org/thread/l545m95xmf3w47oxwqxvg811or7b93tb
> > > > [2]:
> https://lists.apache.org/thread/0vwj8gdo55jly5zn16wksrotyqqm0zqr
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Arrow PMC member: Jacob Quinn

2022-10-25 Thread Alenka Frim
Congratulations!

On Wed, Oct 26, 2022 at 7:54 AM Daniël Heres  wrote:

> Congratulations!
>
> On Wed, Oct 26, 2022, 07:50 Benson Muite 
> wrote:
>
> > Congratulations Jacob!
> > On 10/26/22 04:12, Vibhatha Abeykoon wrote:
> > > Congratulations Jacob!
> > >
> > > On Wed, Oct 26, 2022 at 4:23 AM Rok Mihevc 
> wrote:
> > >
> > >> Congratulations Jacob!
> > >>
> > >> Rok
> > >>
> > >> On Tue, Oct 25, 2022 at 11:15 PM David Li 
> wrote:
> > >>
> > >>> Congrats Jacob!!
> > >>>
> > >>> On Tue, Oct 25, 2022, at 17:06, Sutou Kouhei wrote:
> >  The Project Management Committee (PMC) for Apache Arrow has invited
> >  Jacob Quinn to become a PMC member and we are pleased to announce
> >  that Jacob Quinn has accepted.
> > 
> >  Congratulations and welcome!
> > >>>
> > >>
> >
> >
>


Re: [ANNOUNCE] New Arrow committer: Bogumił Kamiński

2022-10-25 Thread Alenka Frim
Congratulations!

On Wed, Oct 26, 2022 at 7:55 AM Daniël Heres  wrote:

> Congratulations!
>
> On Wed, Oct 26, 2022, 07:50 Benson Muite 
> wrote:
>
> > Congratulations Bogumił!
> > On 10/26/22 04:13, Vibhatha Abeykoon wrote:
> > > Congrats Bogumił!
> > >
> > > On Wed, Oct 26, 2022 at 4:24 AM Rok Mihevc 
> wrote:
> > >
> > >> Congrats Bogumił!
> > >>
> > >> Rok
> > >>
> > >> On Tue, Oct 25, 2022 at 11:15 PM David Li 
> wrote:
> > >>
> > >>> Welcome Bogumił!
> > >>>
> > >>> On Tue, Oct 25, 2022, at 17:05, Sutou Kouhei wrote:
> >  Hi,
> > 
> >  On behalf of the Arrow PMC, I'm happy to announce that Bogumił
> > Kamiński
> >  has accepted an invitation to become a committer on Apache
> >  Arrow. Welcome, and thank you for your contributions!
> > 
> >  Thanks,
> >  --
> >  kou
> > >>>
> > >>
> >
> >
>


Re: [ANNOUNCE] New Arrow PMC member: Nicola Crane

2022-10-25 Thread Alenka Frim
🎉 Congratulations Nic! Well deserved!!

On Wed, Oct 26, 2022 at 7:54 AM Daniël Heres  wrote:

> Congratulations!
>
> On Wed, Oct 26, 2022, 07:50 Benson Muite 
> wrote:
>
> > Congratulations Nic!
> > On 10/26/22 04:11, Vibhatha Abeykoon wrote:
> > > Congrats Nic!
> > >
> > > On Wed, Oct 26, 2022 at 5:30 AM Ashish 
> wrote:
> > >
> > >> Congrats !
> > >>
> > >> On Wednesday, October 26, 2022, Anja  wrote:
> > >>
> > >>> Congrats!!
> > >>>
> > >>> On Tue, 25 Oct 2022 at 15:45, Rok Mihevc 
> wrote:
> > >>>
> >  Congrats Nic!
> > 
> >  Rok
> > 
> >  On Tue, Oct 25, 2022 at 11:16 PM Will Jones <
> will.jones...@gmail.com>
> >  wrote:
> > 
> > > Congrats Nic!
> > >
> > > On Tue, Oct 25, 2022 at 2:14 PM David Li 
> > >> wrote:
> > >
> > >> Congrats & welcome Nic!
> > >>
> > >> On Tue, Oct 25, 2022, at 17:07, Matt Topol wrote:
> > >>> Congrats!!
> > >>>
> > >>> On Tue, Oct 25, 2022 at 5:06 PM Sutou Kouhei  > >>>
> > > wrote:
> > >>>
> >  The Project Management Committee (PMC) for Apache Arrow has
> > >>> invited
> >  Nicola Crane to become a PMC member and we are pleased to
> > >> announce
> >  that Nicola Crane has accepted.
> > 
> >  Congratulations and welcome!
> > 
> > >>
> > >
> > 
> > >>>
> > >>
> > >>
> > >> --
> > >> thanks
> > >> ashish
> > >>
> >
> >
>


Re: [ANNOUNCE] New Arrow PMC member: Weston Pace

2022-09-05 Thread Alenka Frim
Congratulations Weston!

On Mon, Sep 5, 2022 at 2:01 PM David Li  wrote:

> Congratulations, and welcome Weston!
>
> On Mon, Sep 5, 2022, at 06:50, Jacob Wujciak wrote:
> > Congratulations!
> >
> > On Mon, Sep 5, 2022 at 7:56 AM Sutou Kouhei  wrote:
> >
> >> The Project Management Committee (PMC) for Apache Arrow has invited
> >> Weston Pace to become a PMC member and we are pleased to announce
> >> that Weston Pace has accepted.
> >>
> >> Congratulations and welcome!
> >>
>


Re: [ANNOUNCE] New Arrow committers: Dewey Dunnington, Alenka Frim, and Rok Mihevc

2022-06-23 Thread Alenka Frim
Thank you so much! And congrats to my fellow committers ❤

On Thu, 23 Jun 2022 at 02:36, Wang Xudong wrote:

> Congratulations!!
>
> On Thu, Jun 23, 2022 at 07:36, Vibhatha Abeykoon wrote:
>
> > Congratulations Dewey, Alenka and Rok!!!
> >
> > On Thu, Jun 23, 2022 at 4:30 AM Matt Topol 
> wrote:
> >
> > > Congratulations!! Welcome!
> > >
> > > On Wed, Jun 22, 2022, 6:40 PM David Li  wrote:
> > >
> > > > Congratulations Alenka, Dewey, and Rok!
> > > >
> > > > On Wed, Jun 22, 2022, at 14:55, L. C. Hsieh wrote:
> > > > > Congratulations!
> > > > >
> > > > > On Wed, Jun 22, 2022 at 11:28 AM Antoine Pitrou <
> anto...@python.org>
> > > > wrote:
> > > > >>
> > > > >>
> > > > >> Welcome to our new committers!
> > > > >>
> > > > >>
> > > > > >> On 22/06/2022 at 20:02, Andrew Lamb wrote:
> > > > >> > Congratulations!
> > > > >> >
> > > > >> > On Wed, Jun 22, 2022 at 1:27 PM Dragoș Moldovan-Grünfeld <
> > > > >> > dragos.m...@gmail.com> wrote:
> > > > >> >
> > > > >> >> Congratulations!
> > > > >> >>
> > > > >> >> Sent from my iPhone
> > > > >> >>
> > > > >> >>> On 22 Jun 2022, at 18:13, Neal Richardson <
> > > > neal.p.richard...@gmail.com>
> > > > >> >> wrote:
> > > > >> >>>
> > > > >> >>> On behalf of the Arrow PMC, I'm happy to announce that
> > > > >> >>>
> > > > >> >>> Dewey Dunnington
> > > > >> >>> Alenka Frim
> > > > >> >>> Rok Mihevc
> > > > >> >>>
> > > > >> >>> have all accepted invitations to become committers on Apache
> > > Arrow!
> > > > >> >>> Welcome, thank you for all your contributions so far, and we
> > look
> > > > forward
> > > > >> >>> to continuing to drive Apache Arrow forward to an even better
> > > place
> > > > in
> > > > >> >> the
> > > > >> >>> future.
> > > > >> >>>
> > > > >> >>> Neal
> > > > >> >>
> > > > >> >
> > > >
> > >
> > --
> > Vibhatha Abeykoon
> >
>


Re: [ANNOUNCE] New Arrow committers: Raphael Taylor-Davies, Wang Xudong, Yijie Shen, and Kun Liu

2022-03-10 Thread Alenka Frim
Congratulations all!

On Thu, Mar 10, 2022 at 1:55 AM Yang hao <1371656737...@gmail.com> wrote:

> Congratulations to all!
>
> From: Benson Muite 
> Date: Thursday, March 10, 2022 at 03:45
> To: dev@arrow.apache.org 
> Subject: Re: [ANNOUNCE] New Arrow committers: Raphael Taylor-Davies, Wang
> Xudong, Yijie Shen, and Kun Liu
> Congratulations!
>
> On 3/9/22 9:56 PM, David Li wrote:
> > Congrats everyone!
> >
> > On Wed, Mar 9, 2022, at 13:47, Rok Mihevc wrote:
> >> Congrats all!
> >>
> >> Rok
> >>
> >> On Wed, Mar 9, 2022 at 7:16 PM QP Hou  wrote:
> >>>
> >>> Congratulations to all, well deserved!
> >>>
> >>> On Wed, Mar 9, 2022 at 9:37 AM Daniël Heres 
> wrote:
> 
>  Congratulations!
> 
>  On Wed, Mar 9, 2022, 18:26 LM  wrote:
> 
> > Congrats to you all!
> >
> > On Wed, Mar 9, 2022 at 9:19 AM Chao Sun  wrote:
> >
> >> Congrats all!
> >>
> >> On Wed, Mar 9, 2022 at 9:16 AM Micah Kornfield <
> emkornfi...@gmail.com>
> >> wrote:
> >>>
> >>> Congrats!
> >>>
> >>> On Wed, Mar 9, 2022 at 8:36 AM Weston Pace 
> >> wrote:
> >>>
>  Congratulations to all of you!
> 
>  On Wed, Mar 9, 2022, 4:52 AM Matthew Turner <
> >> matthew.m.tur...@outlook.com>
>  wrote:
> 
> > Congrats all and thank you for your contributions! It's been
> great
> > to
>  work
> > with and learn from you all.
> >
> > -Original Message-
> > From: Andrew Lamb 
> > Sent: Wednesday, March 9, 2022 8:59 AM
> > To: dev 
> > Subject: [ANNOUNCE] New Arrow committers: Raphael Taylor-Davies,
> > Wang
> > Xudong, Yijie Shen, and Kun Liu
> >
> > On behalf of the Arrow PMC, I'm happy to announce that
> >
> > Raphael Taylor-Davies
> > Wang Xudong
> > Yijie Shen
> > Kun Liu
> >
> > Have all accepted invitations to become committers on Apache
> Arrow!
> > Welcome, thank you for all your contributions so far, and we look
> >> forward
> > to continuing to drive Apache Arrow forward to an even better
> place
> >> in
>  the
> > future.
> >
> > This exciting growth in committers mirrors the growth of the
> Arrow
> >> Rust
> > community.
> >
> > Andrew
> >
> > p.s. sorry for the somewhat impersonal email; I was trying to
> avoid
> > several very similar emails. I am truly excited for each of these
> > individuals.
> >
> 
> >>
> >
>


Re: [ANNOUNCE] New Arrow committer: Jacob Quinn

2022-02-24 Thread Alenka Frim
Congratulations Jacob!

On Fri, Feb 25, 2022 at 2:21 AM hao Yang <1371656737...@gmail.com> wrote:

> Congratulations Jacob!
>
> On Fri, 25 Feb 2022 at 09:15, Vibhatha Abeykoon 
> wrote:
>
> > Congratulations Jacob!
> >
> > On Fri, Feb 25, 2022 at 6:42 AM Weston Pace 
> wrote:
> >
> > > Congratulations Jacob!
> > >
> > > On Thu, Feb 24, 2022 at 2:34 PM QP Hou  wrote:
> > > >
> > > > Congratulations and welcome Jacob!
> > > >
> > > > Thanks,
> > > > QP Hou
> > > >
> > > > On Thu, Feb 24, 2022 at 4:06 PM Ian Joiner 
> > > wrote:
> > > > >
> > > > > Congrats Jacob!
> > > > >
> > > > > On Thursday, February 24, 2022, David Li 
> > wrote:
> > > > >
> > > > > > Congrats & welcome Jacob!
> > > > > >
> > > > > > On Thu, Feb 24, 2022, at 17:39, Rok Mihevc wrote:
> > > > > > > Congratulations Jacob!
> > > > > > >
> > > > > > > Rok
> > > > > > >
> > > > > > > On Thu, Feb 24, 2022 at 11:13 PM Micah Kornfield <
> > > emkornfi...@gmail.com>
> > > > > > wrote:
> > > > > > >>
> > > > > > >> Congrats Jacob!
> > > > > > >>
> > > > > > >> On Thu, Feb 24, 2022 at 1:50 PM Wes McKinney <
> > wesmck...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >>
> > > > > > >> > On behalf of the Arrow PMC, I'm happy to announce that Jacob
> > > Quinn has
> > > > > > >> > accepted an invitation to become a committer on Apache
> Arrow.
> > > Welcome,
> > > > > > >> > and thank you for your contributions!
> > > > > > >> >
> > > > > > >> > Wes
> > > > > > >> >
> > > > > >
> > >
> > --
> > Vibhatha Abeykoon
> >
>


Re: [ANNOUNCE] New Arrow committer: Alessandro Molina

2022-01-05 Thread Alenka Frim
Congratulations!

On Wed, Jan 5, 2022 at 8:37 PM Krisztián Szűcs 
wrote:

> Congrats Alessandro!
>
> On Wed, Jan 5, 2022 at 6:57 PM Weston Pace  wrote:
> >
> > Congratulations Alessandro!
> >
> > On Wed, Jan 5, 2022 at 7:39 AM Vibhatha Abeykoon 
> wrote:
> > >
> > > Congratulations
> > >
> > > On Wed, Jan 5, 2022 at 9:29 PM Supun Kamburugamuve 
> wrote:
> > >
> > > > Congratulations!
> > > >
> > > > On Wed, Jan 5, 2022 at 10:17 AM Niranda Perera <
> niranda.per...@gmail.com>
> > > > wrote:
> > > >
> > > > > Congrats Alessandro! :-)
> > > > >
> > > > > On Wed, Jan 5, 2022 at 9:54 AM David Li 
> wrote:
> > > > >
> > > > > > Congrats Alessandro!
> > > > > >
> > > > > > On Wed, Jan 5, 2022, at 09:45, Ian Cook wrote:
> > > > > > > Congratulations Alessandro!
> > > > > > >
> > > > > > > On Wed, Jan 5, 2022 at 9:40 AM Rok Mihevc <
> rok.mih...@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > Congrats Alessandro!
> > > > > > > >
> > > > > > > > On Wed, Jan 5, 2022 at 2:24 PM Eduardo Ponce <
> edponc...@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Great addition to PMC. Congratulations!
> > > > > > > > >
> > > > > > > > > ~Eduardo
> > > > > > > > >
> > > > > > > > > On Wed, Jan 5, 2022 at 7:34 AM Wes McKinney <
> wesmck...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > On behalf of the Arrow PMC, I'm happy to announce that
> > > > Alessandro
> > > > > > > > > > Molina has accepted an invitation to become a committer
> on
> > > > Apache
> > > > > > > > > > Arrow. Welcome, and thank you for your contributions!
> > > > > > > > > >
> > > > > > > > > > Wes
> > > > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Niranda Perera
> > > > > https://niranda.dev/
> > > > > @n1r44 
> > > > >
> > > >
> > > >
> > > > --
> > > > Supun Kamburugamuve
> > > >
> > > --
> > > Vibhatha Abeykoon
>


Re: [ANNOUNCE] New Arrow PMC member: Joris Van den Bossche

2021-11-17 Thread Alenka Frim
Congratulations Joris!
Alenka

> On 18 Nov 2021, at 03:33, Ian Joiner  wrote:
> 
> Congrats Joris and really thanks for your effort in integrating ORC and 
> dataset!
> 
> Ian 
> 
>> On Nov 17, 2021, at 5:55 PM, Wes McKinney  wrote:
>> 
>> The Project Management Committee (PMC) for Apache Arrow has invited
>> Joris Van den Bossche to become a PMC member and we are pleased to
>> announce that Joris has accepted.
>> 
>> Congratulations and welcome!
>