Re: [Dev] Switch to token authentication for archery & merge script

2022-06-01 Thread Sutou Kouhei
Hi,

I like this. I tried this before but it didn't work.
I tried the pull request and it worked!

Thanks,
-- 
kou


In
  "[Dev] Switch to token authentication for archery & merge script" on Wed, 1 Jun 2022 13:44:06 +0200,
  Jacob Wujciak wrote:

> Hello Everyone,
> 
> I would like to propose that we switch from basic authentication with JIRA
> in the merge script and archery to PAT based token authentication.
> 
> Basic authentication is deprecated in Jira Cloud [1], and PATs remove the
> need to save your password in clear text (e.g. in a config file). PATs are
> easily created [2], can be revoked, and can be given an expiry date. Token
> authentication also does not trigger a captcha. The changes needed in
> workflow and scripts are minimal and easily outweighed by the advantages.
> 
> I have created a PR with the necessary changes [3].
> 
> Best,
> Jacob
> 
> [1]:
> https://developer.atlassian.com/cloud/jira/platform/deprecation-notice-basic-auth-and-cookie-based-auth/
> [2]:
> https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html
> [3]: https://github.com/apache/arrow/pull/13283


Re: [Discuss][C++] macOS minimum requirements

2022-06-01 Thread Sutou Kouhei
Hi,

In 
  "[Discuss][C++] macOS minimum requirements" on Wed, 1 Jun 2022 16:22:17 +0200,
  Antoine Pitrou  wrote:

> The topic came up recently of bumping up our minimal macOS
> requirements from 10.11 to 10.13 (*).  Do people have any particular
> concerns about this?
> 
> (*) https://github.com/apache/arrow/pull/13157#issuecomment-1143670152

Apple's macOS release notes page
https://developer.apple.com/documentation/macos-release-notes
doesn't list 10.13 or earlier, so it seems that 10.13 has reached
EOL. So I think that 10.11 -> 10.13 isn't a problem.


Thanks,
-- 
kou


[C++] Kernel function registry evolution

2022-06-01 Thread Weston Pace
We've had some evidence for a while now that the kernel functions
suffer from an overhead problem that prevents us from effectively
utilizing cache.  The latest and greatest evidence of this might be
[1].  A number of people have made some very interesting suggestions
that I think could really cut down on the overhead (e.g. preallocated
buffers).  However, whenever we start a discussion on implementation
it ends up getting bogged down because there is a lot of existing code
here and a massive refactor would be too difficult.

I'd like to propose we add a second kernel function registry.  There
doesn't need to be any user facing API change.  We could probably use
an approach like [2] to proxy to the old function registry when the
newer registry doesn't contain the asked-for function.  This would
allow us to focus on creating an efficient function registry without
having to worry about refactoring the existing kernels all at once.

Once we are happy with the new registry we can start to migrate the
existing kernel functions over to the new registry.  I don't expect
there will need to be a lot of change to the existing kernel functions
but whatever change is there can be done incrementally.

Does this seem like a good approach?  Am I missing something or does
anyone know of a better way to fix the existing implementation?

The main risk I can see is that we don't end up completing the
migration and end up maintaining two registries forever.  However, we
have enough interested people here at Voltron Data that I feel
confident we can get this pushed through.

[1] https://github.com/apache/arrow/pull/13179
[2] https://github.com/apache/arrow/pull/13252
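
The fallback idea in the message above (look a function up in the new registry first, proxy to the old registry when it is missing) can be sketched roughly as follows. This is a minimal Python illustration, not Arrow's C++ API: the class FunctionRegistry, its methods, and the add_into kernel are hypothetical names chosen for the example, and the preallocated output buffer is only a stand-in for the kind of overhead reduction mentioned in the thread.

```python
from typing import Callable, Dict, Optional


class FunctionRegistry:
    """Hypothetical name -> callable registry with an optional fallback."""

    def __init__(self, fallback: Optional["FunctionRegistry"] = None) -> None:
        self._functions: Dict[str, Callable] = {}
        self._fallback = fallback

    def add_function(self, name: str, func: Callable) -> None:
        self._functions[name] = func

    def get_function(self, name: str) -> Callable:
        # Prefer a function registered here; otherwise proxy to the fallback.
        if name in self._functions:
            return self._functions[name]
        if self._fallback is not None:
            return self._fallback.get_function(name)
        raise KeyError(f"no function named {name!r}")


# The existing registry still holds every kernel that has not been migrated.
old_registry = FunctionRegistry()
old_registry.add_function("add", lambda a, b: [x + y for x, y in zip(a, b)])
old_registry.add_function("negate", lambda a: [-x for x in a])

# The new registry starts empty and falls back to the old one.
new_registry = FunctionRegistry(fallback=old_registry)


def add_into(a, b, out=None):
    """Migrated kernel (illustrative): the caller may pass a preallocated list."""
    out = [0] * len(a) if out is None else out
    for i, (x, y) in enumerate(zip(a, b)):
        out[i] = x + y
    return out


# Migrating "add" only touches the new registry; lookups by name keep working.
new_registry.add_function("add", add_into)
```

With this layout, callers only ever ask new_registry for a function by name, so kernels can move over one at a time without a user-facing change.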


Re: [DISC] Improving Arrow's database support

2022-06-01 Thread David Li
I've set up the new repo and enabled issues. I still need to get things 
building independently of Arrow, but now adbc.h is self-contained and the 
"driver manager" being prototyped can also be built and used independently of 
Arrow.

On Wed, Jun 1, 2022, at 13:55, David Li wrote:
> Wes: thanks! I'll move things over and update the list.
>
> Gavin: I mean more that ADBC won't support every little feature in 
> JDBC/ODBC, or won't necessarily make it easy to support certain things 
> (e.g. updating a single row in a ResultSet). But it's not that OLTP is 
> taboo, it's just not what is being optimized for. 
>
> For instance it would be nice to eventually have JDBC/ODBC drivers that 
> can wrap ADBC in much the same way that Dremio is working on a JDBC 
> driver for Flight SQL. But especially in the near term, ADBC just won't 
> have the feature set to make that possible.
>
> What sorts of use cases were you thinking about, though?
>
> On Wed, Jun 1, 2022, at 13:18, Gavin Ray wrote:
>> This sounds great, but I had one question:
>>
>> I read the initial ADBC proposal, and it mentioned that OLTP was not a
>> targeted use case.
>> If this work is intended to take on the role of a sort of standard ABI/SDK,
>> does that mean that building OLTP-oriented drivers/tooling with it is off
>> the table?
>>
>> On Wed, Jun 1, 2022 at 11:11 AM Wes McKinney  wrote:
>>
>>> I went ahead and created
>>>
>>> https://github.com/apache/arrow-adbc
>>>
>>> I directed issue comments / PRs to issues@
>>>
>>> On Tue, May 31, 2022 at 8:49 PM Wes McKinney  wrote:
>>> >
>>> > I think spinning up a new repository while this exploratory work
>>> > progresses is a fine idea — perhaps apache/arrow-dbc / arrow-adbc or
>>> > similar (the name can always be changed later). That would bubble up
>>> > discussions in a way that's easier for people to follow (watching your
>>> > fork isn't ideal!). If it makes sense to move code later, it can
>>> > always be moved.
>>> >
>>> >
>>> > On Tue, May 31, 2022 at 1:02 PM David Li  wrote:
>>> > >
>>> > > Some updates:
>>> > >
>>> > > The proposal is being updated based on feedback from contributors to
>>> DuckDB and DBI. We've been using GitHub issues on the fork to discuss the
>>> API design and how to implement data ingestion/bound parameters:
>>> https://github.com/lidavidm/arrow/issues
>>> > >
>>> > > If anyone has suggestions/ideas/questions, or would like to jump in as
>>> well, please feel free to chime in there too.
>>> > >
>>> > > I have also been wondering if we might want to plan to split off a new
>>> repo for this work? In particular, some components might be easiest to
>>> consume if they didn't also have a hard dependency on the Arrow C++
>>> libraries. And we could use the repo to manage contributed drivers (some of
>>> which may individually leverage the Arrow libraries). Of course,
>>> maintaining a parallel build system, setting up releases, etc. is also a
>>> lot of work.
>>> > >
>>> > > -David
>>> > >
>>> > > On Tue, Apr 26, 2022, at 15:01, Wes McKinney wrote:
>>> > > > I don't have major new things to add on this topic except that I've
>>> > > > long had the aspiration of creating something like Python's DBAPI 2.0
>>> > > > [1] at the C or C++ level to enable a measure of API standardization
>>> > > > for Arrow-native read/write interfaces with database drivers. It
>>> seems
>>> > > > like a natural complement to the wire-protocol standardization work
>>> > > > with FlightSQL. I had previously brought in some code that I had
>>> > > > worked on related to interfacing with the HiveServer2 wire protocol
>>> > > > (for Hive and Impala, or other HS2-compatible query engines) with the
>>> > > > intention of prototyping but never was able to find the time.
>>> > > >
>>> > > > From an external messaging standpoint, one thing that will be
>>> > > > important is to assert that this is not intended to displace or
>>> > > > deprecate ODBC or JDBC drivers. In fact, I would hope that the
>>> > > > Arrow-native APIs could be added somehow to existing driver libraries
>>> > > > where it made sense, so that if they are used in an application that
>>> > > > uses Arrow, they can opt in to using the Arrow-based APIs for getting
>>> > > > result sets, or doing bulk inserts, etc.
>>> > > >
>>> > > > [1]: https://peps.python.org/pep-0249/
>>> > > >
>>> > > > On Tue, Apr 26, 2022 at 12:36 PM Antoine Pitrou 
>>> wrote:
>>> > > >>
>>> > > >>
>>> > > >> Do we want something more flexible than dlopen() and runtime symbol
>>> > > >> lookup (a mechanism which constrains the way you can organize and
>>> > > >> distribute drivers)?
>>> > > >>
>>> > > >> For example, perhaps we could expose an API struct of function
>>> pointers
>>> > > >> that could be obtained through driver-specific means.
>>> > > >>
>>> > > >>
>>> > > >> On 26/04/2022 at 18:29, David Li wrote:
>>> > > >> > Hello,
>>> > > >> >
>>> > > >> > In light of recent efforts around Flight SQL, projects like pgeon
>>> [1], and long-standing tickets/discussions

Re: [Dev] Switch to token authentication for archery & merge script

2022-06-01 Thread Wes McKinney
hi Jacob — this sounds very reasonable and fixes a rough edge for
maintainers running into captcha issues.

Thanks
Wes

On Wed, Jun 1, 2022 at 6:44 AM Jacob Wujciak  wrote:
>
> Hello Everyone,
>
> I would like to propose that we switch from basic authentication with JIRA
> in the merge script and archery to PAT based token authentication.
>
> Basic authentication is deprecated in Jira Cloud [1], and PATs remove the
> need to save your password in clear text (e.g. in a config file). PATs are
> easily created [2], can be revoked, and can be given an expiry date. Token
> authentication also does not trigger a captcha. The changes needed in
> workflow and scripts are minimal and easily outweighed by the advantages.
>
> I have created a PR with the necessary changes [3].
>
> Best,
> Jacob
>
> [1]:
> https://developer.atlassian.com/cloud/jira/platform/deprecation-notice-basic-auth-and-cookie-based-auth/
> [2]:
> https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html
> [3]: https://github.com/apache/arrow/pull/13283


Re: [DISC] Improving Arrow's database support

2022-06-01 Thread David Li
Wes: thanks! I'll move things over and update the list.

Gavin: I mean more that ADBC won't support every little feature in JDBC/ODBC, 
or won't necessarily make it easy to support certain things (e.g. updating a 
single row in a ResultSet). But it's not that OLTP is taboo, it's just not what 
is being optimized for. 

For instance it would be nice to eventually have JDBC/ODBC drivers that can 
wrap ADBC in much the same way that Dremio is working on a JDBC driver for 
Flight SQL. But especially in the near term, ADBC just won't have the feature 
set to make that possible.

What sorts of use cases were you thinking about, though?

On Wed, Jun 1, 2022, at 13:18, Gavin Ray wrote:
> This sounds great, but I had one question:
>
> I read the initial ADBC proposal, and it mentioned that OLTP was not a
> targeted use case.
> If this work is intended to take on the role of a sort of standard ABI/SDK,
> does that mean that building OLTP-oriented drivers/tooling with it is off
> the table?
>
> On Wed, Jun 1, 2022 at 11:11 AM Wes McKinney  wrote:
>
>> I went ahead and created
>>
>> https://github.com/apache/arrow-adbc
>>
>> I directed issue comments / PRs to issues@
>>
>> On Tue, May 31, 2022 at 8:49 PM Wes McKinney  wrote:
>> >
>> > I think spinning up a new repository while this exploratory work
>> > progresses is a fine idea — perhaps apache/arrow-dbc / arrow-adbc or
>> > similar (the name can always be changed later). That would bubble up
>> > discussions in a way that's easier for people to follow (watching your
>> > fork isn't ideal!). If it makes sense to move code later, it can
>> > always be moved.
>> >
>> >
>> > On Tue, May 31, 2022 at 1:02 PM David Li  wrote:
>> > >
>> > > Some updates:
>> > >
>> > > The proposal is being updated based on feedback from contributors to
>> DuckDB and DBI. We've been using GitHub issues on the fork to discuss the
>> API design and how to implement data ingestion/bound parameters:
>> https://github.com/lidavidm/arrow/issues
>> > >
>> > > If anyone has suggestions/ideas/questions, or would like to jump in as
>> well, please feel free to chime in there too.
>> > >
>> > > I have also been wondering if we might want to plan to split off a new
>> repo for this work? In particular, some components might be easiest to
>> consume if they didn't also have a hard dependency on the Arrow C++
>> libraries. And we could use the repo to manage contributed drivers (some of
>> which may individually leverage the Arrow libraries). Of course,
>> maintaining a parallel build system, setting up releases, etc. is also a
>> lot of work.
>> > >
>> > > -David
>> > >
>> > > On Tue, Apr 26, 2022, at 15:01, Wes McKinney wrote:
>> > > > I don't have major new things to add on this topic except that I've
>> > > > long had the aspiration of creating something like Python's DBAPI 2.0
>> > > > [1] at the C or C++ level to enable a measure of API standardization
>> > > > for Arrow-native read/write interfaces with database drivers. It
>> seems
>> > > > like a natural complement to the wire-protocol standardization work
>> > > > with FlightSQL. I had previously brought in some code that I had
>> > > > worked on related to interfacing with the HiveServer2 wire protocol
>> > > > (for Hive and Impala, or other HS2-compatible query engines) with the
>> > > > intention of prototyping but never was able to find the time.
>> > > >
>> > > > From an external messaging standpoint, one thing that will be
>> > > > important is to assert that this is not intended to displace or
>> > > > deprecate ODBC or JDBC drivers. In fact, I would hope that the
>> > > > Arrow-native APIs could be added somehow to existing driver libraries
>> > > > where it made sense, so that if they are used in an application that
>> > > > uses Arrow, they can opt in to using the Arrow-based APIs for getting
>> > > > result sets, or doing bulk inserts, etc.
>> > > >
>> > > > [1]: https://peps.python.org/pep-0249/
>> > > >
>> > > > On Tue, Apr 26, 2022 at 12:36 PM Antoine Pitrou 
>> wrote:
>> > > >>
>> > > >>
>> > > >> Do we want something more flexible than dlopen() and runtime symbol
>> > > >> lookup (a mechanism which constrains the way you can organize and
>> > > >> distribute drivers)?
>> > > >>
>> > > >> For example, perhaps we could expose an API struct of function
>> pointers
>> > > >> that could be obtained through driver-specific means.
>> > > >>
>> > > >>
>> > > >> On 26/04/2022 at 18:29, David Li wrote:
>> > > >> > Hello,
>> > > >> >
>> > > >> > In light of recent efforts around Flight SQL, projects like pgeon
>> [1], and long-standing tickets/discussions about database support in Arrow
>> [2], it seems there's an opportunity to define standard database interfaces
>> for Arrow that could unify these efforts. So we've put together a proposal
>> for "ADBC", a common Arrow-based database client API:
>> > > >> >
>> > > >> >
>> https://docs.google.com/document/d/1t7NrC76SyxL_OffATmjzZs2xcj1owdUsIF2WKL_Zw1U/edit#heading=h.r6o6j2navi4c
>> > > >> >
>>

Re: [Discuss][Java] macOS minimum requirements

2022-06-01 Thread Jonathan Keane
This isn't Java related directly, but for the R bindings we have to
support at least 10.13.6 to be on CRAN, so bumping up to 10.13 would
be fine for that too.

-Jon

On Wed, Jun 1, 2022 at 9:24 AM Antoine Pitrou  wrote:
>
>
> Sorry, I put "C++" in the title but this really affects Java via JNI.
>
>
> On 01/06/2022 at 16:22, Antoine Pitrou wrote:
> >
> >
> > Hello,
> >
> > The topic came up recently of bumping up our minimal macOS requirements
> > from 10.11 to 10.13 (*).  Do people have any particular concerns about this?
> >
> > (*) https://github.com/apache/arrow/pull/13157#issuecomment-1143670152
> >
> > Regards
> >
> > Antoine.


Re: [DISC] Improving Arrow's database support

2022-06-01 Thread Gavin Ray
This sounds great, but I had one question:

I read the initial ADBC proposal, and it mentioned that OLTP was not a
targeted use case.
If this work is intended to take on the role of a sort of standard ABI/SDK,
does that mean that building OLTP-oriented drivers/tooling with it is off
the table?

On Wed, Jun 1, 2022 at 11:11 AM Wes McKinney  wrote:

> I went ahead and created
>
> https://github.com/apache/arrow-adbc
>
> I directed issue comments / PRs to issues@
>
> On Tue, May 31, 2022 at 8:49 PM Wes McKinney  wrote:
> >
> > I think spinning up a new repository while this exploratory work
> > progresses is a fine idea — perhaps apache/arrow-dbc / arrow-adbc or
> > similar (the name can always be changed later). That would bubble up
> > discussions in a way that's easier for people to follow (watching your
> > fork isn't ideal!). If it makes sense to move code later, it can
> > always be moved.
> >
> >
> > On Tue, May 31, 2022 at 1:02 PM David Li  wrote:
> > >
> > > Some updates:
> > >
> > > The proposal is being updated based on feedback from contributors to
> DuckDB and DBI. We've been using GitHub issues on the fork to discuss the
> API design and how to implement data ingestion/bound parameters:
> https://github.com/lidavidm/arrow/issues
> > >
> > > If anyone has suggestions/ideas/questions, or would like to jump in as
> well, please feel free to chime in there too.
> > >
> > > I have also been wondering if we might want to plan to split off a new
> repo for this work? In particular, some components might be easiest to
> consume if they didn't also have a hard dependency on the Arrow C++
> libraries. And we could use the repo to manage contributed drivers (some of
> which may individually leverage the Arrow libraries). Of course,
> maintaining a parallel build system, setting up releases, etc. is also a
> lot of work.
> > >
> > > -David
> > >
> > > On Tue, Apr 26, 2022, at 15:01, Wes McKinney wrote:
> > > > I don't have major new things to add on this topic except that I've
> > > > long had the aspiration of creating something like Python's DBAPI 2.0
> > > > [1] at the C or C++ level to enable a measure of API standardization
> > > > for Arrow-native read/write interfaces with database drivers. It
> seems
> > > > like a natural complement to the wire-protocol standardization work
> > > > with FlightSQL. I had previously brought in some code that I had
> > > > worked on related to interfacing with the HiveServer2 wire protocol
> > > > (for Hive and Impala, or other HS2-compatible query engines) with the
> > > > intention of prototyping but never was able to find the time.
> > > >
> > > > From an external messaging standpoint, one thing that will be
> > > > important is to assert that this is not intended to displace or
> > > > deprecate ODBC or JDBC drivers. In fact, I would hope that the
> > > > Arrow-native APIs could be added somehow to existing driver libraries
> > > > where it made sense, so that if they are used in an application that
> > > > uses Arrow, they can opt in to using the Arrow-based APIs for getting
> > > > result sets, or doing bulk inserts, etc.
> > > >
> > > > [1]: https://peps.python.org/pep-0249/
> > > >
> > > > On Tue, Apr 26, 2022 at 12:36 PM Antoine Pitrou 
> wrote:
> > > >>
> > > >>
> > > >> Do we want something more flexible than dlopen() and runtime symbol
> > > >> lookup (a mechanism which constrains the way you can organize and
> > > >> distribute drivers)?
> > > >>
> > > >> For example, perhaps we could expose an API struct of function
> pointers
> > > >> that could be obtained through driver-specific means.
> > > >>
> > > >>
> > > >> On 26/04/2022 at 18:29, David Li wrote:
> > > >> > Hello,
> > > >> >
> > > >> > In light of recent efforts around Flight SQL, projects like pgeon
> [1], and long-standing tickets/discussions about database support in Arrow
> [2], it seems there's an opportunity to define standard database interfaces
> for Arrow that could unify these efforts. So we've put together a proposal
> for "ADBC", a common Arrow-based database client API:
> > > >> >
> > > >> >
> https://docs.google.com/document/d/1t7NrC76SyxL_OffATmjzZs2xcj1owdUsIF2WKL_Zw1U/edit#heading=h.r6o6j2navi4c
> > > >> >
> > > >> > A common API and implementations could help combine/simplify
> client-side projects like pgeon, or what DBI is considering [3], and help
> them take advantage of developments like Flight SQL and existing columnar
> APIs.
> > > >> >
> > > >> > We'd appreciate any feedback. (Comments should be open, please
> let me know if not.)
> > > >> >
> > > >> > [1]: https://github.com/0x0L/pgeon
> > > >> > [2]: https://issues.apache.org/jira/browse/ARROW-11670
> > > >> > [3]: https://github.com/r-dbi/dbi3/issues/48
> > > >> >
> > > >> > Thanks,
> > > >> > David
>


Re: [DISC] Improving Arrow's database support

2022-06-01 Thread Wes McKinney
I went ahead and created

https://github.com/apache/arrow-adbc

I directed issue comments / PRs to issues@

On Tue, May 31, 2022 at 8:49 PM Wes McKinney  wrote:
>
> I think spinning up a new repository while this exploratory work
> progresses is a fine idea — perhaps apache/arrow-dbc / arrow-adbc or
> similar (the name can always be changed later). That would bubble up
> discussions in a way that's easier for people to follow (watching your
> fork isn't ideal!). If it makes sense to move code later, it can
> always be moved.
>
>
> On Tue, May 31, 2022 at 1:02 PM David Li  wrote:
> >
> > Some updates:
> >
> > The proposal is being updated based on feedback from contributors to DuckDB 
> > and DBI. We've been using GitHub issues on the fork to discuss the API 
> > design and how to implement data ingestion/bound parameters: 
> > https://github.com/lidavidm/arrow/issues
> >
> > If anyone has suggestions/ideas/questions, or would like to jump in as 
> > well, please feel free to chime in there too.
> >
> > I have also been wondering if we might want to plan to split off a new repo 
> > for this work? In particular, some components might be easiest to consume 
> > if they didn't also have a hard dependency on the Arrow C++ libraries. And 
> > we could use the repo to manage contributed drivers (some of which may 
> > individually leverage the Arrow libraries). Of course, maintaining a 
> > parallel build system, setting up releases, etc. is also a lot of work.
> >
> > -David
> >
> > On Tue, Apr 26, 2022, at 15:01, Wes McKinney wrote:
> > > I don't have major new things to add on this topic except that I've
> > > long had the aspiration of creating something like Python's DBAPI 2.0
> > > [1] at the C or C++ level to enable a measure of API standardization
> > > for Arrow-native read/write interfaces with database drivers. It seems
> > > like a natural complement to the wire-protocol standardization work
> > > with FlightSQL. I had previously brought in some code that I had
> > > worked on related to interfacing with the HiveServer2 wire protocol
> > > (for Hive and Impala, or other HS2-compatible query engines) with the
> > > intention of prototyping but never was able to find the time.
> > >
> > > From an external messaging standpoint, one thing that will be
> > > important is to assert that this is not intended to displace or
> > > deprecate ODBC or JDBC drivers. In fact, I would hope that the
> > > Arrow-native APIs could be added somehow to existing driver libraries
> > > where it made sense, so that if they are used in an application that
> > > uses Arrow, they can opt in to using the Arrow-based APIs for getting
> > > result sets, or doing bulk inserts, etc.
> > >
> > > [1]: https://peps.python.org/pep-0249/
> > >
> > > On Tue, Apr 26, 2022 at 12:36 PM Antoine Pitrou  
> > > wrote:
> > >>
> > >>
> > >> Do we want something more flexible than dlopen() and runtime symbol
> > >> lookup (a mechanism which constrains the way you can organize and
> > >> distribute drivers)?
> > >>
> > >> For example, perhaps we could expose an API struct of function pointers
> > >> that could be obtained through driver-specific means.
> > >>
> > >>
> > >> On 26/04/2022 at 18:29, David Li wrote:
> > >> > Hello,
> > >> >
> > >> > In light of recent efforts around Flight SQL, projects like pgeon [1], 
> > >> > and long-standing tickets/discussions about database support in Arrow 
> > >> > [2], it seems there's an opportunity to define standard database 
> > >> > interfaces for Arrow that could unify these efforts. So we've put 
> > >> > together a proposal for "ADBC", a common Arrow-based database client 
> > >> > API:
> > >> >
> > >> > https://docs.google.com/document/d/1t7NrC76SyxL_OffATmjzZs2xcj1owdUsIF2WKL_Zw1U/edit#heading=h.r6o6j2navi4c
> > >> >
> > >> > A common API and implementations could help combine/simplify 
> > >> > client-side projects like pgeon, or what DBI is considering [3], and 
> > >> > help them take advantage of developments like Flight SQL and existing 
> > >> > columnar APIs.
> > >> >
> > >> > We'd appreciate any feedback. (Comments should be open, please let me 
> > >> > know if not.)
> > >> >
> > >> > [1]: https://github.com/0x0L/pgeon
> > >> > [2]: https://issues.apache.org/jira/browse/ARROW-11670
> > >> > [3]: https://github.com/r-dbi/dbi3/issues/48
> > >> >
> > >> > Thanks,
> > >> > David
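
Antoine's suggestion quoted above (expose an API struct of function pointers that is obtained through driver-specific means, rather than relying only on dlopen() and symbol lookup) might look roughly like this. The sketch is in Python for brevity and is not the layout of adbc.h: DriverApi, its two fields, make_memory_driver, and DriverManager are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class DriverApi:
    """Hypothetical table of driver entry points (stand-in for a C struct)."""

    connect: Callable[[str], Dict[str, str]]
    execute: Callable[[Dict[str, str], str], List[Tuple[str, str]]]


def make_memory_driver() -> DriverApi:
    """Driver-specific factory: how the table is obtained is up to the driver."""

    def connect(uri: str) -> Dict[str, str]:
        return {"uri": uri}

    def execute(conn: Dict[str, str], query: str) -> List[Tuple[str, str]]:
        # A toy driver that just echoes the connection URI and the query.
        return [(conn["uri"], query)]

    return DriverApi(connect=connect, execute=execute)


class DriverManager:
    """Dispatches calls through whichever function table was registered."""

    def __init__(self) -> None:
        self._drivers: Dict[str, DriverApi] = {}

    def register(self, name: str, api: DriverApi) -> None:
        self._drivers[name] = api

    def execute(self, name: str, uri: str, query: str) -> List[Tuple[str, str]]:
        api = self._drivers[name]
        conn = api.connect(uri)
        return api.execute(conn, query)


manager = DriverManager()
manager.register("memory", make_memory_driver())
rows = manager.execute("memory", "memory://demo", "SELECT 1")
```

The manager never needs to know where the table came from, so a driver could be linked statically, loaded dynamically, or constructed in-process, which is the flexibility the question is asking about.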


Re: [C++] Adding Run-Length Encoding to Arrow

2022-06-01 Thread Neal Richardson
Would it make sense to make a draft PR with your branch so that folks can
comment on specific parts of it?

Neal

On Wed, Jun 1, 2022 at 10:20 AM Tobias Zagorni 
wrote:

> On Tuesday, 31.05.2022 at 12:41 -0700, Micah Kornfield wrote:
> >
> > - Should we allow multiple runs of the same value following each
> > other?
> > > Otherwise we would either need a pass to correct this after a lot
> > > of operations, or make RLE-aware versions of their kernels.
> >
> > Is there any benefit you see in disallowing it?
>
> Some operations would be simpler. The one I can think of off the top of
> my head is an equality check between whole arrays: it could just memcmp
> the run length buffers and compare the child arrays like you would compare
> any array of the child type. Probably not worth the complexity of
> disallowing it, but maybe someone knows more important cases.
>
> >
> Best,
> Tobias
>


Re: [Discuss][Java] macOS minimum requirements

2022-06-01 Thread Antoine Pitrou



Sorry, I put "C++" in the title but this really affects Java via JNI.


On 01/06/2022 at 16:22, Antoine Pitrou wrote:



Hello,

The topic came up recently of bumping up our minimal macOS requirements
from 10.11 to 10.13 (*).  Do people have any particular concerns about this?

(*) https://github.com/apache/arrow/pull/13157#issuecomment-1143670152

Regards

Antoine.


[Discuss][C++] macOS minimum requirements

2022-06-01 Thread Antoine Pitrou




Hello,

The topic came up recently of bumping up our minimal macOS requirements 
from 10.11 to 10.13 (*).  Do people have any particular concerns about this?


(*) https://github.com/apache/arrow/pull/13157#issuecomment-1143670152

Regards

Antoine.


Re: [C++] Adding Run-Length Encoding to Arrow

2022-06-01 Thread Tobias Zagorni
On Tuesday, 31.05.2022 at 12:41 -0700, Micah Kornfield wrote:
> 
> - Should we allow multiple runs of the same value following each
> other?
> > Otherwise we would either need a pass to correct this after a lot
> > of operations, or make RLE-aware versions of their kernels.
> 
> Is there any benefit you see in disallowing it?

Some operations would be simpler. The one I can think of off the top of
my head is an equality check between whole arrays: it could just memcmp
the run length buffers and compare the child arrays like you would compare
any array of the child type. Probably not worth the complexity of
disallowing it, but maybe someone knows more important cases.

> 
Best,
Tobias
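
To make the trade-off above concrete, here is a small Python sketch, assuming an RLE array is simply a pair (run_lengths, values); this layout and the function names are illustrative and not the proposed Arrow format, and nulls are ignored. If adjacent runs of the same value are disallowed, two arrays are equal exactly when their buffers are equal; if they are allowed, the arrays have to be normalized (or walked run by run) first.

```python
from typing import List, Sequence, Tuple

Rle = Tuple[List[int], List[object]]  # (run_lengths, values), illustrative only


def rle_encode(data: Sequence[object]) -> Rle:
    """Encode into canonical form: no two adjacent runs share a value."""
    run_lengths: List[int] = []
    values: List[object] = []
    for item in data:
        if values and values[-1] == item:
            run_lengths[-1] += 1
        else:
            values.append(item)
            run_lengths.append(1)
    return run_lengths, values


def canonicalize(rle: Rle) -> Rle:
    """Merge adjacent runs that carry the same value (the 'correction pass')."""
    run_lengths: List[int] = []
    values: List[object] = []
    for length, value in zip(*rle):
        if values and values[-1] == value:
            run_lengths[-1] += length
        else:
            values.append(value)
            run_lengths.append(length)
    return run_lengths, values


def equal_canonical(a: Rle, b: Rle) -> bool:
    """Valid only when both inputs are canonical: a plain buffer comparison."""
    return a[0] == b[0] and a[1] == b[1]


def equal_general(a: Rle, b: Rle) -> bool:
    """Works for any run layout, at the cost of a normalization pass."""
    return equal_canonical(canonicalize(a), canonicalize(b))


canonical = rle_encode(["a", "a", "a", "b"])
split_runs: Rle = ([1, 2, 1], ["a", "a", "b"])  # same data, repeated run value
```

Here canonical and split_runs decode to the same logical array, yet the direct buffer comparison reports them as different, which is why the cheap equality check depends on disallowing repeated runs.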


[Dev] Switch to token authentication for archery & merge script

2022-06-01 Thread Jacob Wujciak
Hello Everyone,

I would like to propose that we switch from basic authentication with JIRA
in the merge script and archery to PAT based token authentication.

Basic authentication is deprecated in Jira Cloud [1], and PATs remove the
need to save your password in clear text (e.g. in a config file). PATs are
easily created [2], can be revoked, and can be given an expiry date. Token
authentication also does not trigger a captcha. The changes needed in
workflow and scripts are minimal and easily outweighed by the advantages.

I have created a PR with the necessary changes [3].

Best,
Jacob

[1]:
https://developer.atlassian.com/cloud/jira/platform/deprecation-notice-basic-auth-and-cookie-based-auth/
[2]:
https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html
[3]: https://github.com/apache/arrow/pull/13283
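
As a rough illustration of the difference described above, the sketch below builds the HTTP Authorization header for each scheme using only the Python standard library. It is not the code from the PR [3]; the helper names are hypothetical, and the Bearer scheme is assumed here on the basis of the personal access token documentation linked as [2].

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Basic auth: the password itself travels (base64-encoded) in each request."""
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


def token_auth_header(token: str) -> Dict[str, str]:
    """Token auth: only a revocable, expirable token is sent, never the password."""
    return {"Authorization": f"Bearer {token}"}


# Placeholder credentials for illustration; a real script would read the
# token from the user's environment or config rather than hard-coding it.
basic = basic_auth_header("user", "secret")
bearer = token_auth_header("example-token")
```

Because the token replaces the password entirely, the config file only ever needs to hold a value that can be revoked or left to expire.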