Re: Process to get arrow-JS bugfixes merged in?

2022-12-01 Thread Sutou Kouhei
Hi,

I requested a review from the JS developers.

Thanks,
-- 
kou

In <588ac3af-81ce-4414-9d10-5ad936fe4...@app.fastmail.com>
  "Process to get arrow-JS bugfixes merged in?" on Thu, 01 Dec 2022 18:38:44 -0800,
  "Thomas Sarlandie"  wrote:

> Hi,
> 
> I opened an issue a few weeks ago (in JIRA): 
> https://issues.apache.org/jira/browse/ARROW-18247
> 
> I also submitted a pull-request to fix the bug: 
> https://github.com/apache/arrow/pull/14587
> 
> What is the best way (this email?) to get the attention of one of the
> JavaScript maintainers so they can look at it, and to try to get this
> included in the next release of Apache Arrow?
> 
> thanks!
> thomas


Process to get arrow-JS bugfixes merged in?

2022-12-01 Thread Thomas Sarlandie
Hi,

I opened an issue a few weeks ago (in JIRA): 
https://issues.apache.org/jira/browse/ARROW-18247

I also submitted a pull-request to fix the bug: 
https://github.com/apache/arrow/pull/14587

What is the best way (this email?) to get the attention of one of the
JavaScript maintainers so they can look at it, and to try to get this
included in the next release of Apache Arrow?

thanks!
thomas


ASF Infra Survey

2022-12-01 Thread Neal Richardson
ASF Infra is running a survey and would like feedback from any committer.
Cross-posting here in case folks aren't on the infra mailing list. If you
have opinions you would like to share, please see the link below.

Neal

---------- Forwarded message ---------
From: Chris Thistlethwaite 
Date: Thu, Dec 1, 2022 at 4:49 PM
Subject: Infra 2022 Survey
To: 


We're going to start using regular surveys to understand the
heart of the problems that our projects and podlings face.

Our plan with surveys is twofold.

First, to get a finger on the pulse of projects and a "2022 year in
review" baseline. Since this is the first time we're trying this, it's
likely that future surveys will contain different questions.

Secondly, we'll be sending out another survey on a regular basis,
something like every six months or so just to keep a feedback loop
going.

Surveys will be anonymous and results will be posted on the Infra blog.
Depending on timing, they will also be discussed in the roundtables if
the data is pertinent.

The first one is posted: https://infra.apache.org/surveys/survey-1.html

This survey will be open until the end of the month (12/30/2022). It
isn't restricted to PMCs; we'd really like feedback from any
committer.

All the surveys will be short, likely less than 10 questions. The data
will be shared on the blog once the survey has been closed and the
results analyzed.

Thank you,
Chris T.
#asfinfra


Re: DISCUSS: [FlightSQL] Catalog support

2022-12-01 Thread David Li
Hey James, thanks for putting this up.

Inline:

> The suggestion is to make this part of Flight as an
> optional feature, rather than Flight SQL due to its applicability outside
> of just database access.

Which uses do you see? I see statefulness as a general antipattern here, so I'm 
wary of introducing it beyond where we need it. 

> - The Flight client supplies a New-Session header which has key-value pairs
> for initial session options. This header can be applied to any RPC call,
> but logically should be the first one the client makes.

Handshake already effectively serves as this RPC - maybe we could extend it? (I 
also see Handshake as an antipattern because it's a stateful auth mechanism.)

Should the session time out or be on a lease? (gRPC doesn't really give the
server a way to track the persistence of a particular client connection.)

> It's a bit asymmetric that creating a new session is done by applying a
> header, but closing a session is an RPC call. This was so that session
> creation doesn't introduce another round trip before the first real data
> request. If there's a way to batch RPC calls it might be better to make
> session creation an RPC call.

Is this a worrisome amount of overhead? 

Unfortunately gRPC doesn't batch RPC calls, but RPC calls on the same client 
generally share the same TCP connection (modulo load balancing behavior, but 
presumably that is not enabled if you want persistent sessions).

On the implementation side, I'd like to avoid baking this in too deeply if at 
all possible. Ideally it'd be implemented entirely as middleware, possibly 
making use of an interface so applications can override the session storage 
(hashtable, Redis, etcd, etc.)
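
A minimal sketch of what such an overridable session-storage interface could look like, assuming a lease-based timeout as raised above. The names `SessionStore` and `InMemorySessionStore` are hypothetical and not part of any Arrow Flight API; a Redis- or etcd-backed store would implement the same three methods.

```python
import time
import uuid
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Tuple


class SessionStore(ABC):
    """Hypothetical storage interface; not part of any Arrow Flight API."""

    @abstractmethod
    def create(self, options: Dict[str, str]) -> str:
        """Store initial options and return an opaque session id."""

    @abstractmethod
    def get(self, session_id: str) -> Optional[Dict[str, str]]:
        """Return the session options, or None if unknown or expired."""

    @abstractmethod
    def close(self, session_id: str) -> None:
        """Drop the session if it exists."""


class InMemorySessionStore(SessionStore):
    """Hashtable-backed store whose sessions expire after a lease."""

    def __init__(self, lease_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lease = lease_seconds
        self._clock = clock
        # session id -> (expiry timestamp, options)
        self._sessions: Dict[str, Tuple[float, Dict[str, str]]] = {}

    def create(self, options: Dict[str, str]) -> str:
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = (self._clock() + self._lease, dict(options))
        return session_id

    def get(self, session_id: str) -> Optional[Dict[str, str]]:
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expiry, options = entry
        if self._clock() >= expiry:
            # The lease ran out, so the session is dropped.
            del self._sessions[session_id]
            return None
        # Each successful lookup renews the lease.
        self._sessions[session_id] = (self._clock() + self._lease, options)
        return options

    def close(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)
```

Middleware would then only need to map the session cookie to `get()` on each call, which keeps the server core unaware of how sessions are stored.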

> Just to chime in on this, one thing I'm curious about is whether there
> will be support for user-defined catalog/schema hierarchy depth?

Gavin, for ADBC we discussed adding a delimiter to the catalog name to handle 
this case - maybe we can handle this by adding a property for the delimiter to 
SqlInfo?
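
As a rough illustration of the delimiter idea, here is a sketch that assumes the server advertises a delimiter string (the SqlInfo property itself is hypothetical, as are the helper names). A client could use it to convert between a flat catalog name and a hierarchy of arbitrary depth:

```python
from typing import List, Sequence


def split_catalog(name: str, delimiter: str) -> List[str]:
    """Split a flat catalog name into its hierarchy levels.

    An empty delimiter is taken to mean the server has no hierarchy.
    """
    if not delimiter:
        return [name]
    return name.split(delimiter)


def join_catalog(parts: Sequence[str], delimiter: str) -> str:
    """Join hierarchy levels back into a flat catalog name."""
    if not delimiter:
        raise ValueError("a non-empty delimiter is required to join levels")
    for part in parts:
        if delimiter in part:
            # This sketch has no escaping rule, so reject ambiguous names.
            raise ValueError(f"level {part!r} contains the delimiter")
    return delimiter.join(parts)
```

Escaping of delimiters inside a level is deliberately left out; a real design would need to decide on that.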

> https://github.com/influxdata/influxdb_iox/issues/6102

Andrew, do we need to look into adding more metadata to indicate different 
query languages? (It's quite a shame that we named this Flight SQL at this 
point...) 

On Thu, Dec 1, 2022, at 08:49, Andrew Lamb wrote:
> Sorry for the late reply -- thank you James and David for this discussion.
>
> I agree that adding Catalog support would be a valuable addition to Flight
> SQL, and it recently came up as we begin to implement Flight SQL in
> InfluxDB IOx [1].
>
>> - A standard URI scheme for Flight SQL that can be used by multiple
>> client APIs (JDBC, ADBC, etc.)
>
> I agree this would be very valuable, along with a standard way (ideally
> with HTTP headers) to send this information as part of the FlightSQL gRPC
> requests.
>
>> I'd suggest we define session management features explicitly in Flight
>> (while being optional).
>
> I agree it is critical that server-side state is not required to implement
> FlightSQL. Stateful connections would likely complicate deploying FlightSQL
> in distributed systems. I suggest it should be possible to implement any
> session management features by sending the entire session state with the
> request, if desired.
>
> I don't have a strong opinion about the merits of including explicit
> session management features in FlightSQL. It seems to me that keeping the
> API surface of FlightSQL minimal and implementation flexibility maximal
> should be the default. However, if JDBC/ODBC driver compatibility would be
> improved with explicit state management APIs, then adding them to FlightSQL
> seems like a good idea to me.
>
> Thanks again -- it is amazing to hit some issue in design and then find out
> the Arrow community is already hard at work on a solution.
>
> Andrew
>
> [1] https://github.com/influxdata/influxdb_iox/issues/6102
>
> On Wed, Nov 30, 2022 at 7:17 PM Gavin Ray  wrote:
>
>> Just to chime in on this, one thing I'm curious about is whether there
>> will be support for user-defined catalog/schema hierarchy depth?
>>
>> This comment that James made does seem reasonable to me
>> > scheme://:/path-1/path-2/.../path-n
>>
>> Trino/Presto does a similar thing (jdbc:trino://localhost:8080/tpch/sf1)
>>
>> At Hasura, what we do is have an alias "FullyQualifiedName" which is
>> just "Array"
>> and the identifier to some element in a data source is always
>> fully-qualified:
>>
>> https://github.com/hasura/graphql-engine/tree/master/dc-agents#schema
>>
>> ["postgres_1", "db1", "schema2", "my_table", "col_a"]
>> ["mongo", "db1",  "collection_a", "field_a"]
>> ["csv_adapter", "myfile.csv", "col_x"]
>>
>> On Wed, Nov 30, 2022 at 6:31 PM James Duong
>>  wrote:
>> >
>> > Our current convention of sending connection properties as headers with
>> > every request has the benefit of making statefulness optional, but has the
>> > drawback of sending redundant, unused properties on requests after the
>> > first, which increases the payload size 

[RELEASE] Preparing for Arrow Release version 11.0.0

2022-12-01 Thread Raúl Cumplido
Hi,

I am sending this email earlier than usual to let everyone know about the
current plans for 11.0.0 ahead of possible New Year's festivities, so
everyone can plan ahead and raise any concerns about the planned
dates.

The plan is to start preparing the 11.0.0 release around mid-January (~6
weeks from now). I propose Monday 16th of January as the release code freeze
date. A Confluence page has been created to track progress for the 11.0.0
release [1]. Note that, with the GitHub migration, some of the
issues are currently tracked on the GitHub milestone for 11.0.0 [2].

Thanks everybody for the great work!

Thank you,
Raúl

[1] https://cwiki.apache.org/confluence/display/ARROW/Arrow+11.0.0+Release
[2] https://github.com/apache/arrow/milestone/1


Re: DISCUSS: [FlightSQL] Catalog support

2022-12-01 Thread Andrew Lamb
Sorry for the late reply -- thank you James and David for this discussion.

I agree that adding Catalog support would be a valuable addition to Flight
SQL, and it recently came up as we begin to implement Flight SQL in
InfluxDB IOx [1].

> - A standard URI scheme for Flight SQL that can be used by multiple
> client APIs (JDBC, ADBC, etc.)

I agree this would be very valuable, along with a standard way (ideally
with HTTP headers) to send this information as part of the FlightSQL gRPC
requests.

> I'd suggest we define session management features explicitly in Flight
> (while being optional).

I agree it is critical that server-side state is not required to implement
FlightSQL. Stateful connections would likely complicate deploying FlightSQL
in distributed systems. I suggest it should be possible to implement any
session management features by sending the entire session state with the
request, if desired.

I don't have a strong opinion about the merits of including explicit
session management features in FlightSQL. It seems to me that keeping the
API surface of FlightSQL minimal and implementation flexibility maximal
should be the default. However, if JDBC/ODBC driver compatibility would be
improved with explicit state management APIs, then adding them to FlightSQL
seems like a good idea to me.

Thanks again -- it is amazing to hit some issue in design and then find out
the Arrow community is already hard at work on a solution.

Andrew

[1] https://github.com/influxdata/influxdb_iox/issues/6102

On Wed, Nov 30, 2022 at 7:17 PM Gavin Ray  wrote:

> Just to chime in on this, one thing I'm curious about is whether there
> will be support for user-defined catalog/schema hierarchy depth?
>
> This comment that James made does seem reasonable to me
> > scheme://:/path-1/path-2/.../path-n
>
> Trino/Presto does a similar thing (jdbc:trino://localhost:8080/tpch/sf1)
>
> At Hasura, what we do is have an alias "FullyQualifiedName" which is
> just "Array"
> and the identifier to some element in a data source is always
> fully-qualified:
>
> https://github.com/hasura/graphql-engine/tree/master/dc-agents#schema
>
> ["postgres_1", "db1", "schema2", "my_table", "col_a"]
> ["mongo", "db1",  "collection_a", "field_a"]
> ["csv_adapter", "myfile.csv", "col_x"]
>
> On Wed, Nov 30, 2022 at 6:31 PM James Duong
>  wrote:
> >
> > Our current convention of sending connection properties as headers with
> > every request has the benefit of making statefulness optional, but has the
> > drawback of sending redundant, unused properties on requests after the
> > first, which increases the payload size unnecessarily.
> >
> > I'd suggest we define session management features explicitly in Flight
> > (while being optional). The suggestion is to make this part of Flight as an
> > optional feature, rather than Flight SQL due to its applicability outside
> > of just database access.
> >
> > Creating a session:
> > - The Flight client supplies a New-Session header which has key-value pairs
> > for initial session options. This header can be applied to any RPC call,
> > but logically should be the first one the client makes.
> > - The server should send a Set-Cookie header back containing some
> > server-side representation of the session that the client can use in
> > subsequent requests.
> > - The path specified in the URI is sent as a "Catalog" session option.
> >
> > Modifying session options:
> > - A separate RPC call that takes in a Stream representing
> > each session option that is being modified and returns a stream of statuses
> > to indicate if the setting change was accepted.
> > - This RPC call is only valid when the Cookie header is used.
> > - It is up to the server to define if a failed session property change is
> > fatal or if other properties can continue to be set.
> >
> > Closing a session:
> > - A separate RPC call that tells the server to drop the session specified
> > by the Cookie header.
> >
> > Notes:
> > A Flight SQL client would check if session management RPCs are supported
> > through a new GetSqlInfo property. A Flight client doesn't have a way to do
> > this generically, but there could be an application-specific RPC or header
> > that reports this metadata.
> >
> > The O/JDBC and ADBC drivers would need to be updated to programmatically
> > check for session management RPCs. If unsupported, then use the old
> > behavior of sending all properties as headers with each request. If
> > supported, make use of the New-Session header and drop the session when
> > closing the client-side connection.
> >
> > It's a bit asymmetric that creating a new session is done by applying a
> > header, but closing a session is an RPC call. This was so that session
> > creation doesn't introduce another round trip before the first real data
> > request. If there's a way to batch RPC calls it might be better to make
> > session creation an RPC call.
> >
> > On Tue, Nov 22, 2022 at 3:16 PM David Li  wrote:
> >
> > > It sounds 

Re: [DISCUSS] JSON Canonical Extension Type

2022-12-01 Thread Antoine Pitrou



HOCON is a superset of JSON, so I'm not sure making it an extension type 
based on JSON would be a good idea.



On 01/12/2022 at 06:23, Micah Kornfield wrote:


Can a logical extension be based on another logical extension?


Potentially, but this is mostly an implementation detail; each type should
have its own specification IMO.

HOCON support might be nice..

I'm not sure if this is common enough to warrant a canonical type within
Arrow but you are welcome to propose something if you would like.

Cheers,
Micah

On Mon, Nov 28, 2022 at 11:55 AM Lee, David 
wrote:


Can a logical extension be based on another logical extension?

HOCON support might be nice..

-----Original Message-----
From: Micah Kornfield 
Sent: Monday, November 28, 2022 11:50 AM
To: dev@arrow.apache.org
Subject: Re: [DISCUSS] JSON Canonical Extension Type


This seems like a reasonable definition to me.  Since there hasn't been
much feedback, I think following through with an implementation plus this
description in a PR would be the next step.  If there isn't further
feedback on this, once the PR is up we can try to vote (which might
bring up some more feedback, but hopefully wouldn't cause too much
implementation churn).

Thanks,
Micah

On Thu, Nov 17, 2022 at 3:58 PM Pradeep Gollakota
 wrote:


Hi folks!

I put together this specification for canonicalizing the JSON type in
Arrow.

## Introduction
JSON is a widely used text-based data interchange format. There are
many use cases where a user has a column whose contents are a
JSON-encoded string. BigQuery's [JSON Type][1] and Parquet’s [JSON Logical
Type][2] are two such examples.

The JSON specification is defined in [RFC-8259][3]. However, many of
the most popular parsers support non-standard extensions. Examples of
non-standard extensions to JSON include comments, unquoted keys,
trailing commas, etc.

## Extension Specification
* The name of the extension is `arrow.json`
* The storage type of the extension is `utf8`
* The extension type has no parameters
* The metadata MUST be either empty or a valid JSON object
 - There is no canonical metadata
 - Implementations MAY include implementation-specific metadata by
   using a namespaced key. For example:
   `{"google.bigquery": {"my": "metadata"}}`
* Implementations...
 - MUST produce valid UTF-8 encoded text
 - SHOULD produce valid standard JSON
 - MAY produce valid non-standard JSON
 - MUST support parsing standard JSON
 - MAY support parsing non-standard JSON
 - SHOULD pass through contents that they do not understand
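
To make the metadata rule concrete, here is a minimal sketch of the check an implementation might apply, assuming the serialized extension metadata is available as a string. The function name is hypothetical, and the sketch uses only Python's standard `json` module rather than any Arrow API.

```python
import json


def _reject_constant(token: str) -> None:
    # Python's json module accepts NaN/Infinity by default, which RFC 8259
    # does not allow, so these tokens are rejected explicitly.
    raise ValueError(f"non-standard JSON constant: {token}")


def is_valid_extension_metadata(serialized: str) -> bool:
    """Return True if the metadata is empty or a standard JSON object."""
    if serialized == "":
        return True
    try:
        value = json.loads(serialized, parse_constant=_reject_constant)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    # Arrays, strings, numbers, booleans, and null are valid JSON
    # but are not JSON objects.
    return isinstance(value, dict)
```

Under this sketch, `{"google.bigquery": {"my": "metadata"}}` is accepted, while a top-level array such as `[1]` is not.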

## Forward compatibility
In the future we might allow this logical type to annotate a byte
storage type with a different text encoding.  Implementations
consuming JSON logical types should therefore verify the storage type
rather than assume it is `utf8`.

 [1]: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type
 [2]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#json
 [3]: https://datatracker.ietf.org/doc/html/rfc8259




