Re: [ANNOUNCE] Apache Arrow Flight SQL adapter for PostgreSQL 0.1.0 released

2023-09-13 Thread Junwang Zhao
On Thu, Sep 14, 2023 at 2:43 PM Sutou Kouhei  wrote:
>
> Hi Junwang,
>
> Thanks for trying this product!
>
> Sorry, the example program has problems. Could you try this
> patch?
>
> 
> diff --git a/example/flight-sql/query-prepared.cc b/example/flight-sql/query-prepared.cc
> index 621b650..a3e1a87 100644
> --- a/example/flight-sql/query-prepared.cc
> +++ b/example/flight-sql/query-prepared.cc
> @@ -128,7 +128,7 @@ run()
> ARROW_RETURN_NOT_OK(i_builder->Append(10));
> ARROW_ASSIGN_OR_RAISE(auto record_batch, record_batch_builder->Flush());
> ARROW_RETURN_NOT_OK(statement->SetParameters(record_batch));
> -   ARROW_ASSIGN_OR_RAISE(auto info, statement->Execute());
> +   ARROW_ASSIGN_OR_RAISE(auto info, statement->Execute(call_options));
> for (const auto& endpoint : info->endpoints())
> {
> ARROW_ASSIGN_OR_RAISE(auto reader,
> @@ -143,7 +143,7 @@ run()
> std::cout << chunk.data->ToString() << std::endl;
> }
> }
> -   ARROW_RETURN_NOT_OK(statement->Close());
> +   ARROW_RETURN_NOT_OK(statement->Close(call_options));

It works! thanks ;)

> return sql_client->Close();
>  }
>  // End query

> 
>
> Thanks,
> --
> kou
>
> In "Re: [ANNOUNCE] Apache Arrow Flight SQL adapter for PostgreSQL 0.1.0 released"
>   on Thu, 14 Sep 2023 14:31:32 +0800,
>   Junwang Zhao wrote:
>
> > Hi Sutou,
> >
> > On Thu, Sep 14, 2023 at 8:06 AM Sutou Kouhei  wrote:
> >>
> >> The Apache Arrow team is pleased to announce the 0.1.0 release of
> >> the Apache Arrow Flight SQL adapter for PostgreSQL.
> >>
> >> The release is available now from our website:
> >>   https://arrow.apache.org/flight-sql-postgresql/0.1.0/install.html
> >>
> >> Read about what's new in the release:
> >>   https://arrow.apache.org/blog/2023/09/13/flight-sql-postgresql-0.1.0-release/
> >>
> >> Release note:
> >>   https://arrow.apache.org/flight-sql-postgresql/0.1.0/release-notes.html#version-0-1-0
> >>
> >>
> >> What is Apache Arrow Flight SQL adapter for PostgreSQL?
> >>
> >> Apache Arrow Flight SQL adapter for PostgreSQL is a
> >> PostgreSQL extension that adds an Apache Arrow Flight SQL
> >> endpoint to PostgreSQL.
> >>
> >> Apache Arrow Flight SQL is a protocol that uses the Apache Arrow
> >> format to interact with SQL databases. With the Apache Arrow
> >> Flight SQL adapter for PostgreSQL, you can use Apache Arrow
> >> Flight SQL instead of the PostgreSQL wire protocol to interact
> >> with PostgreSQL.
> >
> > I tried the examples provided in the repo. authenticate-password and
> > query-ad-hoc give the right output, but query-prepared does not seem
> > to work; it fails with the following error message:
> >
> > /build/apache-arrow-13.0.0/cpp/src/arrow/flight/sql/client.cc:154:
> > Failed to delete PreparedStatement: IOError: No authorization header.
> > Detail: Unauthenticated. Detail: Unauthenticated
> > IOError: No authorization header. Detail: Unauthenticated. gRPC client
> > debug context: {"created":"@1694672534.441199175","description":"Error
> > received from peer
> > ipv4:127.0.0.1:15432","file":"/build/apache-arrow-13.0.0/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/lib/surface/call.cc","file_line":952,"grpc_message":"No
> > authorization header. Detail: Unauthenticated","grpc_status":16}.
> > Client context: OK. Detail: Unauthenticated
> >
> > This error came from this line:
> >
> > *ARROW_RETURN_NOT_OK(statement->SetParameters(record_batch));*
> >
> > So the authentication logic in connect seems to be OK. Do we need
> > an authorization header to address this error?
> >
> >>
> >> The Apache Arrow format is designed for fast typed table data
> >> exchange. If you SELECT, INSERT, or UPDATE large amounts of data,
> >> Apache Arrow Flight SQL will be faster than the PostgreSQL wire
> >> protocol.
> >>
> >>
> >> Please report any feedback to the GitHub issues or mailing lists:
> >>   * GitHub: https://github.com/apache/arrow-flight-sql-postgresql/issues
> >>   * ML: https://arrow.apache.org/community/
> >>
> >>
> >> Thanks,
> >> --
> >> The Apache Arrow community
> >
> >
> >
> > --
> > Regards
> > Junwang Zhao



-- 
Regards
Junwang Zhao


Re: [ANNOUNCE] Apache Arrow Flight SQL adapter for PostgreSQL 0.1.0 released

2023-09-13 Thread Sutou Kouhei
Hi Junwang,

Thanks for trying this product!

Sorry, the example program has problems. Could you try this
patch?


diff --git a/example/flight-sql/query-prepared.cc b/example/flight-sql/query-prepared.cc
index 621b650..a3e1a87 100644
--- a/example/flight-sql/query-prepared.cc
+++ b/example/flight-sql/query-prepared.cc
@@ -128,7 +128,7 @@ run()
ARROW_RETURN_NOT_OK(i_builder->Append(10));
ARROW_ASSIGN_OR_RAISE(auto record_batch, record_batch_builder->Flush());
ARROW_RETURN_NOT_OK(statement->SetParameters(record_batch));
-   ARROW_ASSIGN_OR_RAISE(auto info, statement->Execute());
+   ARROW_ASSIGN_OR_RAISE(auto info, statement->Execute(call_options));
for (const auto& endpoint : info->endpoints())
{
ARROW_ASSIGN_OR_RAISE(auto reader,
@@ -143,7 +143,7 @@ run()
std::cout << chunk.data->ToString() << std::endl;
}
}
-   ARROW_RETURN_NOT_OK(statement->Close());
+   ARROW_RETURN_NOT_OK(statement->Close(call_options));
return sql_client->Close();
 }
 // End query


Thanks,
-- 
kou

In "Re: [ANNOUNCE] Apache Arrow Flight SQL adapter for PostgreSQL 0.1.0 released"
  on Thu, 14 Sep 2023 14:31:32 +0800,
  Junwang Zhao wrote:

> Hi Sutou,
> 
> On Thu, Sep 14, 2023 at 8:06 AM Sutou Kouhei  wrote:
>>
>> The Apache Arrow team is pleased to announce the 0.1.0 release of
>> the Apache Arrow Flight SQL adapter for PostgreSQL.
>>
>> The release is available now from our website:
>>   https://arrow.apache.org/flight-sql-postgresql/0.1.0/install.html
>>
>> Read about what's new in the release:
>>   https://arrow.apache.org/blog/2023/09/13/flight-sql-postgresql-0.1.0-release/
>>
>> Release note:
>>   https://arrow.apache.org/flight-sql-postgresql/0.1.0/release-notes.html#version-0-1-0
>>
>>
>> What is Apache Arrow Flight SQL adapter for PostgreSQL?
>>
>> Apache Arrow Flight SQL adapter for PostgreSQL is a
>> PostgreSQL extension that adds an Apache Arrow Flight SQL
>> endpoint to PostgreSQL.
>>
>> Apache Arrow Flight SQL is a protocol that uses the Apache Arrow
>> format to interact with SQL databases. With the Apache Arrow
>> Flight SQL adapter for PostgreSQL, you can use Apache Arrow
>> Flight SQL instead of the PostgreSQL wire protocol to interact
>> with PostgreSQL.
> 
> I tried the examples provided in the repo. authenticate-password and
> query-ad-hoc give the right output, but query-prepared does not seem
> to work; it fails with the following error message:
> 
> /build/apache-arrow-13.0.0/cpp/src/arrow/flight/sql/client.cc:154:
> Failed to delete PreparedStatement: IOError: No authorization header.
> Detail: Unauthenticated. Detail: Unauthenticated
> IOError: No authorization header. Detail: Unauthenticated. gRPC client
> debug context: {"created":"@1694672534.441199175","description":"Error
> received from peer
> ipv4:127.0.0.1:15432","file":"/build/apache-arrow-13.0.0/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/lib/surface/call.cc","file_line":952,"grpc_message":"No
> authorization header. Detail: Unauthenticated","grpc_status":16}.
> Client context: OK. Detail: Unauthenticated
> 
> This error came from this line:
> 
> *ARROW_RETURN_NOT_OK(statement->SetParameters(record_batch));*
> 
> So the authentication logic in connect seems to be OK. Do we need
> an authorization header to address this error?
> 
>>
>> The Apache Arrow format is designed for fast typed table data
>> exchange. If you SELECT, INSERT, or UPDATE large amounts of data,
>> Apache Arrow Flight SQL will be faster than the PostgreSQL wire
>> protocol.
>>
>>
>> Please report any feedback to the GitHub issues or mailing lists:
>>   * GitHub: https://github.com/apache/arrow-flight-sql-postgresql/issues
>>   * ML: https://arrow.apache.org/community/
>>
>>
>> Thanks,
>> --
>> The Apache Arrow community
> 
> 
> 
> -- 
> Regards
> Junwang Zhao


Re: [ANNOUNCE] Apache Arrow Flight SQL adapter for PostgreSQL 0.1.0 released

2023-09-13 Thread Junwang Zhao
Hi Sutou,

On Thu, Sep 14, 2023 at 8:06 AM Sutou Kouhei  wrote:
>
> The Apache Arrow team is pleased to announce the 0.1.0 release of
> the Apache Arrow Flight SQL adapter for PostgreSQL.
>
> The release is available now from our website:
>   https://arrow.apache.org/flight-sql-postgresql/0.1.0/install.html
>
> Read about what's new in the release:
>   https://arrow.apache.org/blog/2023/09/13/flight-sql-postgresql-0.1.0-release/
>
> Release note:
>   https://arrow.apache.org/flight-sql-postgresql/0.1.0/release-notes.html#version-0-1-0
>
>
> What is Apache Arrow Flight SQL adapter for PostgreSQL?
>
> Apache Arrow Flight SQL adapter for PostgreSQL is a
> PostgreSQL extension that adds an Apache Arrow Flight SQL
> endpoint to PostgreSQL.
>
> Apache Arrow Flight SQL is a protocol that uses the Apache Arrow
> format to interact with SQL databases. With the Apache Arrow
> Flight SQL adapter for PostgreSQL, you can use Apache Arrow
> Flight SQL instead of the PostgreSQL wire protocol to interact
> with PostgreSQL.

I tried the examples provided in the repo. authenticate-password and
query-ad-hoc give the right output, but query-prepared does not seem
to work; it fails with the following error message:

/build/apache-arrow-13.0.0/cpp/src/arrow/flight/sql/client.cc:154:
Failed to delete PreparedStatement: IOError: No authorization header.
Detail: Unauthenticated. Detail: Unauthenticated
IOError: No authorization header. Detail: Unauthenticated. gRPC client
debug context: {"created":"@1694672534.441199175","description":"Error
received from peer
ipv4:127.0.0.1:15432","file":"/build/apache-arrow-13.0.0/cpp_build/grpc_ep-prefix/src/grpc_ep/src/core/lib/surface/call.cc","file_line":952,"grpc_message":"No
authorization header. Detail: Unauthenticated","grpc_status":16}.
Client context: OK. Detail: Unauthenticated

This error came from this line:

*ARROW_RETURN_NOT_OK(statement->SetParameters(record_batch));*

So the authentication logic in connect seems to be OK. Do we need
an authorization header to address this error?

>
> The Apache Arrow format is designed for fast typed table data
> exchange. If you SELECT, INSERT, or UPDATE large amounts of data,
> Apache Arrow Flight SQL will be faster than the PostgreSQL wire
> protocol.
>
>
> Please report any feedback to the GitHub issues or mailing lists:
>   * GitHub: https://github.com/apache/arrow-flight-sql-postgresql/issues
>   * ML: https://arrow.apache.org/community/
>
>
> Thanks,
> --
> The Apache Arrow community



-- 
Regards
Junwang Zhao


[ANNOUNCE] Apache Arrow Flight SQL adapter for PostgreSQL 0.1.0 released

2023-09-13 Thread Sutou Kouhei
The Apache Arrow team is pleased to announce the 0.1.0 release of
the Apache Arrow Flight SQL adapter for PostgreSQL.

The release is available now from our website:
  https://arrow.apache.org/flight-sql-postgresql/0.1.0/install.html

Read about what's new in the release:
  https://arrow.apache.org/blog/2023/09/13/flight-sql-postgresql-0.1.0-release/

Release note:
  https://arrow.apache.org/flight-sql-postgresql/0.1.0/release-notes.html#version-0-1-0


What is Apache Arrow Flight SQL adapter for PostgreSQL?

Apache Arrow Flight SQL adapter for PostgreSQL is a
PostgreSQL extension that adds an Apache Arrow Flight SQL
endpoint to PostgreSQL.

Apache Arrow Flight SQL is a protocol that uses the Apache Arrow
format to interact with SQL databases. With the Apache Arrow
Flight SQL adapter for PostgreSQL, you can use Apache Arrow
Flight SQL instead of the PostgreSQL wire protocol to interact
with PostgreSQL.

The Apache Arrow format is designed for fast typed table data
exchange. If you SELECT, INSERT, or UPDATE large amounts of data,
Apache Arrow Flight SQL will be faster than the PostgreSQL wire
protocol.


Please report any feedback to the GitHub issues or mailing lists:
  * GitHub: https://github.com/apache/arrow-flight-sql-postgresql/issues
  * ML: https://arrow.apache.org/community/


Thanks,
-- 
The Apache Arrow community


Re: [DISCUSS] Proposal to add VariableShapeTensor Canonical Extension Type

2023-09-13 Thread Jeremy Leibs
Additionally, after reviewing, I also think the introduction of
permutations requires a bit more clarification.

Please consider adding some wording and an example such as:

With the exception of the permutation parameter, all other lists and
storage within the Tensor and the extension parameters define
the *physical* storage of the tensor.

For example, consider a Tensor with:
  shape = [10, 20, 30]
  dim_names = [x, y, z]
  permutation = [2, 0, 1]

This means the logical tensor has names [z, x, y] and shape [30, 10, 20].
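
As a rough illustration of that mapping (a sketch only; `ToLogical` is a hypothetical helper, not part of the proposal or of any Arrow API), logical dimension i takes its value from physical dimension permutation[i]:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not an Arrow API): reorders a per-dimension
// physical attribute (shape, dim_names, ...) into logical order, where
// logical dimension i corresponds to physical dimension permutation[i].
template <typename T>
std::vector<T> ToLogical(const std::vector<T>& physical,
                         const std::vector<std::size_t>& permutation) {
  if (physical.size() != permutation.size()) {
    throw std::invalid_argument("permutation length must equal ndim");
  }
  std::vector<T> logical;
  logical.reserve(physical.size());
  for (std::size_t physical_index : permutation) {
    // at() throws std::out_of_range if the permutation holds a bad index.
    logical.push_back(physical.at(physical_index));
  }
  return logical;
}
```

With shape {10, 20, 30}, names {"x", "y", "z"} and permutation {2, 0, 1}, this yields the logical shape {30, 10, 20} and names {"z", "x", "y"} given above.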

Other than that, looks great! Thanks for working on this.
-Jeremy

On Wed, Sep 13, 2023 at 2:38 AM Rok Mihevc  wrote:

> After some discussion on the PR [https://github.com/apache/arrow/pull/37166]
> we've altered the proposed type by removing the ndim parameter and
> adding a ragged_dimensions one.
> If there is no further feedback I'd like to call for a vote early next
> week. Proposed language now reads:
>
> Variable shape tensor
> =====================
>
> * Extension name: `arrow.variable_shape_tensor`.
>
> * The storage type of the extension is: ``StructArray`` where struct
>   is composed of **data** and **shape** fields describing a single
>   tensor per row:
>
>   * **data** is a ``List`` holding tensor elements of a single tensor.
>     Data type of the list elements is uniform across the entire column
>     and also provided in metadata.
>   * **shape** is a ``FixedSizeList[ndim]`` of the tensor shape where
>     the size of the list ``ndim`` is equal to the number of dimensions
>     of the tensor.
>
> * Extension type parameters:
>
>   * **value_type** = the Arrow data type of individual tensor elements.
>
>   Optional parameters describing the logical layout:
>
>   * **dim_names** = explicit names to tensor dimensions
>     as an array. The length of it should be equal to the shape
>     length and equal to the number of dimensions.
>
>     ``dim_names`` can be used if the dimensions have well-known
>     names and they map to the physical layout (row-major).
>
>   * **permutation** = indices of the desired ordering of the
>     original dimensions, defined as an array.
>
>     The indices contain a permutation of the values [0, 1, .., N-1] where
>     N is the number of dimensions. The permutation indicates which
>     dimension of the logical layout corresponds to which dimension of the
>     physical tensor (the i-th dimension of the logical view corresponds
>     to the dimension with number ``permutations[i]`` of the physical
>     tensor).
>
>     Permutation can be useful in case the logical order of
>     the tensor is a permutation of the physical order (row-major).
>
>     When logical and physical layout are equal, the permutation will always
>     be ([0, 1, .., N-1]) and can therefore be left out.
>
>   * **ragged_dimensions** = indices of ragged dimensions whose sizes may
>     differ. Dimensions where all elements have the same size are called
>     uniform dimensions. Indices are a subset of all possible dimension
>     indices ([0, 1, .., N-1]).
>     Ragged dimensions list can be left out. In that case all dimensions
>     are assumed ragged.
>
> * Description of the serialization:
>
>   The metadata must be a valid JSON object including number of
>   dimensions of the contained tensors as an integer with key **"ndim"**
>   plus optional dimension names with keys **"dim_names"** and ordering of
>   the dimensions with key **"permutation"**.
>
>   - Example with ``dim_names`` metadata for NCHW ordered data:
>
>     ``{ "dim_names": ["C", "H", "W"] }``
>
>   - Example with ``ragged_dimensions`` metadata for a set of color images
>     with variable width:
>
>     ``{ "dim_names": ["H", "W", "C"], "ragged_dimensions": [1] }``
>
>   - Example of permuted 3-dimensional tensor:
>
>     ``{ "permutation": [2, 0, 1] }``
>
>     Here the shape stored with each tensor is the physical layout
>     shape: given an individual tensor of physical shape
>     [100, 200, 500], the shape of the logical layout would be
>     ``[500, 100, 200]``.
>
> .. note::
>
>   Elements in a variable shape tensor extension array are stored
>   in row-major/C-contiguous order.
>
>
> Rok
>
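
The note in the quoted proposal says elements are stored in row-major (C-contiguous) order. As a minimal sketch of what that implies for addressing a single element (`RowMajorOffset` is a hypothetical helper, not an Arrow API, and it works on the physical shape and a physical index):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not an Arrow API): flat offset of `index` within
// the data list of one tensor whose physical shape is `shape`, assuming
// row-major order (the last dimension varies fastest).
std::int64_t RowMajorOffset(const std::vector<std::int64_t>& shape,
                            const std::vector<std::int64_t>& index) {
  if (shape.size() != index.size()) {
    throw std::invalid_argument("index length must equal ndim");
  }
  std::int64_t offset = 0;
  for (std::size_t d = 0; d < shape.size(); ++d) {
    if (index[d] < 0 || index[d] >= shape[d]) {
      throw std::out_of_range("index outside the tensor shape");
    }
    // Horner-style accumulation: each step scales by the current
    // dimension's size, then adds the position along that dimension.
    offset = offset * shape[d] + index[d];
  }
  return offset;
}
```

For a tensor of physical shape {2, 3, 4}, the index {1, 0, 0} maps to offset 12 and the last element {1, 2, 3} maps to offset 23.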


Re: [DISCUSS] Proposal to add VariableShapeTensor Canonical Extension Type

2023-09-13 Thread Jeremy Leibs
On Wed, Sep 13, 2023 at 8:38 AM Antoine Pitrou  wrote:

>
> Le 13/09/2023 à 02:37, Rok Mihevc a écrit :
> >
> >* **ragged_dimensions** = indices of ragged dimensions whose sizes may
> >  differ. Dimensions where all elements have the same size are called
> >  uniform dimensions. Indices are a subset of all possible dimension
> >  indices ([0, 1, .., N-1]).
> >  Ragged dimensions list can be left out. In that case all dimensions
> >  are assumed ragged.
>
> It's a bit confusing that an empty list means "no ragged dimensions" but
> a missing entry means "all dimensions are ragged". This seems
> error-prone to me.
>
> Also, to be clear, "ragged_dimensions" is only useful for data validation?
>
>
I am also quite confused by how to interpret / use ragged dimensions. Given
that this is a "variable" shaped tensor, I personally find specifying the
exceptional case -- the "uniform" dimensions -- to be much more clear.

I would propose instead:

**uniform_dimensions** = Indices of dimensions whose sizes are guaranteed to
remain constant. Indices are a subset of all possible dimension indices
([0, 1, .., N-1]). The uniform dimensions must still be represented in the
`shape` field, and must always have the same value for all tensors in the
array. This allows code to interpret the tensor correctly without
accounting for uniform dimensions, while still permitting optional
optimizations that take advantage of the uniformity. uniform_dimensions
can be left out, in which case it is assumed that all dimensions might be
variable.
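
To make the intended guarantee concrete, here is a small sketch of a validation check (`UniformDimensionsHold` is a hypothetical name, not an Arrow API; it assumes the per-row shapes have already been read out of the `shape` field):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical validation helper (not an Arrow API). `shapes` holds one
// shape per tensor in the array; `uniform_dimensions` lists the indices
// of dimensions whose size must be identical across all tensors.
bool UniformDimensionsHold(
    const std::vector<std::vector<std::int64_t>>& shapes,
    const std::vector<std::size_t>& uniform_dimensions) {
  if (shapes.empty()) {
    return true;  // Nothing to compare against.
  }
  const std::vector<std::int64_t>& first = shapes.front();
  for (const auto& shape : shapes) {
    if (shape.size() != first.size()) {
      throw std::invalid_argument("all tensors must have the same ndim");
    }
  }
  for (std::size_t dim : uniform_dimensions) {
    if (dim >= first.size()) {
      throw std::out_of_range("uniform dimension index exceeds ndim");
    }
    for (const auto& shape : shapes) {
      if (shape[dim] != first[dim]) {
        return false;  // This dimension varies, so it is not uniform.
      }
    }
  }
  return true;
}
```

For two images with shapes {480, 640, 3} and {480, 800, 3}, the check passes for uniform dimensions {0, 2} and fails for {1}. An empty list makes no uniformity claim, so it always passes.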


Re: [VOTE][RUST][DataFusion] Release DataFusion Python Bindings 31.0.0 RC1

2023-09-13 Thread L. C. Hsieh
+1 (binding)

Verified on Intel Mac.

Thanks Andy.

On Wed, Sep 13, 2023 at 12:26 PM Andy Grove  wrote:
>
> Hi,
>
> I would like to propose a release of Apache Arrow DataFusion Python
> Bindings, version 31.0.0.
>
> This release candidate is based on commit:
> 54d17771fc2814339f94e0871401ee946d7c913b [1]
> The proposed release tarball and signatures are hosted at [2].
> The changelog is located at [3].
> The Python wheels are located at [4].
>
> Please download, verify checksums and signatures, run the unit tests, and
> vote on the release. The vote will be open for at least 72 hours.
>
> Only votes from PMC members are binding, but all members of the community
> are encouraged to test the release and vote with "(non-binding)".
>
> The standard verification procedure is documented at:
> https://github.com/apache/arrow-datafusion-python/blob/main/dev/release/README.md#verifying-release-candidates
>
> [ ] +1 Release this as Apache Arrow DataFusion Python 31.0.0
> [ ] +0
> [ ] -1 Do not release this as Apache Arrow DataFusion Python 31.0.0 because...
>
> Here is my vote:
>
> +1
>
> [1]: https://github.com/apache/arrow-datafusion-python/tree/54d17771fc2814339f94e0871401ee946d7c913b
> [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-python-31.0.0-rc1
> [3]: https://github.com/apache/arrow-datafusion-python/blob/54d17771fc2814339f94e0871401ee946d7c913b/CHANGELOG.md
> [4]: https://test.pypi.org/project/datafusion/31.0.0/


[VOTE][RUST][DataFusion] Release DataFusion Python Bindings 31.0.0 RC1

2023-09-13 Thread Andy Grove
Hi,

I would like to propose a release of Apache Arrow DataFusion Python
Bindings, version 31.0.0.

This release candidate is based on commit:
54d17771fc2814339f94e0871401ee946d7c913b [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].
The Python wheels are located at [4].

Please download, verify checksums and signatures, run the unit tests, and
vote on the release. The vote will be open for at least 72 hours.

Only votes from PMC members are binding, but all members of the community
are encouraged to test the release and vote with "(non-binding)".

The standard verification procedure is documented at:
https://github.com/apache/arrow-datafusion-python/blob/main/dev/release/README.md#verifying-release-candidates

[ ] +1 Release this as Apache Arrow DataFusion Python 31.0.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow DataFusion Python 31.0.0 because...

Here is my vote:

+1

[1]: https://github.com/apache/arrow-datafusion-python/tree/54d17771fc2814339f94e0871401ee946d7c913b
[2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-python-31.0.0-rc1
[3]: https://github.com/apache/arrow-datafusion-python/blob/54d17771fc2814339f94e0871401ee946d7c913b/CHANGELOG.md
[4]: https://test.pypi.org/project/datafusion/31.0.0/