Hello,

I'm trying to build an Arrow Flight SQL server (which wraps DuckDB querying
Parquet files) in Python. I've implemented the handler methods defined in
the pyarrow FlightServerBase class
<https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightServerBase.html#pyarrow.flight.FlightServerBase>
and am testing it with a DBeaver client (loaded with the JDBC driver for
Arrow Flight SQL <https://www.dremio.com/drivers/jdbc/>). However, even
though the client connects successfully to the server, it is unable to read
any of the data sent back from the server. I suspect it might be due to the
RecordBatch structure. After reading through the docs, I've tried various
ways of creating the RecordBatch with no luck.

For debugging simplicity, I hand-wrote the following RecordBatch to be sent
for a DoGet RPC call (with a CommandGetSqlInfo command in the Ticket). Can
someone help point out any errors in it?

```
def do_get_sql_info(self, context: flight.ServerCallContext, cmd: sqlPb.CommandGetSqlInfo) -> flight.FlightDataStream:
        sql_info_metadata = [
            {"info_name": "0", "value": "db_name"},
            {"info_name": "1", "value": "duckdb"},
        ]

        schema = pa.schema([
            pa.field("info_name", pa.uint32()),
            pa.field("value", pa.dense_union([
                pa.field("string_value", pa.string()),
                pa.field("bool_value", pa.bool_()),
                pa.field("bigint_value", pa.int64()),
                pa.field("int32_bitmask", pa.int32()),
                pa.field("string_list", pa.list_(pa.string())),
                pa.field("int32_to_int32_list_map", pa.map_(pa.int32(), pa.list_(pa.int32())))
            ]))
        ])
        batch = pa.RecordBatch.from_pandas(pd.DataFrame(sql_info_metadata), schema=schema)
        return flight.FlightDataStream(batch)
```

The client is unable to read the DB name as duckdb; instead, it just prints
*??*.

Note:

1. I'm using the C++ Flight SQL server
<https://github.com/apache/arrow/blob/15a8ac3ce4e3ac31f9f361770ad4a38c69102aa1/cpp/src/arrow/flight/sql/server.cc#L956>
as a reference. It seems to use Builders to build the SqlInfoResult, but
I could not find an equivalent in PyArrow.

2. I have checked the Arrow Flight Python example server here
<https://github.com/apache/arrow/blob/aca1d3eeed3775c2f02e9f5d59d62478267950b1/python/examples/flight/server.py>,
but it feels too simplistic and does not cover the Flight SQL use case.

3. I also tried to check what the client driver code expects here
<https://github.com/apache/arrow/blob/aca1d3eeed3775c2f02e9f5d59d62478267950b1/java/flight/flight-sql-jdbc-core/src/main/java/org/apache/arrow/driver/jdbc/client/ArrowFlightSqlClientHandler.java#L98>,
but it's not too clear to me.

I would appreciate any pointers on this.

Thanks
Nitesh
