Re: [PR] feat: add ipc.NewRecordBatchWriter [arrow-go]

via GitHub Wed, 25 Jun 2025 09:59:38 -0700


alvarowolfx commented on PR #421:
URL: https://github.com/apache/arrow-go/pull/421#issuecomment-3005497841

> Why is the schema verification expensive?

It basically adds an extra database call/metadata check on the backend to
compare the BigQuery table schema with the Arrow schema and see if there are
any changes.

> Also, wouldn't you still need to run the same logic to check for changes
> to the table schema upon receiving the first RecordBatch?

Yes, but only on the first RecordBatch and/or when opening the stream to
start writing data. Also, when the schema does change, there is a separate
field where the new schema can be provided. The schema and the serialized
record batches go to the backend in separate fields.
* https://cloud.google.com/python/docs/reference/bigquerystorage/latest/google.cloud.bigquery_storage_v1.types.AppendRowsRequest.ArrowData.html

> How does avoiding the Schema message allow you to avoid validating the
> schema?

We only check for schema changes when the specific schema field is filled
out. This works the same way for Arrow and Protobuf.
* https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#arrowschema
* https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#protoschema
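
To make the "separate fields" point concrete, here is a rough Go sketch of
the idea. It is not the BigQuery client or backend code: `appendRequest`,
`backend`, and `appendRows` are hypothetical names, the schemas are plain
byte strings, and `bytes.Equal` stands in for the real (expensive) schema
comparison. It only shows that the check runs when the schema field is set.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// appendRequest is a hypothetical stand-in for a write request that carries
// the serialized schema and the serialized record batch in separate fields.
type appendRequest struct {
	SerializedSchema []byte // optional: set when opening the stream or when the schema changes
	SerializedBatch  []byte // record batch bytes, sent without a schema message
}

// backend is a hypothetical server that counts how often it runs the
// schema comparison.
type backend struct {
	tableSchema  []byte
	schemaChecks int
}

var errSchemaMismatch = errors.New("schema does not match table")

// appendRows validates the schema only when the schema field is filled out.
func (b *backend) appendRows(req appendRequest) error {
	if len(req.SerializedSchema) > 0 {
		b.schemaChecks++
		// Assumption: a byte comparison replaces the real metadata lookup.
		if !bytes.Equal(req.SerializedSchema, b.tableSchema) {
			return errSchemaMismatch
		}
	}
	// The record batch would be decoded and written here.
	return nil
}

func main() {
	b := &backend{tableSchema: []byte("id:int64,name:string")}
	reqs := []appendRequest{
		{SerializedSchema: []byte("id:int64,name:string"), SerializedBatch: []byte("batch-1")},
		{SerializedBatch: []byte("batch-2")}, // no schema field, no check
		{SerializedBatch: []byte("batch-3")}, // no schema field, no check
	}
	for _, r := range reqs {
		if err := b.appendRows(r); err != nil {
			fmt.Println("append failed:", err)
			return
		}
	}
	fmt.Println("schema checks:", b.schemaChecks)
}
```

In this sketch three batches are appended but the comparison runs once, on
the request that carries the schema.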

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
