alvarowolfx commented on PR #421:
URL: https://github.com/apache/arrow-go/pull/421#issuecomment-3005497841

> Why is the schema verification expensive?

It basically adds an extra database call/metadata check on the backend to compare the BigQuery table schema with the Arrow schema and see if there are any changes.

> Also, wouldn't you still need to run the same logic to check for changes to the table schema upon receiving the first RecordBatch?

Yes, but only on the first RecordBatch and/or when opening the stream to start writing data. Also, when the schema does change, there is a separate field where the schema can be provided. The schema and the serialized record batches go to the backend in separate fields.

* https://cloud.google.com/python/docs/reference/bigquerystorage/latest/google.cloud.bigquery_storage_v1.types.AppendRowsRequest.ArrowData.html

> How does avoiding the Schema message allow you to avoid validating the schema?

We only check for schema changes when the specific schema field is filled out. This works the same way for Arrow and Protobuf.

* https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#arrowschema
* https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#protoschema

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org