alvarowolfx commented on PR #421:
URL: https://github.com/apache/arrow-go/pull/421#issuecomment-3005497841

   > Why is the schema verification expensive? 
   
   It basically adds an extra database call/metadata check on the backend to 
compare the BigQuery table schema with the Arrow schema and see whether there 
are any changes. 
   
   > Also, wouldn't you still need to run the same logic to check for changes 
to the table schema upon receiving the first RecordBatch? 
   
   Yes, but only on the first RecordBatch and/or when opening the stream to 
start writing data. Also, when the schema does change, there is a separate field 
where the new schema can be supplied. The schema and the serialized record 
batches are sent to the backend in separate fields.
   * https://cloud.google.com/python/docs/reference/bigquerystorage/latest/google.cloud.bigquery_storage_v1.types.AppendRowsRequest.ArrowData.html
   
   > How does avoiding the Schema message allow you to avoid validating the 
schema?
   
   We only check for schema changes when the dedicated schema field is filled 
in. This works the same way for Arrow and Protobuf.
   * https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#arrowschema
   * https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#protoschema
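
To make the flow concrete, here is a minimal Go sketch of the idea. It assumes hypothetical stand-in types (`arrowData`, `streamWriter`, `backend`) and is not the real BigQuery client or arrow-go API. It only illustrates that the schema and the record batch travel in separate fields, and that the costly check runs only when the schema field is filled in.

```go
package main

import "bytes"

// arrowData is a hypothetical stand-in for the Arrow payload of an append
// request: the schema and the record batch are carried in separate fields.
type arrowData struct {
	WriterSchema []byte // serialized Arrow schema; empty means "unchanged"
	Rows         []byte // serialized Arrow record batch
}

// streamWriter (hypothetical) remembers the last schema it sent so that it
// only fills in WriterSchema on the first request or when the schema changes.
type streamWriter struct {
	lastSchema []byte
}

// buildRequest always sets Rows, and sets WriterSchema only when the schema
// differs from the one sent previously.
func (w *streamWriter) buildRequest(schema, batch []byte) arrowData {
	req := arrowData{Rows: batch}
	if !bytes.Equal(w.lastSchema, schema) {
		req.WriterSchema = schema
		w.lastSchema = append([]byte(nil), schema...) // keep a private copy
	}
	return req
}

// backend (hypothetical) counts how many schema checks it had to run.
type backend struct {
	schemaChecks int
}

// handle runs the schema comparison only when the schema field is present.
func (b *backend) handle(req arrowData) {
	if len(req.WriterSchema) > 0 {
		b.schemaChecks++ // the extra metadata lookup would happen here
	}
	// Appending req.Rows to the table is omitted in this sketch.
}
```

With this shape, sending many batches with the same schema triggers a single check, and another check happens only when the writer supplies a different schema.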
   

