alvarowolfx commented on PR #421: URL: https://github.com/apache/arrow-go/pull/421#issuecomment-3005706235
> > To add another point, Pyarrow allows for reading the schema and recordbatches separately in IPC format:
> >
> > https://cloud.google.com/bigquery/docs/write-api-streaming#arrow-format
>
> You can already do the equivalent to that Python code in Go, though I guess the issue you run into is the lack of the padding handling. If we simply add a new method to the Payload struct, we can achieve the exact same logic. This PR could instead just be the following:
>
> ```go
> // a drawback to this is having to use bytes.Buffer to get the raw bytes
> // if you aren't already using an io.Writer.
> func (p *Payload) WritePayload(w io.Writer) (int, error) {
> 	return writeIPCPayload(w, *p)
> }
>
> // alternatively if we just want to get the raw bytes, we can do
> func (p *Payload) SerializedBytes() ([]byte, error) {
> 	var b bytes.Buffer
> 	_, err := writeIPCPayload(&b, *p)
> 	if err != nil {
> 		// the method returns two values, so the error path needs a nil slice
> 		return nil, err
> 	}
> 	return b.Bytes(), nil
> }
> ```
>
> Then you can create the equivalent Go to the pyarrow example you provided, without needing to have an entire new writer.
>
> ```go
> func appendRows(tbl arrow.Table, projectID, datasetID, tableID string) error {
> 	// create request etc....
>
> 	schemaPayload := ipc.GetSchemaPayload(tbl.Schema(), memory.DefaultAllocator)
> 	serializedSchemaBytes, err := schemaPayload.SerializedBytes()
> 	if err != nil {
> 		return err
> 	}
> 	// do whatever you want with the byte slice for the schema
>
> 	rdr := array.NewTableReader(tbl, tbl.NumRows())
> 	defer rdr.Release()
>
> 	// the pyarrow example only uses the first record batch, you probably would instead use
> 	// for rdr.Next() to loop over all the batches.... but i'll mirror the pyarrow example for now
> 	rdr.Next()
> 	payload, err := ipc.GetRecordBatchPayload(rdr.Record())
> 	if err != nil {
> 		return err
> 	}
>
> 	serializedRecordBytes, err := payload.SerializedBytes()
> 	if err != nil {
> 		return err
> 	}
> 	// do whatever you like with the serializedRecordBytes
>
> 	// ....
> }
> ```

This approach works for me too. I had figured out that exposing `writeIPCPayload` would also work, but I wasn't sure which approach was best.