zeroshade commented on issue #37976:
URL: https://github.com/apache/arrow/issues/37976#issuecomment-1743557463

   So, a couple of things first:
   
   You can check `if fw, ok := dt.(arrow.FixedWidthDataType); ok { return fw.Bytes() }`, which gets you the bytes per element for a fixed-width data type without needing the full type switch you're doing.
   
   Data types also have a `Layout()` method, which returns a `DataTypeLayout` whose `Buffers` field is a slice of `BufferSpec` values. If a spec's `Kind` is `arrow.KindFixedWidth` (the kind that `arrow.SpecFixedWidth` constructs), its `ByteWidth` field is the byte size. Again, this gets you the information without a type switch or explicit per-type checks.
   
   You're also not including the size of the null (validity) bitmaps in your computation, which may be non-negligible.
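To put a number on it, here is the arithmetic as a small stdlib-only sketch. The helper names are hypothetical, and it counts only the logical bitmap size (one bit per element, rounded up to whole bytes), not any padding the allocator adds:

```go
package main

import "fmt"

// validityBitmapBytes returns the logical size of an Arrow validity bitmap
// for n elements: one bit per element, rounded up to whole bytes.
func validityBitmapBytes(n int64) int64 {
	return (n + 7) / 8
}

// estimateFixedWidthColumnBytes estimates a fixed-width column as its data
// buffer (n * byteWidth) plus its validity bitmap.
func estimateFixedWidthColumnBytes(n int64, byteWidth int) int64 {
	return n*int64(byteWidth) + validityBitmapBytes(n)
}

func main() {
	// One million int64 values: 8,000,000 data bytes + 125,000 bitmap bytes.
	fmt.Println(estimateFixedWidthColumnBytes(1_000_000, 8))
}
```

The relative overhead is one bit per element, i.e. 1/(8 × byte width) of the data buffer: about 1.6% for `int64`, but 12.5% for `int8`.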
   
   Now, the reason you're getting that error is that you are creating multiple writers on the same stream; you need only one writer to write the stream.
   
   Instead of this:
   
   ```go
   chunkSize := 4 * 1024 * 1024 // Bytes
   recordChunks := sliceRecordByBytes(transformedRecord, chunkSize)
   chunkSchema := recordChunks[0].Schema()
   currentChunk := make([]arrow.Array, 0)
   for _, rec := range recordChunks {
   	for i := 0; i < int(rec.NumCols()); i++ {
   		column := rec.Column(i)
   		currentChunk = append(currentChunk, column)
   	}
   	// Create a Flight writer
   	writeChunkToStream(server, chunkSchema, currentChunk)
   	currentChunk = nil
   }
   ```
   
   You should do this:
   
   ```go
   rw := flight.NewRecordWriter(server, ipc.WithSchema(transformedRecord.Schema()))
   defer rw.Close()
   
   chunkSize := 4 * 1024 * 1024 // Bytes
   recordChunks := sliceRecordByBytes(transformedRecord, chunkSize)
   defer func() {
           for _, chunk := range recordChunks {
                   chunk.Release()
           }
   }()
   
   for _, slice := range recordChunks {
           if err := rw.Write(slice); err != nil {
                   return err
           }
   }
   ```
   
   Every time you create a writer, the first thing it does is send a Schema message, so you don't want multiple writers; you just want to write the slices to a single writer one after another. If you wanted, you could combine these steps even better: instead of creating *all* the slices and then sending them one by one, you could find where you're going to slice, write that slice, call `Release()` on it, and then find the next slice... rinse and repeat. That way you don't need a slice of records, and you have fewer allocations.
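A stdlib-only sketch of that streaming loop follows. `streamSlices`, `rowSize`, and `emit` are hypothetical names: `rowSize` stands in for whatever per-row byte estimate you use, and `emit` stands in for the slice/write/release step, which with Arrow would be `rec.NewSlice(start, end)`, `rw.Write(slice)`, then `slice.Release()`.

```go
package main

import "fmt"

// streamSlices walks rows [0, numRows) and hands each byte-bounded range
// [start, end) to emit as soon as it is found, so no list of slices is ever
// built up. Every range holds at least one row, so a single row larger than
// budget is emitted on its own instead of stalling the loop.
func streamSlices(numRows int, rowSize func(row int) int64, budget int64,
	emit func(start, end int) error) error {
	for start := 0; start < numRows; {
		end := start
		var used int64
		for end < numRows {
			sz := rowSize(end)
			if end > start && used+sz > budget {
				break
			}
			used += sz
			end++
		}
		// With Arrow, emit would do roughly:
		//   slice := rec.NewSlice(int64(start), int64(end))
		//   err := rw.Write(slice)
		//   slice.Release()
		if err := emit(start, end); err != nil {
			return err
		}
		start = end
	}
	return nil
}

func main() {
	// Seven rows of 3 bytes each with a 10-byte budget.
	_ = streamSlices(7, func(int) int64 { return 3 }, 10, func(start, end int) error {
		fmt.Printf("[%d, %d)\n", start, end)
		return nil
	})
}
```

Because each range is written before the next one is computed, only one slice is alive at a time.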
   
   Just an idea.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
