joellubi commented on code in PR #2106:
URL: https://github.com/apache/arrow-adbc/pull/2106#discussion_r1737006379


##########
go/adbc/driver/snowflake/bulk_ingestion.go:
##########
@@ -334,21 +347,32 @@ func writeParquet(
        parquetProps *parquet.WriterProperties,
        arrowProps pqarrow.ArrowWriterProperties,
 ) error {
+
+       // don't start a new parquet file unless there is at least one record available
+       rec, ok := <-in
+       if !ok {
+               return ErrNoRecordsInStream
+       }
+
+       // initialize writer
        pqWriter, err := pqarrow.NewFileWriter(schema, w, parquetProps, arrowProps)
        if err != nil {
                return err
        }
        defer pqWriter.Close()
 
-       var bytesWritten int64
-       for rec := range in {
-               if rec.NumRows() == 0 {
-                       rec.Release()
-                       continue
-               }
+       // write first record
+       bytesWritten, err := writeRecordToParquet(pqWriter, rec)
+       if err != nil {
+               return err
+       }
+       if targetSize > 0 && bytesWritten >= int64(targetSize) {
+               return nil
+       }

Review Comment:
   I simplified this further. The parquet writer is always initialized; we just check `bytesWritten` to decide whether or not to discard the buffer.
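
   The channel-peek pattern from the diff can be sketched as a standalone Go program. This is a minimal sketch, not the driver's actual implementation: `writeAll`, the `int` record stand-in, and the accumulation logic are hypothetical; only the "block on the first record before committing writer state" and the `targetSize` early-return mirror the change.

   ```go
   package main

   import (
   	"errors"
   	"fmt"
   )

   // ErrNoRecordsInStream is a stand-in for the sentinel error in the diff.
   var ErrNoRecordsInStream = errors.New("no records in stream")

   // writeAll receives the first value from the channel before doing any
   // writer setup, so an empty stream returns an error instead of
   // producing an empty output file. Each int stands in for the byte
   // count a real writeRecordToParquet call would report.
   func writeAll(in <-chan int, targetSize int) (int64, error) {
   	// don't start unless at least one record is available
   	rec, ok := <-in
   	if !ok {
   		return 0, ErrNoRecordsInStream
   	}

   	// "write" the first record
   	bytesWritten := int64(rec)
   	if targetSize > 0 && bytesWritten >= int64(targetSize) {
   		return bytesWritten, nil
   	}

   	// drain the rest, stopping once the target size is reached
   	for rec := range in {
   		bytesWritten += int64(rec)
   		if targetSize > 0 && bytesWritten >= int64(targetSize) {
   			return bytesWritten, nil
   		}
   	}
   	return bytesWritten, nil
   }

   func main() {
   	// empty stream: no file should be started
   	empty := make(chan int)
   	close(empty)
   	_, err := writeAll(empty, 0)
   	fmt.Println(errors.Is(err, ErrNoRecordsInStream)) // true

   	// non-empty stream with a target size
   	ch := make(chan int, 3)
   	ch <- 10
   	ch <- 20
   	ch <- 30
   	close(ch)
   	n, _ := writeAll(ch, 25)
   	fmt.Println(n) // 30: stops after crossing the 25-byte target
   }
   ```

   The design point the comment makes is that peeking the first record up front keeps the happy path simple: by the time any writer exists, there is guaranteed to be data to write.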



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
