zeroshade commented on code in PR #1790:
URL: https://github.com/apache/arrow-adbc/pull/1790#discussion_r1583973978


##########
go/adbc/driver/snowflake/record_reader.go:
##########
@@ -480,20 +487,87 @@ func newRecordReader(ctx context.Context, alloc memory.Allocator, ld gosnowflake
                        }
                }
 
+               if ld.TotalRows() == 0 {
+                       return array.NewRecordReader(schema, []arrow.Record{})
+               }
+
                bldr := array.NewRecordBuilder(alloc, schema)
                defer bldr.Release()
 
-               rec, err := jsonDataToArrow(ctx, bldr, ld)
+               rec, err := jsonDataToArrow(ctx, bldr, rawData)
                if err != nil {
                        return nil, err
                }
                defer rec.Release()
 
-               if ld.TotalRows() != 0 {
-                       return array.NewRecordReader(schema, []arrow.Record{rec})
-               } else {
-                       return array.NewRecordReader(schema, []arrow.Record{})
+               results := []arrow.Record{rec}
+               for _, b := range batches {
+                       rdr, err := b.GetStream(ctx)
+                       if err != nil {
+                               return nil, adbc.Error{
+                                       Msg:  err.Error(),
+                                       Code: adbc.StatusInternal,
+                               }
+                       }
+                       defer rdr.Close()
+
+                       // the "JSON" data returned isn't valid JSON. Instead it is a list of
+                       // comma-delimited JSON lists containing every value as a string, except
+                       // for a JSON null to represent nulls. Thus we can't just use the existing
+                       // JSON parsing code in Arrow.
+                       data, err := io.ReadAll(rdr)
+                       if err != nil {
+                               return nil, adbc.Error{
+                                       Msg:  err.Error(),
+                                       Code: adbc.StatusInternal,
+                               }
+                       }
+
+                       if cap(rawData) >= int(b.NumRows()) {
+                               rawData = rawData[:b.NumRows()]

Review Comment:
   We're re-using the same slice for each batch, but the batches aren't 
guaranteed to have the same number of rows. So if the capacity of the backing 
array is at least this batch's row count, we re-slice `rawData` so its length 
is exactly that row count. This doesn't only truncate when the current length 
exceeds the number of rows; it also *expands* the slice when the current 
length is less than the number of rows but the capacity is large enough, 
since Go allows re-slicing up to a slice's capacity.
   
   We're basically just ensuring that the top-level `[][]*string` is *exactly* 
the same length as the number of rows.
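   
   For readers less used to Go slice mechanics, here is a minimal standalone 
sketch (editor's illustration, not code from the PR; `rawData` is just a local 
variable here) of how `s = s[:n]` can both shrink and grow a slice as long as 
`n <= cap(s)`:
   
```go
package main

import "fmt"

func main() {
	// Backing array has capacity 8; the slice currently exposes 4 elements.
	rawData := make([][]*string, 4, 8)

	// Shrink: this batch has only 2 rows.
	rawData = rawData[:2]
	fmt.Println(len(rawData), cap(rawData)) // 2 8

	// Grow: the next batch has 6 rows. len < 6 <= cap, so re-slicing
	// expands the visible length without any allocation.
	rawData = rawData[:6]
	fmt.Println(len(rawData), cap(rawData)) // 6 8

	// Re-slicing beyond cap(rawData) would panic, which is why the diff
	// guards with `if cap(rawData) >= int(b.NumRows())` before re-slicing.
}
```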



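   For context on the not-quite-JSON format described in the quoted diff 
comment, here is a minimal sketch of the data's shape (editor's illustration 
with made-up values; the PR itself decodes into the pre-sized `rawData` 
buffer rather than wrapping the payload as done here). Each row is a valid 
JSON list of strings and nulls, but the comma-joined stream of rows is not a 
valid JSON document on its own:
   
```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Comma-delimited JSON lists: every value is a string, and a JSON
	// null stands in for SQL NULL. Not valid JSON as a whole.
	raw := []byte(`["1","foo"],["2",null]`)

	// Wrapping the payload in brackets yields a valid JSON array of rows,
	// which decodes naturally into [][]*string (null -> nil pointer).
	wrapped := append(append([]byte{'['}, raw...), ']')

	var rows [][]*string
	if err := json.Unmarshal(wrapped, &rows); err != nil {
		panic(err)
	}
	fmt.Println(len(rows), rows[1][1] == nil) // 2 true
}
```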