lostluck commented on issue #28959:
URL: https://github.com/apache/beam/issues/28959#issuecomment-1767021739

   This does seem like a bug in the batch partitioned reads.
   
   The failing line SDK side:
   
   
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/spannerio/read_batch.go#L70
   
   Appears to be sending in a bad BatchReadOnlyTransactionID from the 
partitionedRead,
   
   ------------
   
   And the issue appears to be because those spanner types don't JSON 
serialize, they BinaryMarshals. So of course the type is bad on encode/decode. 
Beam doesn't automatically handle binary marshalling, and neither does the 
final fallback JSON encoding.
   
   https://pkg.go.dev/cloud.google.com/go/spanner#BatchReadOnlyTransactionID
   https://pkg.go.dev/cloud.google.com/go/spanner#Partition
   
   The fastest fix is to register encoder and decoder functions for the type.
   
   So what probably needs to be done is to register a coder for 
`spanner.partitionedRead`, since that field is not getting encoded properly.
   
   eg. Similar to the following should be added to the spannerio package, but 
with error handling added in, and pseudo code filled in.
   
   ```
   func init() {
   beam.RegisterCoder(reflect.TypeOf(partitionedRead{}), encodePartitionedRead, 
decodePartitionedRead)
   }
   
   func encodePartitionedRead(v partitionedRead) []byte {
     a, _:= v.BatchTransactionId.MarshalBinary()
     b, _ :=  v.Partition.MarshalBinary()
   // length prefix the first byte set, and return the bytesparameter
   }
   
   func decodePartitionedRead(bytes []byte) partitionedRead {
     var ret partitionedRead
     // Decode the length prefix n so we can split the fields.
     _ = ret.BatchTransactionId.UnarshalBinary(bytes[:n])
     _ = ret.Partition.UnarshalBinary(bytes[n:])
     return ret
   }
   
   ```
   
   See the doc for some light documentation 
https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam#RegisterCoder . 
It still claims json is the default coder, but that's not been true for a bit. 
The rest of it remains valid however.
   
   If one of you two @pequalsnp @azach were to try it and validate it for your 
usecase(s), that would be a big help versus my simple guess. IOs are hard 
because one needs to be an expert in both beam and the datasource them to write 
them well.  I'm very happy to review/merge any changes that work for y'all.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to