I'm working on a fix for BEAM-9613, an issue I encountered while using
BigQueryIO.readTableRowsWithSchema().

   1. BEAM-7526 added support for Lists and Maps coming from the JSON
   export format.
   2. BEAM-2879 switched the export format from JSON to Avro.  This caused
   BEAM-9613, because the Avro export represents BQ BOOLEAN and FLOAT values
   as Java Boolean and Double rather than as Java String.
   3. The switch from JSON to Avro also introduced an issue with fields
   with BQ REPEATED mode because fields of this mode.
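
To illustrate the type mismatch in item 2, here is a minimal sketch of a
conversion that tolerates both representations.  It assumes only what is
described above: the JSON export hands back a Java String, and the Avro
export hands back a Java Boolean or Double.  ExportValueCoercion and its
methods are hypothetical names for illustration, not Beam APIs and not
Beam's actual implementation.

```java
// Hypothetical helper (not part of Beam): normalizes a field value that may
// come from either export format into the expected Java type.
final class ExportValueCoercion {
  private ExportValueCoercion() {}

  // BQ BOOLEAN: assumed to be a String from JSON, a Boolean from Avro.
  static Boolean toBoolean(Object value) {
    if (value == null) {
      return null;
    }
    if (value instanceof Boolean) {
      return (Boolean) value;
    }
    if (value instanceof String) {
      return Boolean.valueOf((String) value);
    }
    throw new IllegalArgumentException(
        "Unexpected type for BOOLEAN: " + value.getClass().getName());
  }

  // BQ FLOAT: assumed to be a String from JSON, a Double from Avro.
  static Double toDouble(Object value) {
    if (value == null) {
      return null;
    }
    if (value instanceof Number) {
      return ((Number) value).doubleValue();
    }
    if (value instanceof String) {
      return Double.valueOf((String) value);
    }
    throw new IllegalArgumentException(
        "Unexpected type for FLOAT: " + value.getClass().getName());
  }
}
```

Code that casts the value straight to String would fail on the Avro
values, which matches the behaviour described in item 2.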

I have a simple fix to handle BQ BOOLEAN and FLOAT (
https://github.com/mouyang/beam/commit/326b291ab333c719a9f54446c34611581ea696eb),
but I'm a bit uncomfortable with it because:

   1. It would mix handling of the JSON and Avro export formats,
   2. BEAM-8933, while still in progress, would introduce Arrow and risk a
   regression,
   3. I haven't made a fix for REPEATED mode yet, but tests that use
   BigQueryUtilsTest.BQ_ARRAY_ROW would have to change (
https://github.com/apache/beam/blob/e039ca28d6f806f30b87cae82e6af86694c171cd/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtilsTest.java#L306-L311).
   I'm not sure whether I should change this, because I don't know whether
   Beam intends to keep supporting the JSON format.

I'd like to incorporate these changes in my project as soon as possible,
but I also need guidance on next steps that would be in line with the
general direction of the project.  I'm looking forward to any feedback.
Thanks.
