KevinGG commented on pull request #17153:
URL: https://github.com/apache/beam/pull/17153#issuecomment-1076678937


   
   > Just to explore some options other than converting the schema to JSON: prior to #15610, there was no generator or TableSchema to cause pickling errors.
   > 
   > Instead of storing the generator as an instance attribute, a list of 
[TextSource](https://github.com/apache/beam/blob/v2.34.0/sdks/python/apache_beam/io/gcp/bigquery.py#L793-L798)
 and notably its `coder` attribute were stored instead (we can assume 
`use_json_exports=True` for this discussion). The default `coder`, 
`_JsonToDictCoder`, had a method 
[`_convert_to_tuple`](https://github.com/apache/beam/blob/v2.34.0/sdks/python/apache_beam/io/gcp/bigquery_read_internal.py#L401-L413)
 to marshal the TableSchema into an object more amenable to pickling.
   > 
   > Perhaps I can use the same `_convert_to_tuple` method to create a 
picklable version of TableSchema and store that as an attribute rather than 
going the JSON route?
   
   How about storing the coder directly in the `_BigQueryExportResult` class, since we don't really need the TableSchema later, only a coder built from it?
   I have already tested that the coder itself is picklable.
   
   `_JsonToDictCoder` is the default `self.coder` used to build the coder from a TableSchema, but it is not necessarily the only possible implementation.
   
   So your `_BigQueryExportResult` class could be:
   
   ```python
   @dataclass
   class _BigQueryExportResult:
      coder: beam.coders.Coder
      paths: List[str]
   ```
   
   And
   
   ```python
   export_result = _BigQueryExportResult(
       coder=self.coder(table_schema),
       paths=[metadata.path for metadata in metadata_list])
   ```
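   For illustration, here is a minimal self-contained sketch of the pattern. It uses a hypothetical stand-in coder class (`_FakeJsonToDictCoder`, not Beam's actual `_JsonToDictCoder`) just to show that a dataclass holding a schema-derived coder survives a pickle round trip, which is the property we care about:
   
   ```python
   import pickle
   from dataclasses import dataclass
   from typing import List
   
   
   # Hypothetical stand-in for a schema-derived coder such as _JsonToDictCoder.
   # The key point: it is built from plain picklable state (e.g. field names
   # extracted from the TableSchema), not from the TableSchema proto itself.
   @dataclass
   class _FakeJsonToDictCoder:
     field_names: List[str]
   
   
   @dataclass
   class _BigQueryExportResult:
     coder: object
     paths: List[str]
   
   
   result = _BigQueryExportResult(
       coder=_FakeJsonToDictCoder(field_names=['id', 'name']),
       paths=['gs://bucket/export-000.json'])
   
   # The whole export result, coder included, round-trips through pickle.
   restored = pickle.loads(pickle.dumps(result))
   assert restored.coder.field_names == ['id', 'name']
   assert restored.paths == result.paths
   ```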
      
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

