Reto Egeter created BEAM-12773:
----------------------------------

             Summary: 404 Session not found, when querying Google Cloud Spanner 
with Python Dataflow.
                 Key: BEAM-12773
                 URL: https://issues.apache.org/jira/browse/BEAM-12773
             Project: Beam
          Issue Type: Bug
          Components: io-py-gcp
    Affects Versions: 2.29.0, 2.33.0
            Reporter: Reto Egeter


My Dataflow job copies a SQL table with 230M rows into Cloud Spanner. The 
initial run succeeds, but every subsequent run fails with this error: 
"google.api_core.exceptions.NotFound: 404 Session not found"
and also "504 Deadline Exceeded".

Here is part of the code:

{code:python}

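import apache_beam as beam
from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner

SPANNER_QUERY = 'SELECT row_id, update_key FROM DomainsCluster2'


def _KeyDomainSpanner(entity):
  # Map the positional row to column names and key it by row_id.
  row = {}
  for i, column in enumerate(['row_id', 'update_key']):
    row[column] = entity[i]
  return row['row_id'], row


# p, project_id and database are defined elsewhere in the job.
spanner_domains = (
    p
    | 'ReadFromSpanner' >> ReadFromSpanner(
        # Positional arguments are (project_id, instance_id, database_id).
        project_id, database, database, sql=SPANNER_QUERY)
    | 'KeyDomainsSpanner' >> beam.Map(_KeyDomainSpanner))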

{code}
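
For reference, here is a minimal standalone sketch of what the keying step does to a single row. It assumes each Spanner result row arrives as a list of values in the same order as the SELECT columns; the helper name key_domain_spanner and the sample values are made up for illustration and are not part of the job.

```python
# Standalone illustration of the keying step; needs neither Beam nor Spanner.
COLUMNS = ['row_id', 'update_key']


def key_domain_spanner(entity):
    """Turn a positional row into a (row_id, {column: value}) pair."""
    row = dict(zip(COLUMNS, entity))
    return row['row_id'], row


# Hypothetical sample row; the values are invented for illustration.
sample = [42, 'abc']
key, row = key_domain_spanner(sample)
print(key, row)
```

The failure happens in the ReadFromSpanner step itself, so this keying function is most likely not involved in the error.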

The Dataflow job reads around 10M rows before failing with 2.29.0, but only a 
few thousand with 2.33.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
