[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389169 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 18/Feb/20 23:20; Time Spent: 10m):

chamikaramj commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Worklog Id: (was: 389169); Time Spent: 22.5h (was: 22h 20m)

> Create a Spanner IO for Python
> ------------------------------
>
>                 Key: BEAM-7246
>                 URL: https://issues.apache.org/jira/browse/BEAM-7246
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>            Reporter: Reuven Lax
>            Assignee: Shehzaad Nakhoda
>            Priority: Major
>          Time Spent: 22.5h
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and manual testing.
> Integration and performance tests are a separate work item (not included here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add Google Cloud Spanner to the Database column for the Python/Batch row.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
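The time-tracking values in these worklogs mix two Jira duration notations ("22.5h" vs "22h 20m"), and each bot entry adds exactly 10 minutes. As an illustrative aside (the helper below is not part of Beam or Jira, just a sketch for checking the deltas in this log), the notations can be normalized to minutes:

```python
import re

def jira_duration_to_minutes(s: str) -> int:
    """Convert a Jira time-tracking string such as '22h 20m',
    '22.5h', or '10m' into a total number of minutes."""
    total = 0.0
    # Match number+unit pairs; this log only uses hours and minutes.
    for value, unit in re.findall(r'(\d+(?:\.\d+)?)([hm])', s):
        total += float(value) * (60 if unit == 'h' else 1)
    return int(total)

# Each worklog entry records 10 minutes, so consecutive
# 'Time Spent' values should differ by exactly that amount:
assert jira_duration_to_minutes('22.5h') - jira_duration_to_minutes('22h 20m') == 10
```

This confirms, for example, that "22.5h (was: 22h 20m)" is the expected 10-minute increment.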
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389124 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 18/Feb/20 21:57; Time Spent: 10m):

chamikaramj commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-587923073

LGTM. Thanks.

Worklog Id: (was: 389124); Time Spent: 22h 20m (was: 22h 10m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389122 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 18/Feb/20 21:53; Time Spent: 10m):

aaltay commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-587916297

retest this please

Worklog Id: (was: 389122); Time Spent: 22h 10m (was: 22h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389121 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 18/Feb/20 21:52; Time Spent: 10m):

aaltay commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-587914900

retest this please

Worklog Id: (was: 389121); Time Spent: 22h (was: 21h 50m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389016 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 18/Feb/20 19:18; Time Spent: 10m):

mszb commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-587691085

Thanks @nielm. @chamikaramj, is there anything you would like to add, or are we ready to merge?

Worklog Id: (was: 389016); Time Spent: 21h 50m (was: 21h 40m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=387971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387971 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 15/Feb/20 18:37; Time Spent: 10m):

mszb commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-586629057

@chamikaramj @nielm could you please verify the changes?

Worklog Id: (was: 387971); Time Spent: 21h 40m (was: 21.5h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386741 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 13/Feb/20 16:49; Time Spent: 10m):

aaltay commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-585856253

retest this please

Worklog Id: (was: 386741); Time Spent: 21.5h (was: 21h 20m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386738 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 13/Feb/20 16:44; Time Spent: 10m):

aaltay commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-585853706

retest this please

Worklog Id: (was: 386738); Time Spent: 21h 20m (was: 21h 10m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386419 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 13/Feb/20 07:23; Time Spent: 10m):

mszb commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-585588148

@aaltay I've rebased my branch; could you please trigger the tests? Thanks.

Worklog Id: (was: 386419); Time Spent: 21h 10m (was: 21h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386233 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 12/Feb/20 21:04; Time Spent: 10m):

aaltay commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-585416739

I believe this is fixed with https://github.com/apache/beam/pull/10844; you may need to rebase.

Worklog Id: (was: 386233); Time Spent: 21h (was: 20h 50m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386155 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 12/Feb/20 19:18; Time Spent: 10m):

mszb commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-585371923

@aaltay All three tests failed due to `ImportError: No module named 'pycodestyle'` in the `avro-python3` package:

- Portable_Python PreCommit: https://scans.gradle.com/s/3qf5sqnettmmq/console-log?task=:sdks:python:test-suites:portable:py35:installGcpTest#L299
- PreCommit: https://scans.gradle.com/s/c5ncivj7k2pko/console-log?task=:sdks:python:test-suites:dataflow:py37:installGcpTest#L658
- PythonFormatter PreCommit: https://scans.gradle.com/s/i6nvgyym5tfqk/console-log?task=:sdks:python:test-suites:tox:py37:formatter#L267

Could you please rerun these tests? That will possibly fix the issue.

Worklog Id: (was: 386155); Time Spent: 20h 50m (was: 20h 40m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386105 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 12/Feb/20 18:02; Time Spent: 10m):

aaltay commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-585337286

retest this please

Worklog Id: (was: 386105); Time Spent: 20h 40m (was: 20.5h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=385953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385953 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 12/Feb/20 14:47; Time Spent: 10m):

mszb commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-585239918

@aaltay @markflyhigh It seems the jobs were triggered and completed successfully, but they show no activity on GitHub:

- https://builds.apache.org/job/beam_PreCommit_Python_Commit/11080/
- https://builds.apache.org/job/beam_PreCommit_PythonFormatter_Commit/67/

Worklog Id: (was: 385953); Time Spent: 20.5h (was: 20h 20m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=385333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385333 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 11/Feb/20 18:16; Time Spent: 10m):

aaltay commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-584776785

/cc @markflyhigh - Tests seem not to be triggering?

Worklog Id: (was: 385333); Time Spent: 20h 20m (was: 20h 10m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=385325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385325 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 11/Feb/20 18:11; Time Spent: 10m):

aaltay commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-584774835

retest this please

Worklog Id: (was: 385325); Time Spent: 20h 10m (was: 20h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=385066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385066 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 11/Feb/20 10:24; Time Spent: 10m):

iemejia commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-584566071

retest this please

Worklog Id: (was: 385066); Time Spent: 19h 50m (was: 19h 40m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=385067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385067 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 11/Feb/20 10:24; Time Spent: 10m):

iemejia commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-584566111

retest this please

Worklog Id: (was: 385067); Time Spent: 20h (was: 19h 50m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=385064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385064 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 11/Feb/20 10:24; Time Spent: 10m):

iemejia commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-584566111

retest this please

Worklog Id: (was: 385064); Time Spent: 19.5h (was: 19h 20m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=385065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385065 ]

ASF GitHub Bot logged work on BEAM-7246 (Created: 11/Feb/20 10:24; Time Spent: 10m):

iemejia commented on issue #10712 ([BEAM-7246] Added Google Spanner Write Transform):
URL: https://github.com/apache/beam/pull/10712#issuecomment-584566278

retest this please

Worklog Id: (was: 385065); Time Spent: 19h 40m (was: 19.5h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=385063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385063 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 11/Feb/20 10:23 Start Date: 11/Feb/20 10:23 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-584566071 retest this please Issue Time Tracking --- Worklog Id: (was: 385063) Time Spent: 19h 20m (was: 19h 10m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=384649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384649 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 10/Feb/20 18:47 Start Date: 10/Feb/20 18:47 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-584290424 Trigger tests. Issue Time Tracking --- Worklog Id: (was: 384649) Time Spent: 19h 10m (was: 19h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=384640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384640 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 10/Feb/20 18:40 Start Date: 10/Feb/20 18:40 Worklog Time Spent: 10m Work Description: mszb commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-584287258 ping for test Issue Time Tracking --- Worklog Id: (was: 384640) Time Spent: 19h (was: 18h 50m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=384637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384637 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 10/Feb/20 18:32 Start Date: 10/Feb/20 18:32 Worklog Time Spent: 10m Work Description: mszb commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-584283428 > Could you also add a new feature note in https://github.com/apache/beam/blob/master/CHANGES.md Done. :) Issue Time Tracking --- Worklog Id: (was: 384637) Time Spent: 18h 50m (was: 18h 40m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=384623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384623 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 10/Feb/20 18:14 Start Date: 10/Feb/20 18:14 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-584258200 Could you also add a new feature note in https://github.com/apache/beam/blob/master/CHANGES.md Issue Time Tracking --- Worklog Id: (was: 384623) Time Spent: 18h 40m (was: 18.5h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=384621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384621 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 10/Feb/20 18:14 Start Date: 10/Feb/20 18:14 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-584257913 Trigger tests. Issue Time Tracking --- Worklog Id: (was: 384621) Time Spent: 18.5h (was: 18h 20m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=384620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384620 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 10/Feb/20 18:13 Start Date: 10/Feb/20 18:13 Worklog Time Spent: 10m Work Description: mszb commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-584257624 ping for test Issue Time Tracking --- Worklog Id: (was: 384620) Time Spent: 18h 20m (was: 18h 10m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=382976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382976 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 06/Feb/20 16:36 Start Date: 06/Feb/20 16:36 Worklog Time Spent: 10m Work Description: mszb commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r375945920

## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py
## @@ -581,3 +644,369 @@ def display_data(self):
                           label='transaction')
     return res
+
+
+@experimental(extra_message="No backwards-compatibility guarantees.")
+class WriteToSpanner(PTransform):
+
+  def __init__(self, project_id, instance_id, database_id, pool=None,
+               credentials=None, max_batch_size_bytes=1048576):
+    """
+    A PTransform to write to Google Cloud Spanner.
+
+    Args:
+      project_id: Cloud Spanner project ID. Be sure to use the Project ID,
+        not the Project Number.
+      instance_id: Cloud Spanner instance ID.
+      database_id: Cloud Spanner database ID.
+      max_batch_size_bytes: (optional) Splits the mutations into batches to
+        reduce the number of transactions sent to Spanner. By default it is
+        set to 1 MB (1048576 bytes).
+    """
+    self._configuration = _BeamSpannerConfiguration(
+        project=project_id, instance=instance_id, database=database_id,
+        credentials=credentials, pool=pool, snapshot_read_timestamp=None,
+        snapshot_exact_staleness=None)
+    self._max_batch_size_bytes = max_batch_size_bytes
+    self._database_id = database_id
+    self._project_id = project_id
+    self._instance_id = instance_id
+    self._pool = pool
+
+  def display_data(self):
+    res = {
+        'project_id': DisplayDataItem(self._project_id, label='Project Id'),
+        'instance_id': DisplayDataItem(self._instance_id, label='Instance Id'),
+        'pool': DisplayDataItem(str(self._pool), label='Pool'),
+        'database': DisplayDataItem(self._database_id, label='Database'),
+        'batch_size': DisplayDataItem(self._max_batch_size_bytes,
+                                      label="Batch Size"),
+    }
+    return res
+
+  def expand(self, pcoll):
+    return (pcoll
+            | "make batches" >>
+            _WriteGroup(max_batch_size_bytes=self._max_batch_size_bytes)
+            | 'Writing to spanner' >> ParDo(
+                _WriteToSpannerDoFn(self._configuration)))
+
+
+class _Mutator(namedtuple('_Mutator', ["mutation", "operation", "kwargs"])):
+  __slots__ = ()
+
+  @property
+  def byte_size(self):
+    return self.mutation.ByteSize()
+
+
+class MutationGroup(deque):
+  """
+  A bundle of Spanner mutations (_Mutator).
+  """
+
+  @property
+  def byte_size(self):
+    return sum(m.byte_size for m in self)
+
+  def primary(self):
+    return next(iter(self))
+
+
+class WriteMutation(object):
+
+  _OPERATION_DELETE = "delete"
+  _OPERATION_INSERT = "insert"
+  _OPERATION_INSERT_OR_UPDATE = "insert_or_update"
+  _OPERATION_REPLACE = "replace"
+  _OPERATION_UPDATE = "update"
+
+  def __init__(self,
+               insert=None,
+               update=None,
+               insert_or_update=None,
+               replace=None,
+               delete=None,
+               columns=None,
+               values=None,
+               keyset=None):
+    """
+    A convenience class for creating Spanner mutations for writes. The
+    operation can be provided via the constructor or via the static methods.
+
+    Note: when passing the operation via the constructor, only one operation
+    is accepted at a time. For example, passing a table name in the `insert`
+    parameter together with a value for the `update` parameter causes an
+    error.
+
+    Args:
+      insert: (Optional) Name of the table in which rows will be inserted.
+      update: (Optional) Name of the table in which existing rows will be
+        updated.
+      insert_or_update: (Optional) Table name in which rows will be written.
+        Like insert, except that if the row already exists, then its column
+        values are overwritten with the ones provided. Any column values not
+        explicitly written are preserved.
+      replace: (Optional) Table name in which rows will be replaced. Like
+        insert, except that if the row already exists, it is deleted, and the
+        column values provided are inserted instead. Unlike `insert_or_update`,
+        this means any values not explicitly written become `NULL`.
+      delete: (Optional) Table name from which rows will be deleted. Succeeds
+        whether or not the named rows were present.
+      columns: T
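The one-operation-at-a-time rule in the quoted WriteMutation docstring can be sketched as a small standalone validator. This is a hypothetical helper for illustration only, not the actual spannerio implementation:

```python
def pick_operation(insert=None, update=None, insert_or_update=None,
                   replace=None, delete=None):
    """Return the single requested (operation, table) pair.

    Mirrors the documented rule that WriteMutation accepts exactly one
    operation at a time; passing two (e.g. insert and update) is an error.
    """
    ops = {
        "insert": insert,
        "update": update,
        "insert_or_update": insert_or_update,
        "replace": replace,
        "delete": delete,
    }
    # Collect every operation the caller actually supplied.
    given = [name for name, table in ops.items() if table is not None]
    if len(given) != 1:
        raise ValueError("Exactly one operation is required, got: %r" % given)
    return given[0], ops[given[0]]
```

For example, `pick_operation(insert='users')` returns `('insert', 'users')`, while supplying both `insert` and `update` raises `ValueError`.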
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=382973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382973 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 06/Feb/20 16:35 Start Date: 06/Feb/20 16:35 Worklog Time Spent: 10m Work Description: mszb commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r375945639

## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py
## @@ -109,20 +111,74 @@

 ReadFromSpanner takes this transform in the constructor and passes it to
 the read pipeline as the singleton side input.
+
+Writing Data to Cloud Spanner.
+
+The WriteToSpanner transform writes to Cloud Spanner by executing a
+collection of input rows (WriteMutation). The mutations are grouped into
+batches for efficiency.
+
+The WriteToSpanner transform relies on the WriteMutation objects, which are
+exposed by the SpannerIO API. WriteMutation has five static methods (insert,
+update, insert_or_update, replace, delete). These methods return an instance
+of the _Mutator object, which contains the mutation type and the Spanner
+Mutation object. For more details, review the docs of the class
+SpannerIO.WriteMutation. For example:::
+
+  mutations = [
+      WriteMutation.insert(table='user', columns=('name', 'email'),
+                           values=[('sara', 's...@dev.com')])
+  ]
+  _ = (p
+       | beam.Create(mutations)
+       | WriteToSpanner(
+           project_id=SPANNER_PROJECT_ID,
+           instance_id=SPANNER_INSTANCE_ID,
+           database_id=SPANNER_DATABASE_NAME)
+       )
+
+You can also create a WriteMutation by calling its constructor. For
+example:::
+
+  mutations = [
+      WriteMutation(insert='users', columns=('name', 'email'),
+                    values=[('sara', 's...@example.com')])
+  ]
+
+For more information, review the docs available on the WriteMutation class.
+
+The WriteToSpanner transform also takes a 'max_batch_size_bytes' param,
+which is set to 1 MB (1048576 bytes) by default. This parameter is used to
+reduce the number of

Review comment: Thanks. I'll update the code!

Issue Time Tracking --- Worklog Id: (was: 382973) Time Spent: 17h 40m (was: 17.5h)
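The `max_batch_size_bytes` batching described in the quoted docstring can be illustrated with a minimal greedy grouping sketch. This is pure Python using `(name, byte_size)` pairs as stand-ins for mutations, not the actual `_WriteGroup` implementation:

```python
def batch_by_bytes(mutations, max_batch_size_bytes=1048576):
    """Greedily pack (name, byte_size) pairs into batches whose total
    size stays at or under max_batch_size_bytes.

    A single mutation larger than the limit still forms its own batch,
    since it cannot be split further.
    """
    batches = []
    current, current_size = [], 0
    for name, size in mutations:
        # Flush the current batch if adding this mutation would overflow it.
        if current and current_size + size > max_batch_size_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

With a 1000-byte limit, `[("a", 600), ("b", 500), ("c", 100)]` groups into `[["a"], ["b", "c"]]`: adding `b` to `a`'s batch would exceed the limit, so `a` is flushed first.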
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=382962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382962 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 06/Feb/20 16:10 Start Date: 06/Feb/20 16:10 Worklog Time Spent: 10m Work Description: nielm commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r375930092

## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py
## @@ -109,20 +111,74 @@

Review comment: Also, given that in this transform the batches are not pre-sorted, I would make the defaults a lot smaller than the Java equivalent: say max 500 cells per batch, and max 50 rows.

Issue Time Tracking --- Worklog Id: (was: 382962) Time Spent: 17.5h (was: 17h 20m)
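The reviewer's suggested defaults (max 500 cells per batch, max 50 rows) amount to capping the rows per batch by both limits at once. A minimal sketch, under the simplifying assumption that every row has the same column count; the helper name is hypothetical and this is not Beam code:

```python
def batch_by_cells(rows, num_columns, max_cells=500, max_rows=50):
    """Split rows into batches so each batch respects both a cell cap
    (rows * columns <= max_cells) and a row cap (len(batch) <= max_rows)."""
    # Rows allowed by the cell cap alone; at least one row per batch.
    rows_by_cells = max(1, max_cells // num_columns)
    rows_per_batch = min(max_rows, rows_by_cells)
    return [rows[i:i + rows_per_batch]
            for i in range(0, len(rows), rows_per_batch)]
```

With 10 columns, 120 rows split into batches of 50, 50, and 20 (the row cap binds); with 25 columns the cell cap binds instead, giving at most 20 rows per batch.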
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=382958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382958 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 06/Feb/20 16:06 Start Date: 06/Feb/20 16:06 Worklog Time Spent: 10m Work Description: nielm commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r375926873 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -109,20 +111,74 @@ ReadFromSpanner takes this transform in the constructor and passes it to the read pipeline as the singleton side input. + +Writing Data to Cloud Spanner. + +The WriteToSpanner transform writes to Cloud Spanner by executing a +collection of input rows (WriteMutation). The mutations are grouped into +batches for efficiency. + +The WriteToSpanner transform relies on the WriteMutation objects, which are exposed +by the SpannerIO API. WriteMutation has five static methods (insert, update, +insert_or_update, replace, delete). These methods return an instance of the +_Mutator object, which contains the mutation type and the Spanner Mutation +object. For more details, review the docs of the class SpannerIO.WriteMutation. +For example::: + mutations = [ +WriteMutation.insert(table='user', columns=('name', 'email'), +values=[('sara', 's...@dev.com')]) + ] + _ = (p + | beam.Create(mutations) + | WriteToSpanner( + project_id=SPANNER_PROJECT_ID, + instance_id=SPANNER_INSTANCE_ID, + database_id=SPANNER_DATABASE_NAME) +) + +You can also create a WriteMutation by calling its constructor. For example::: + mutations = [ + WriteMutation(insert='users', columns=('name', 'email'), +values=[('sara', 's...@example.com')]) + ] + +For more information, review the docs available on the WriteMutation class. + +The WriteToSpanner transform also takes a 'max_batch_size_bytes' param, which is set +to 1MB (1048576 bytes) by default. 
This parameter is used to reduce the number of Review comment: Yes, you are correct. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 382958) Time Spent: 17h 20m (was: 17h 10m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 17h 20m > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
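The 'max_batch_size_bytes' behavior discussed in the quoted docstring above can be illustrated with a minimal, self-contained sketch. The helper below is hypothetical (it is not the Beam SpannerIO implementation): mutation byte sizes are greedily grouped so that no batch exceeds the configured limit.

```python
# Minimal sketch of byte-size batching: mutations are greedily grouped so
# that no batch exceeds max_batch_size_bytes. Illustration only, not the
# Beam SpannerIO implementation.

def batch_by_bytes(mutation_sizes, max_batch_size_bytes=1048576):
    """Group per-mutation byte sizes into batches under the byte limit.

    A mutation larger than the limit is emitted as a single-element batch.
    """
    batches, current, current_bytes = [], [], 0
    for size in mutation_sizes:
        # Start a new batch when adding this mutation would overflow it.
        if current and current_bytes + size > max_batch_size_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Setting the limit to a very small value degenerates to one mutation per batch, which matches the docstring's suggestion that a small value effectively disables batching.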
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=382957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382957 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 06/Feb/20 16:05 Start Date: 06/Feb/20 16:05 Worklog Time Spent: 10m Work Description: nielm commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r375926873 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -109,20 +111,74 @@ ReadFromSpanner takes this transform in the constructor and passes it to the read pipeline as the singleton side input. + +Writing Data to Cloud Spanner. + +The WriteToSpanner transform writes to Cloud Spanner by executing a +collection of input rows (WriteMutation). The mutations are grouped into +batches for efficiency. + +The WriteToSpanner transform relies on the WriteMutation objects, which are exposed +by the SpannerIO API. WriteMutation has five static methods (insert, update, +insert_or_update, replace, delete). These methods return an instance of the +_Mutator object, which contains the mutation type and the Spanner Mutation +object. For more details, review the docs of the class SpannerIO.WriteMutation. +For example::: + mutations = [ +WriteMutation.insert(table='user', columns=('name', 'email'), +values=[('sara', 's...@dev.com')]) + ] + _ = (p + | beam.Create(mutations) + | WriteToSpanner( + project_id=SPANNER_PROJECT_ID, + instance_id=SPANNER_INSTANCE_ID, + database_id=SPANNER_DATABASE_NAME) +) + +You can also create a WriteMutation by calling its constructor. For example::: + mutations = [ + WriteMutation(insert='users', columns=('name', 'email'), +values=[('sara', 's...@example.com')]) + ] + +For more information, review the docs available on the WriteMutation class. + +The WriteToSpanner transform also takes a 'max_batch_size_bytes' param, which is set +to 1MB (1048576 bytes) by default. 
This parameter is used to reduce the number of Review comment: No, you are correct. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 382957) Time Spent: 17h 10m (was: 17h) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 17h 10m > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=382956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382956 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 06/Feb/20 16:04 Start Date: 06/Feb/20 16:04 Worklog Time Spent: 10m Work Description: nielm commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r375926449 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -581,3 +644,369 @@ def display_data(self): label='transaction') return res + + +@experimental(extra_message="No backwards-compatibility guarantees.") +class WriteToSpanner(PTransform): + + def __init__(self, project_id, instance_id, database_id, pool=None, + credentials=None, max_batch_size_bytes=1048576): +""" +A PTransform to write to Google Cloud Spanner. + +Args: + project_id: Cloud spanner project id. Be sure to use the Project ID, +not the Project Number. + instance_id: Cloud spanner instance id. + database_id: Cloud spanner database id. + max_batch_size_bytes: (optional) Split the mutations into batches to +reduce the number of transactions sent to Spanner. By default it is +set to 1 MB (1048576 Bytes). 
+""" +self._configuration = _BeamSpannerConfiguration( +project=project_id, instance=instance_id, database=database_id, +credentials=credentials, pool=pool, snapshot_read_timestamp=None, +snapshot_exact_staleness=None +) +self._max_batch_size_bytes = max_batch_size_bytes +self._database_id = database_id +self._project_id = project_id +self._instance_id = instance_id +self._pool = pool + + def display_data(self): +res = { +'project_id': DisplayDataItem(self._project_id, label='Project Id'), +'instance_id': DisplayDataItem(self._instance_id, label='Instance Id'), +'pool': DisplayDataItem(str(self._pool), label='Pool'), +'database': DisplayDataItem(self._database_id, label='Database'), +'batch_size': DisplayDataItem(self._max_batch_size_bytes, + label="Batch Size"), +} +return res + + def expand(self, pcoll): +return (pcoll +| "make batches" >> +_WriteGroup(max_batch_size_bytes=self._max_batch_size_bytes) +| 'Writing to spanner' >> ParDo( +_WriteToSpannerDoFn(self._configuration))) + + +class _Mutator(namedtuple('_Mutator', ["mutation", "operation", "kwargs"])): + __slots__ = () + + @property + def byte_size(self): +return self.mutation.ByteSize() + + +class MutationGroup(deque): + """ + A Bundle of Spanner Mutations (_Mutator). + """ + + @property + def byte_size(self): +s = 0 +for m in self.__iter__(): + s += m.byte_size +return s + + def primary(self): +return next(self.__iter__()) + + +class WriteMutation(object): + + _OPERATION_DELETE = "delete" + _OPERATION_INSERT = "insert" + _OPERATION_INSERT_OR_UPDATE = "insert_or_update" + _OPERATION_REPLACE = "replace" + _OPERATION_UPDATE = "update" + + def __init__(self, + insert=None, + update=None, + insert_or_update=None, + replace=None, + delete=None, + columns=None, + values=None, + keyset=None): +""" +A convenience class to create Spanner Mutations for Write. The user can provide +the operation via the constructor or via static methods. 
+ +Note: If the operation is passed via the constructor, only one operation is +accepted at a time. For example, passing a table name in the `insert` +parameter together with a value for the `update` parameter will cause an +error. + +Args: + insert: (Optional) Name of the table in which rows will be inserted. + update: (Optional) Name of the table in which existing rows will be +updated. + insert_or_update: (Optional) Table name in which rows will be written. +Like insert, except that if the row already exists, then its column +values are overwritten with the ones provided. Any column values not +explicitly written are preserved. + replace: (Optional) Table name in which rows will be replaced. Like +insert, except that if the row already exists, it is deleted, and the +column values provided are inserted instead. Unlike `insert_or_update`, +this means any values not explicitly written become `NULL`. + delete: (Optional) Table name from which rows will be deleted. Succeeds +whether or not the named rows were present. + columns:
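The note in the quoted docstring above says the WriteMutation constructor accepts exactly one of the five operations at a time. A minimal sketch of that validation, using a hypothetical helper name (not the actual Beam code), might look like this:

```python
def pick_operation(insert=None, update=None, insert_or_update=None,
                   replace=None, delete=None):
    """Return (operation, table), enforcing that exactly one of the five
    operation keywords carries a table name, as the docstring note requires."""
    supplied = [(name, table) for name, table in (
        ("insert", insert),
        ("update", update),
        ("insert_or_update", insert_or_update),
        ("replace", replace),
        ("delete", delete)) if table is not None]
    if len(supplied) != 1:
        # e.g. passing both insert= and update= is rejected
        raise ValueError(
            "Exactly one operation must be given, got %d" % len(supplied))
    return supplied[0]
```

Raising early in the constructor keeps the error on the user's call site rather than surfacing later inside the pipeline.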
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=382939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382939 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 06/Feb/20 15:14 Start Date: 06/Feb/20 15:14 Worklog Time Spent: 10m Work Description: mszb commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-582952187 > Just to let you know that we've just introduced a Python autoformatter. Your merge conflict might be a result of this. > Here you can find instructions on how to run the autoformatter: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips, section Formatting. > Sorry for the inconvenience. No worries @kamilwu, I'll resolve the conflicts on my next commit. Thanks for the heads up :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 382939) Time Spent: 16h 50m (was: 16h 40m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 16h 50m > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=382938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382938 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 06/Feb/20 15:13 Start Date: 06/Feb/20 15:13 Worklog Time Spent: 10m Work Description: mszb commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r375892684 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -109,20 +111,74 @@ ReadFromSpanner takes this transform in the constructor and passes it to the read pipeline as the singleton side input. + +Writing Data to Cloud Spanner. + +The WriteToSpanner transform writes to Cloud Spanner by executing a +collection of input rows (WriteMutation). The mutations are grouped into +batches for efficiency. + +The WriteToSpanner transform relies on the WriteMutation objects, which are exposed +by the SpannerIO API. WriteMutation has five static methods (insert, update, +insert_or_update, replace, delete). These methods return an instance of the +_Mutator object, which contains the mutation type and the Spanner Mutation +object. For more details, review the docs of the class SpannerIO.WriteMutation. +For example::: + mutations = [ +WriteMutation.insert(table='user', columns=('name', 'email'), +values=[('sara', 's...@dev.com')]) + ] + _ = (p + | beam.Create(mutations) + | WriteToSpanner( + project_id=SPANNER_PROJECT_ID, + instance_id=SPANNER_INSTANCE_ID, + database_id=SPANNER_DATABASE_NAME) +) + +You can also create a WriteMutation by calling its constructor. For example::: + mutations = [ + WriteMutation(insert='users', columns=('name', 'email'), +values=[('sara', 's...@example.com')]) + ] + +For more information, review the docs available on the WriteMutation class. + +The WriteToSpanner transform also takes a 'max_batch_size_bytes' param, which is set +to 1MB (1048576 bytes) by default. 
This parameter is used to reduce the number of Review comment: For my understanding, `maximum_number_cells` would be (total number of columns * total number of rows) For Example: ``` WriteMutation.insert("roles", ("key", "rolename"), [('abc1', "test-1"), ('abc2', "test-2"), ('abc3', "test-3")]) ``` in this case the max_number_cells would be `2 * 3 = 6`. And the max_rows_number would be 2 in the case below. ``` MutationGroup([ WriteMutation.insert("roles", ("key", "rolename"), [('abc1', "test1")]), WriteMutation.insert("roles", ("key", "rolename"), [('abc2', "test2")]) ]) ``` Please correct me if I am mistaken! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 382938) Time Spent: 16h 40m (was: 16.5h) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 16h 40m > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
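The cell arithmetic in the review comment above (cells mutated = number of columns times number of rows) can be checked with a tiny sketch; `mutated_cells` is a hypothetical helper for illustration, not part of SpannerIO:

```python
def mutated_cells(columns, rows):
    # Every row writes a value into every listed column,
    # so the cell count is len(columns) * len(rows).
    return len(columns) * len(rows)

# The insert from the comment above: 2 columns, 3 rows -> 6 mutated cells.
cells = mutated_cells(
    ("key", "rolename"),
    [("abc1", "test-1"), ("abc2", "test-2"), ("abc3", "test-3")])
```

For the two-mutation MutationGroup in the same comment, each mutation touches 2 cells (2 columns, 1 row), and the group contains 2 rows in total.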
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=382937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382937 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 06/Feb/20 15:13 Start Date: 06/Feb/20 15:13 Worklog Time Spent: 10m Work Description: mszb commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r375892462 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -581,3 +644,369 @@ def display_data(self): label='transaction') return res + + +@experimental(extra_message="No backwards-compatibility guarantees.") +class WriteToSpanner(PTransform): + + def __init__(self, project_id, instance_id, database_id, pool=None, + credentials=None, max_batch_size_bytes=1048576): +""" +A PTransform to write to Google Cloud Spanner. + +Args: + project_id: Cloud spanner project id. Be sure to use the Project ID, +not the Project Number. + instance_id: Cloud spanner instance id. + database_id: Cloud spanner database id. + max_batch_size_bytes: (optional) Split the mutations into batches to +reduce the number of transactions sent to Spanner. By default it is +set to 1 MB (1048576 Bytes). 
+""" +self._configuration = _BeamSpannerConfiguration( +project=project_id, instance=instance_id, database=database_id, +credentials=credentials, pool=pool, snapshot_read_timestamp=None, +snapshot_exact_staleness=None +) +self._max_batch_size_bytes = max_batch_size_bytes +self._database_id = database_id +self._project_id = project_id +self._instance_id = instance_id +self._pool = pool + + def display_data(self): +res = { +'project_id': DisplayDataItem(self._project_id, label='Project Id'), +'instance_id': DisplayDataItem(self._instance_id, label='Instance Id'), +'pool': DisplayDataItem(str(self._pool), label='Pool'), +'database': DisplayDataItem(self._database_id, label='Database'), +'batch_size': DisplayDataItem(self._max_batch_size_bytes, + label="Batch Size"), +} +return res + + def expand(self, pcoll): +return (pcoll +| "make batches" >> +_WriteGroup(max_batch_size_bytes=self._max_batch_size_bytes) +| 'Writing to spanner' >> ParDo( +_WriteToSpannerDoFn(self._configuration))) + + +class _Mutator(namedtuple('_Mutator', ["mutation", "operation", "kwargs"])): + __slots__ = () + + @property + def byte_size(self): +return self.mutation.ByteSize() + + +class MutationGroup(deque): + """ + A Bundle of Spanner Mutations (_Mutator). + """ + + @property + def byte_size(self): +s = 0 +for m in self.__iter__(): + s += m.byte_size +return s + + def primary(self): +return next(self.__iter__()) + + +class WriteMutation(object): + + _OPERATION_DELETE = "delete" + _OPERATION_INSERT = "insert" + _OPERATION_INSERT_OR_UPDATE = "insert_or_update" + _OPERATION_REPLACE = "replace" + _OPERATION_UPDATE = "update" + + def __init__(self, + insert=None, + update=None, + insert_or_update=None, + replace=None, + delete=None, + columns=None, + values=None, + keyset=None): +""" +A convenience class to create Spanner Mutations for Write. The user can provide +the operation via the constructor or via static methods. 
+ +Note: If the operation is passed via the constructor, only one operation is +accepted at a time. For example, passing a table name in the `insert` +parameter together with a value for the `update` parameter will cause an +error. + +Args: + insert: (Optional) Name of the table in which rows will be inserted. + update: (Optional) Name of the table in which existing rows will be +updated. + insert_or_update: (Optional) Table name in which rows will be written. +Like insert, except that if the row already exists, then its column +values are overwritten with the ones provided. Any column values not +explicitly written are preserved. + replace: (Optional) Table name in which rows will be replaced. Like +insert, except that if the row already exists, it is deleted, and the +column values provided are inserted instead. Unlike `insert_or_update`, +this means any values not explicitly written become `NULL`. + delete: (Optional) Table name from which rows will be deleted. Succeeds +whether or not the named rows were present. + columns: T
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=382868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-382868 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 06/Feb/20 12:34 Start Date: 06/Feb/20 12:34 Worklog Time Spent: 10m Work Description: kamilwu commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-582885733 Just to let you know that we've just introduced a Python autoformatter. Your merge conflict might be a result of this. Here you can find instructions on how to run the autoformatter: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips, section Formatting. Sorry for the inconvenience. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 382868) Time Spent: 16h 20m (was: 16h 10m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 16h 20m > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=381717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-381717 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 04/Feb/20 17:06 Start Date: 04/Feb/20 17:06 Worklog Time Spent: 10m Work Description: nielm commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r374802530 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -581,3 +644,369 @@ def display_data(self): label='transaction') return res + + +@experimental(extra_message="No backwards-compatibility guarantees.") +class WriteToSpanner(PTransform): + + def __init__(self, project_id, instance_id, database_id, pool=None, + credentials=None, max_batch_size_bytes=1048576): +""" +A PTransform to write to Google Cloud Spanner. + +Args: + project_id: Cloud spanner project id. Be sure to use the Project ID, +not the Project Number. + instance_id: Cloud spanner instance id. + database_id: Cloud spanner database id. + max_batch_size_bytes: (optional) Split the mutations into batches to +reduce the number of transactions sent to Spanner. By default it is +set to 1 MB (1048576 Bytes). 
+""" +self._configuration = _BeamSpannerConfiguration( +project=project_id, instance=instance_id, database=database_id, +credentials=credentials, pool=pool, snapshot_read_timestamp=None, +snapshot_exact_staleness=None +) +self._max_batch_size_bytes = max_batch_size_bytes +self._database_id = database_id +self._project_id = project_id +self._instance_id = instance_id +self._pool = pool + + def display_data(self): +res = { +'project_id': DisplayDataItem(self._project_id, label='Project Id'), +'instance_id': DisplayDataItem(self._instance_id, label='Instance Id'), +'pool': DisplayDataItem(str(self._pool), label='Pool'), +'database': DisplayDataItem(self._database_id, label='Database'), +'batch_size': DisplayDataItem(self._max_batch_size_bytes, + label="Batch Size"), +} +return res + + def expand(self, pcoll): +return (pcoll +| "make batches" >> +_WriteGroup(max_batch_size_bytes=self._max_batch_size_bytes) +| 'Writing to spanner' >> ParDo( +_WriteToSpannerDoFn(self._configuration))) + + +class _Mutator(namedtuple('_Mutator', ["mutation", "operation", "kwargs"])): + __slots__ = () + + @property + def byte_size(self): +return self.mutation.ByteSize() + + +class MutationGroup(deque): + """ + A Bundle of Spanner Mutations (_Mutator). + """ + + @property + def byte_size(self): +s = 0 +for m in self.__iter__(): + s += m.byte_size +return s + + def primary(self): +return next(self.__iter__()) + + +class WriteMutation(object): + + _OPERATION_DELETE = "delete" + _OPERATION_INSERT = "insert" + _OPERATION_INSERT_OR_UPDATE = "insert_or_update" + _OPERATION_REPLACE = "replace" + _OPERATION_UPDATE = "update" + + def __init__(self, + insert=None, + update=None, + insert_or_update=None, + replace=None, + delete=None, + columns=None, + values=None, + keyset=None): +""" +A convenience class to create Spanner Mutations for Write. The user can provide +the operation via the constructor or via static methods. 
+ +Note: If the operation is passed via the constructor, only one operation is +accepted at a time. For example, passing a table name in the `insert` +parameter together with a value for the `update` parameter will cause an +error. + +Args: + insert: (Optional) Name of the table in which rows will be inserted. + update: (Optional) Name of the table in which existing rows will be +updated. + insert_or_update: (Optional) Table name in which rows will be written. +Like insert, except that if the row already exists, then its column +values are overwritten with the ones provided. Any column values not +explicitly written are preserved. + replace: (Optional) Table name in which rows will be replaced. Like +insert, except that if the row already exists, it is deleted, and the +column values provided are inserted instead. Unlike `insert_or_update`, +this means any values not explicitly written become `NULL`. + delete: (Optional) Table name from which rows will be deleted. Succeeds +whether or not the named rows were present. + columns:
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=381716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-381716 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 04/Feb/20 17:02 Start Date: 04/Feb/20 17:02 Worklog Time Spent: 10m Work Description: nielm commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r374800163 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -109,20 +111,74 @@ ReadFromSpanner takes this transform in the constructor and passes it to the read pipeline as the singleton side input. + +Writing Data to Cloud Spanner. + +The WriteToSpanner transform writes to Cloud Spanner by executing a +collection of input rows (WriteMutation). The mutations are grouped into +batches for efficiency. + +The WriteToSpanner transform relies on the WriteMutation objects, which are exposed +by the SpannerIO API. WriteMutation has five static methods (insert, update, +insert_or_update, replace, delete). These methods return an instance of the +_Mutator object, which contains the mutation type and the Spanner Mutation +object. For more details, review the docs of the class SpannerIO.WriteMutation. +For example::: + mutations = [ +WriteMutation.insert(table='user', columns=('name', 'email'), +values=[('sara', 's...@dev.com')]) + ] + _ = (p + | beam.Create(mutations) + | WriteToSpanner( + project_id=SPANNER_PROJECT_ID, + instance_id=SPANNER_INSTANCE_ID, + database_id=SPANNER_DATABASE_NAME) +) + +You can also create a WriteMutation by calling its constructor. For example::: + mutations = [ + WriteMutation(insert='users', columns=('name', 'email'), +values=[('sara', 's...@example.com')]) + ] + +For more information, review the docs available on the WriteMutation class. + +The WriteToSpanner transform also takes a 'max_batch_size_bytes' param, which is set +to 1MB (1048576 bytes) by default. 
This parameter is used to reduce the number of +transactions sent to Spanner by grouping the mutations into batches. Set +this to a smaller value, or to zero, to disable batching. + Review comment: Please add a note to the following > Unlike the Java connector, this connector _does not_ create batches of transactions sorted by table and primary key. This can be a feature added later; I would not let it block this PR. See https://medium.com/google-cloud/cloud-spanner-maximizing-data-load-throughput-23a0fc064b6d for more info. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 381716) Time Spent: 16h (was: 15h 50m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 16h > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
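The review comment above notes that, unlike the Java connector, this connector does not sort mutations by table and primary key before batching. As a rough illustration only (hypothetical mutation tuples, not Beam code), such a sort could look like this:

```python
def sort_for_batching(mutations):
    """Order (table, primary_key) mutation tuples so that a subsequent
    batching step produces batches covering contiguous key ranges per
    table, which tends to reduce how many Spanner splits a commit touches."""
    return sorted(mutations, key=lambda m: (m[0], m[1]))

ms = [("users", "b"), ("roles", "a"), ("users", "a")]
```

Real mutations would need their primary-key values extracted from the row data before sorting, which is part of why this was deferred rather than blocking the PR.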
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=381710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-381710 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 04/Feb/20 16:54 Start Date: 04/Feb/20 16:54 Worklog Time Spent: 10m Work Description: nielm commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r374795643 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -109,20 +111,74 @@ ReadFromSpanner takes this transform in the constructor and passes it to the read pipeline as the singleton side input. + +Writing Data to Cloud Spanner. + +The WriteToSpanner transform writes to Cloud Spanner by executing a +collection of input rows (WriteMutation). The mutations are grouped into +batches for efficiency. + +The WriteToSpanner transform relies on the WriteMutation objects, which are exposed +by the SpannerIO API. WriteMutation has five static methods (insert, update, +insert_or_update, replace, delete). These methods return an instance of the +_Mutator object, which contains the mutation type and the Spanner Mutation +object. For more details, review the docs of the class SpannerIO.WriteMutation. +For example::: + mutations = [ +WriteMutation.insert(table='user', columns=('name', 'email'), +values=[('sara', 's...@dev.com')]) + ] + _ = (p + | beam.Create(mutations) + | WriteToSpanner( + project_id=SPANNER_PROJECT_ID, + instance_id=SPANNER_INSTANCE_ID, + database_id=SPANNER_DATABASE_NAME) +) + +You can also create a WriteMutation by calling its constructor. For example::: + mutations = [ + WriteMutation(insert='users', columns=('name', 'email'), +values=[('sara', 's...@example.com')]) + ] + +For more information, review the docs available on the WriteMutation class. + +The WriteToSpanner transform also takes a 'max_batch_size_bytes' param, which is set +to 1MB (1048576 bytes) by default. 
This parameter is used to reduce the number of Review comment: There is one other batching parameter which is important -- the maximum number of cells being mutated. Spanner has a hard 20K limit here, so a batch must have less than 20K mutated cells, including cells being mutated in indexes. The Java version sets this to 5K by default. A third parameter, max_number_rows, was also added recently to Java, limiting the total number of rows in a batch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 381710) Time Spent: 15h 50m (was: 15h 40m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 15h 50m > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
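The cell-count limits described in the review comment above (a hard limit of 20K mutated cells per commit, with the Java connector batching to 5K cells by default) can be sketched as a greedy grouping over per-mutation cell counts. This is illustrative standalone code under those stated limits, not the connector's implementation:

```python
# Spanner rejects commits that mutate 20,000 or more cells (including
# index cells); the Java connector batches to 5,000 cells by default.
MAX_CELLS_HARD_LIMIT = 20000

def batch_by_cells(cell_counts, max_cells=5000):
    """Greedily group per-mutation cell counts so no batch exceeds max_cells."""
    assert max_cells <= MAX_CELLS_HARD_LIMIT
    batches, current, total = [], [], 0
    for count in cell_counts:
        # Close the current batch before it would exceed the cell budget.
        if current and total + count > max_cells:
            batches.append(current)
            current, total = [], 0
        current.append(count)
        total += count
    if current:
        batches.append(current)
    return batches
```

A production version would enforce the byte-size and row-count limits in the same loop, closing the batch when any of the three budgets would be exceeded.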
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=381709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-381709 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 04/Feb/20 16:46 Start Date: 04/Feb/20 16:46 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-582003286 cc: @nithinsujir and @nielm Issue Time Tracking --- Worklog Id: (was: 381709) Time Spent: 15h 40m (was: 15.5h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=379494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379494 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 30/Jan/20 17:19 Start Date: 30/Jan/20 17:19 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-580360833 ping for test Issue Time Tracking --- Worklog Id: (was: 379494) Time Spent: 15.5h (was: 15h 20m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=379375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379375 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 30/Jan/20 13:43 Start Date: 30/Jan/20 13:43 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-580259196 Retest this please Issue Time Tracking --- Worklog Id: (was: 379375) Time Spent: 15h 20m (was: 15h 10m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=379316&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379316 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 30/Jan/20 10:57 Start Date: 30/Jan/20 10:57 Worklog Time Spent: 10m Work Description: mszb commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r372842745 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -581,3 +644,369 @@ def display_data(self): label='transaction') return res + + +@experimental(extra_message="No backwards-compatibility guarantees.") +class WriteToSpanner(PTransform): + + def __init__(self, project_id, instance_id, database_id, pool=None, + credentials=None, max_batch_size_bytes=1048576): +""" +A PTransform to write to Google Cloud Spanner. + +Args: + project_id: Cloud Spanner project id. Be sure to use the Project ID, +not the Project Number. + instance_id: Cloud Spanner instance id. + database_id: Cloud Spanner database id. + max_batch_size_bytes: (optional) Split the mutations into batches to +reduce the number of transactions sent to Spanner. By default it is +set to 1 MB (1048576 bytes). 
+""" +self._configuration = _BeamSpannerConfiguration( +project=project_id, instance=instance_id, database=database_id, +credentials=credentials, pool=pool, snapshot_read_timestamp=None, +snapshot_exact_staleness=None +) +self._max_batch_size_bytes = max_batch_size_bytes +self._database_id = database_id +self._project_id = project_id +self._instance_id = instance_id +self._pool = pool + + def display_data(self): +res = { +'project_id': DisplayDataItem(self._project_id, label='Project Id'), +'instance_id': DisplayDataItem(self._instance_id, label='Instance Id'), +'pool': DisplayDataItem(str(self._pool), label='Pool'), +'database': DisplayDataItem(self._database_id, label='Database'), +'batch_size': DisplayDataItem(self._max_batch_size_bytes, + label="Batch Size"), +} +return res + + def expand(self, pcoll): +return (pcoll +| "make batches" >> +_WriteGroup(max_batch_size_bytes=self._max_batch_size_bytes) +| 'Writing to spanner' >> ParDo( +_WriteToSpannerDoFn(self._configuration))) + + +class _Mutator(namedtuple('_Mutator', ["mutation", "operation", "kwargs"])): + __slots__ = () + + @property + def byte_size(self): +return self.mutation.ByteSize() + + +class MutationGroup(deque): + """ + A Bundle of Spanner Mutations (_Mutator). + """ + + @property + def byte_size(self): +s = 0 +for m in self.__iter__(): + s += m.byte_size +return s + + def primary(self): +return next(self.__iter__()) + + +class WriteMutation(object): + + _OPERATION_DELETE = "delete" + _OPERATION_INSERT = "insert" + _OPERATION_INSERT_OR_UPDATE = "insert_or_update" + _OPERATION_REPLACE = "replace" + _OPERATION_UPDATE = "update" + + def __init__(self, + insert=None, + update=None, + insert_or_update=None, + replace=None, + delete=None, + columns=None, + values=None, + keyset=None): +""" +A convenient class to create Spanner Mutations for Write. User can provide +the operation via constructor or via static methods. 
+
+Note: When providing the operation via the constructor, only one operation
+is accepted at a time. For example, passing a table name in the `insert`
+parameter while also passing a value for the `update` parameter will cause
+an error.
+
+Args:
+  insert: (Optional) Name of the table in which rows will be inserted.
+  update: (Optional) Name of the table in which existing rows will be
+    updated.
+  insert_or_update: (Optional) Table name in which rows will be written.
+    Like insert, except that if the row already exists, then its column
+    values are overwritten with the ones provided. Any column values not
+    explicitly written are preserved.
+  replace: (Optional) Table name in which rows will be replaced. Like
+    insert, except that if the row already exists, it is deleted, and the
+    column values provided are inserted instead. Unlike `insert_or_update`,
+    this means any values not explicitly written become `NULL`.
+  delete: (Optional) Table name from which rows will be deleted. Succeeds
+    whether or not the named rows were present.
+  columns: T
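The WriteMutation docstring quoted above says the constructor accepts exactly one operation at a time. That rule can be sketched as a small validation helper; `pick_operation` is a hypothetical name for illustration, not the actual SpannerIO implementation.

```python
# Minimal sketch of the "exactly one operation" rule described in the
# WriteMutation docstring. The helper name is an assumption.
def pick_operation(insert=None, update=None, insert_or_update=None,
                   replace=None, delete=None):
    # Collect only the operations the caller actually supplied.
    supplied = {name: table for name, table in (
        ('insert', insert), ('update', update),
        ('insert_or_update', insert_or_update),
        ('replace', replace), ('delete', delete)) if table is not None}
    if len(supplied) != 1:
        raise ValueError(
            'Pass exactly one operation to WriteMutation; got: %r'
            % sorted(supplied))
    # Return the single (operation, table) pair.
    return next(iter(supplied.items()))

op, table = pick_operation(insert='users')  # → ('insert', 'users')
```

Supplying both `insert` and `update` raises a ValueError, matching the behavior the docstring warns about.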
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=379318&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379318 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 30/Jan/20 10:57 Start Date: 30/Jan/20 10:57 Worklog Time Spent: 10m Work Description: mszb commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r372831190 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -581,3 +644,369 @@ def display_data(self): label='transaction') return res + + +@experimental(extra_message="No backwards-compatibility guarantees.") +class WriteToSpanner(PTransform): + + def __init__(self, project_id, instance_id, database_id, pool=None, + credentials=None, max_batch_size_bytes=1048576): +""" +A PTransform to write onto Google Cloud Spanner. + +Args: + project_id: Cloud spanner project id. Be sure to use the Project ID, +not the Project Number. + instance_id: Cloud spanner instance id. + database_id: Cloud spanner database id. + max_batch_size_bytes: (optional) Split the mutation into batches to +reduce the number of transaction sent to Spanner. By default it is +set to 1 MB (1048576 Bytes). 
+""" +self._configuration = _BeamSpannerConfiguration( +project=project_id, instance=instance_id, database=database_id, +credentials=credentials, pool=pool, snapshot_read_timestamp=None, +snapshot_exact_staleness=None +) +self._max_batch_size_bytes = max_batch_size_bytes +self._database_id = database_id +self._project_id = project_id +self._instance_id = instance_id +self._pool = pool + + def display_data(self): +res = { +'project_id': DisplayDataItem(self._project_id, label='Project Id'), +'instance_id': DisplayDataItem(self._instance_id, label='Instance Id'), +'pool': DisplayDataItem(str(self._pool), label='Pool'), +'database': DisplayDataItem(self._database_id, label='Database'), +'batch_size': DisplayDataItem(self._max_batch_size_bytes, + label="Batch Size"), +} +return res + + def expand(self, pcoll): +return (pcoll +| "make batches" >> +_WriteGroup(max_batch_size_bytes=self._max_batch_size_bytes) +| 'Writing to spanner' >> ParDo( +_WriteToSpannerDoFn(self._configuration))) + + +class _Mutator(namedtuple('_Mutator', ["mutation", "operation", "kwargs"])): + __slots__ = () + + @property + def byte_size(self): +return self.mutation.ByteSize() + + +class MutationGroup(deque): + """ + A Bundle of Spanner Mutations (_Mutator). + """ + + @property + def byte_size(self): +s = 0 +for m in self.__iter__(): + s += m.byte_size +return s + + def primary(self): +return next(self.__iter__()) + + +class WriteMutation(object): + + _OPERATION_DELETE = "delete" + _OPERATION_INSERT = "insert" + _OPERATION_INSERT_OR_UPDATE = "insert_or_update" + _OPERATION_REPLACE = "replace" + _OPERATION_UPDATE = "update" + + def __init__(self, + insert=None, + update=None, + insert_or_update=None, + replace=None, + delete=None, + columns=None, + values=None, + keyset=None): +""" +A convenient class to create Spanner Mutations for Write. User can provide +the operation via constructor or via static methods. 
+
+Note: When providing the operation via the constructor, only one operation
+is accepted at a time. For example, passing a table name in the `insert`
+parameter while also passing a value for the `update` parameter will cause
+an error.
+
+Args:
+  insert: (Optional) Name of the table in which rows will be inserted.
+  update: (Optional) Name of the table in which existing rows will be
+    updated.
+  insert_or_update: (Optional) Table name in which rows will be written.
+    Like insert, except that if the row already exists, then its column
+    values are overwritten with the ones provided. Any column values not
+    explicitly written are preserved.
+  replace: (Optional) Table name in which rows will be replaced. Like
+    insert, except that if the row already exists, it is deleted, and the
+    column values provided are inserted instead. Unlike `insert_or_update`,
+    this means any values not explicitly written become `NULL`.
+  delete: (Optional) Table name from which rows will be deleted. Succeeds
+    whether or not the named rows were present.
+  columns: T
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=379315&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379315 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 30/Jan/20 10:57 Start Date: 30/Jan/20 10:57 Worklog Time Spent: 10m Work Description: mszb commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r372828128 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -581,3 +644,369 @@ def display_data(self): label='transaction') return res + + +@experimental(extra_message="No backwards-compatibility guarantees.") +class WriteToSpanner(PTransform): + + def __init__(self, project_id, instance_id, database_id, pool=None, + credentials=None, max_batch_size_bytes=1048576): +""" +A PTransform to write onto Google Cloud Spanner. + +Args: + project_id: Cloud spanner project id. Be sure to use the Project ID, +not the Project Number. + instance_id: Cloud spanner instance id. + database_id: Cloud spanner database id. + max_batch_size_bytes: (optional) Split the mutation into batches to +reduce the number of transaction sent to Spanner. By default it is +set to 1 MB (1048576 Bytes). 
+""" +self._configuration = _BeamSpannerConfiguration( +project=project_id, instance=instance_id, database=database_id, +credentials=credentials, pool=pool, snapshot_read_timestamp=None, +snapshot_exact_staleness=None +) +self._max_batch_size_bytes = max_batch_size_bytes +self._database_id = database_id +self._project_id = project_id +self._instance_id = instance_id +self._pool = pool + + def display_data(self): +res = { +'project_id': DisplayDataItem(self._project_id, label='Project Id'), +'instance_id': DisplayDataItem(self._instance_id, label='Instance Id'), +'pool': DisplayDataItem(str(self._pool), label='Pool'), +'database': DisplayDataItem(self._database_id, label='Database'), +'batch_size': DisplayDataItem(self._max_batch_size_bytes, + label="Batch Size"), +} +return res + + def expand(self, pcoll): +return (pcoll +| "make batches" >> +_WriteGroup(max_batch_size_bytes=self._max_batch_size_bytes) +| 'Writing to spanner' >> ParDo( +_WriteToSpannerDoFn(self._configuration))) + + +class _Mutator(namedtuple('_Mutator', ["mutation", "operation", "kwargs"])): + __slots__ = () + + @property + def byte_size(self): +return self.mutation.ByteSize() + + +class MutationGroup(deque): + """ + A Bundle of Spanner Mutations (_Mutator). + """ + + @property + def byte_size(self): +s = 0 +for m in self.__iter__(): + s += m.byte_size +return s + + def primary(self): +return next(self.__iter__()) + + +class WriteMutation(object): + + _OPERATION_DELETE = "delete" + _OPERATION_INSERT = "insert" + _OPERATION_INSERT_OR_UPDATE = "insert_or_update" + _OPERATION_REPLACE = "replace" + _OPERATION_UPDATE = "update" + + def __init__(self, + insert=None, + update=None, + insert_or_update=None, + replace=None, + delete=None, + columns=None, + values=None, + keyset=None): +""" +A convenient class to create Spanner Mutations for Write. User can provide +the operation via constructor or via static methods. 
+
+Note: When providing the operation via the constructor, only one operation
+is accepted at a time. For example, passing a table name in the `insert`
+parameter while also passing a value for the `update` parameter will cause
+an error.
+
+Args:
+  insert: (Optional) Name of the table in which rows will be inserted.
+  update: (Optional) Name of the table in which existing rows will be
+    updated.
+  insert_or_update: (Optional) Table name in which rows will be written.
+    Like insert, except that if the row already exists, then its column
+    values are overwritten with the ones provided. Any column values not
+    explicitly written are preserved.
+  replace: (Optional) Table name in which rows will be replaced. Like
+    insert, except that if the row already exists, it is deleted, and the
+    column values provided are inserted instead. Unlike `insert_or_update`,
+    this means any values not explicitly written become `NULL`.
+  delete: (Optional) Table name from which rows will be deleted. Succeeds
+    whether or not the named rows were present.
+  columns: T
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=379319&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379319 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 30/Jan/20 10:57 Start Date: 30/Jan/20 10:57 Worklog Time Spent: 10m Work Description: mszb commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r372845556 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -581,3 +644,369 @@ def display_data(self): label='transaction') return res + + +@experimental(extra_message="No backwards-compatibility guarantees.") +class WriteToSpanner(PTransform): + + def __init__(self, project_id, instance_id, database_id, pool=None, + credentials=None, max_batch_size_bytes=1048576): +""" +A PTransform to write onto Google Cloud Spanner. + +Args: + project_id: Cloud spanner project id. Be sure to use the Project ID, +not the Project Number. + instance_id: Cloud spanner instance id. + database_id: Cloud spanner database id. + max_batch_size_bytes: (optional) Split the mutation into batches to +reduce the number of transaction sent to Spanner. By default it is +set to 1 MB (1048576 Bytes). 
+""" +self._configuration = _BeamSpannerConfiguration( +project=project_id, instance=instance_id, database=database_id, +credentials=credentials, pool=pool, snapshot_read_timestamp=None, +snapshot_exact_staleness=None +) +self._max_batch_size_bytes = max_batch_size_bytes +self._database_id = database_id +self._project_id = project_id +self._instance_id = instance_id +self._pool = pool + + def display_data(self): +res = { +'project_id': DisplayDataItem(self._project_id, label='Project Id'), +'instance_id': DisplayDataItem(self._instance_id, label='Instance Id'), +'pool': DisplayDataItem(str(self._pool), label='Pool'), +'database': DisplayDataItem(self._database_id, label='Database'), +'batch_size': DisplayDataItem(self._max_batch_size_bytes, + label="Batch Size"), +} +return res + + def expand(self, pcoll): +return (pcoll +| "make batches" >> +_WriteGroup(max_batch_size_bytes=self._max_batch_size_bytes) +| 'Writing to spanner' >> ParDo( +_WriteToSpannerDoFn(self._configuration))) + + +class _Mutator(namedtuple('_Mutator', ["mutation", "operation", "kwargs"])): + __slots__ = () + + @property + def byte_size(self): +return self.mutation.ByteSize() + + +class MutationGroup(deque): + """ + A Bundle of Spanner Mutations (_Mutator). + """ + + @property + def byte_size(self): +s = 0 +for m in self.__iter__(): + s += m.byte_size +return s + + def primary(self): +return next(self.__iter__()) + + +class WriteMutation(object): + + _OPERATION_DELETE = "delete" + _OPERATION_INSERT = "insert" + _OPERATION_INSERT_OR_UPDATE = "insert_or_update" + _OPERATION_REPLACE = "replace" + _OPERATION_UPDATE = "update" + + def __init__(self, + insert=None, + update=None, + insert_or_update=None, + replace=None, + delete=None, + columns=None, + values=None, + keyset=None): +""" +A convenient class to create Spanner Mutations for Write. User can provide +the operation via constructor or via static methods. 
+
+Note: When providing the operation via the constructor, only one operation
+is accepted at a time. For example, passing a table name in the `insert`
+parameter while also passing a value for the `update` parameter will cause
+an error.
+
+Args:
+  insert: (Optional) Name of the table in which rows will be inserted.
+  update: (Optional) Name of the table in which existing rows will be
+    updated.
+  insert_or_update: (Optional) Table name in which rows will be written.
+    Like insert, except that if the row already exists, then its column
+    values are overwritten with the ones provided. Any column values not
+    explicitly written are preserved.
+  replace: (Optional) Table name in which rows will be replaced. Like
+    insert, except that if the row already exists, it is deleted, and the
+    column values provided are inserted instead. Unlike `insert_or_update`,
+    this means any values not explicitly written become `NULL`.
+  delete: (Optional) Table name from which rows will be deleted. Succeeds
+    whether or not the named rows were present.
+  columns: T
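The `_Mutator` and `MutationGroup` classes quoted in the diff above can be exercised on their own. The sketch below re-creates them with a stub in place of the Spanner `Mutation` protobuf (whose `ByteSize()` method the real code calls); the stub class and its sizes are assumptions for illustration.

```python
# Illustrative re-creation of the _Mutator / MutationGroup byte-size
# accounting from the diff. _FakeProtoMutation stands in for the real
# Spanner protobuf and is an assumption.
from collections import deque, namedtuple

class _FakeProtoMutation(object):
    """Stub standing in for the Spanner Mutation proto."""
    def __init__(self, size):
        self._size = size
    def ByteSize(self):
        return self._size

class _Mutator(namedtuple('_Mutator', ['mutation', 'operation', 'kwargs'])):
    __slots__ = ()
    @property
    def byte_size(self):
        # Delegates to the proto's own size accounting.
        return self.mutation.ByteSize()

class MutationGroup(deque):
    """A bundle of Spanner mutations (_Mutator instances)."""
    @property
    def byte_size(self):
        return sum(m.byte_size for m in self)
    def primary(self):
        # The first mutator in the group.
        return next(iter(self))

group = MutationGroup(
    _Mutator(_FakeProtoMutation(s), 'insert', {}) for s in (100, 250))
# group.byte_size → 350; group.primary() is the 100-byte mutator
```

The group's byte size is what the batching stage compares against `max_batch_size_bytes`.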
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=379317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379317 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 30/Jan/20 10:57 Start Date: 30/Jan/20 10:57 Worklog Time Spent: 10m Work Description: mszb commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#discussion_r372833189 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -131,13 +187,18 @@ try: from google.cloud.spanner import Client from google.cloud.spanner import KeySet + from google.cloud.spanner_v1 import batch from google.cloud.spanner_v1.database import BatchSnapshot + from google.cloud.spanner_v1.proto.mutation_pb2 import Mutation Review comment: Since the spanner package doesn't expose the `Mutation` and `batch` objects, this is the only way to import them. Issue Time Tracking --- Worklog Id: (was: 379317) Time Spent: 14h 50m
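The import paths in the diff above reach into `google.cloud.spanner_v1` because, as the review comment notes, the top-level package does not expose `Mutation` and `batch`. Beam's GCP IO modules conventionally wrap such imports in a try/except guard so the module still loads when the client library is absent. A sketch of that pattern follows; the paths mirror the diff and target the google-cloud-spanner version in use at the time, and the `HAVE_SPANNER` flag name is an assumption.

```python
# Guarded-import sketch: the module stays importable even when
# google-cloud-spanner is not installed. Paths are taken from the
# reviewed diff; the HAVE_SPANNER flag is illustrative.
try:
    from google.cloud.spanner import Client
    from google.cloud.spanner import KeySet
    from google.cloud.spanner_v1 import batch
    from google.cloud.spanner_v1.database import BatchSnapshot
    from google.cloud.spanner_v1.proto.mutation_pb2 import Mutation
    HAVE_SPANNER = True
except ImportError:
    # Placeholders keep later references from raising NameError.
    Client = KeySet = batch = BatchSnapshot = Mutation = None
    HAVE_SPANNER = False
```

Code that needs the client can then check the flag (or the placeholders) and raise a descriptive error at use time rather than at import time.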
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=379286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379286 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 30/Jan/20 10:22 Start Date: 30/Jan/20 10:22 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-580185708 retest this please Issue Time Tracking --- Worklog Id: (was: 379286) Time Spent: 14h 20m (was: 14h 10m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=379288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379288 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 30/Jan/20 10:22 Start Date: 30/Jan/20 10:22 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-580185862 Run Python Precommit Issue Time Tracking --- Worklog Id: (was: 379288) Time Spent: 14.5h (was: 14h 20m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=379218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379218 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 30/Jan/20 07:37 Start Date: 30/Jan/20 07:37 Worklog Time Spent: 10m Work Description: mszb commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-580122678 R: @chamikaramj R: @aaltay Issue Time Tracking --- Worklog Id: (was: 379218) Time Spent: 14h 10m (was: 14h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378965 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 29/Jan/20 18:28 Start Date: 29/Jan/20 18:28 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-579895160 ping for tests Issue Time Tracking --- Worklog Id: (was: 378965) Time Spent: 14h (was: 13h 50m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378956 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 29/Jan/20 18:08 Start Date: 29/Jan/20 18:08 Worklog Time Spent: 10m Work Description: mszb commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712
[Unfilled pull request template and truncated post-commit build badge table omitted]
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378561 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 28/Jan/20 23:26 Start Date: 28/Jan/20 23:26 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10706: [BEAM-7246] Fix Spanner auth endpoints URL: https://github.com/apache/beam/pull/10706 Issue Time Tracking --- Worklog Id: (was: 378561) Time Spent: 13h 40m (was: 13.5h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378524 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 28/Jan/20 22:19 Start Date: 28/Jan/20 22:19 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10706: [BEAM-7246] Fix Spanner auth endpoints URL: https://github.com/apache/beam/pull/10706#issuecomment-579487896 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 378524) Time Spent: 13h 20m (was: 13h 10m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378525 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 28/Jan/20 22:19 Start Date: 28/Jan/20 22:19 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10706: [BEAM-7246] Fix Spanner auth endpoints URL: https://github.com/apache/beam/pull/10706#issuecomment-579487946 Thanks! Issue Time Tracking --- Worklog Id: (was: 378525) Time Spent: 13.5h (was: 13h 20m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378523 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 28/Jan/20 22:18 Start Date: 28/Jan/20 22:18 Worklog Time Spent: 10m Work Description: mszb commented on issue #10706: [BEAM-7246] Fix Spanner auth endpoints URL: https://github.com/apache/beam/pull/10706#issuecomment-579487332 LGTM Issue Time Tracking --- Worklog Id: (was: 378523) Time Spent: 13h 10m (was: 13h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378494 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 28/Jan/20 21:49 Start Date: 28/Jan/20 21:49 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10706: [BEAM-7246] Fix Spanner auth endpoints URL: https://github.com/apache/beam/pull/10706#issuecomment-579473549 cc: @mszb Issue Time Tracking --- Worklog Id: (was: 378494) Time Spent: 13h (was: 12h 50m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=378459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-378459 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 28/Jan/20 20:59 Start Date: 28/Jan/20 20:59 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10706: [BEAM-7246] Fix Spanner auth endpoints URL: https://github.com/apache/beam/pull/10706
[Unfilled pull request template and truncated post-commit build badge table omitted]
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=373868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373868 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 17/Jan/20 20:55 Start Date: 17/Jan/20 20:55 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606 Issue Time Tracking --- Worklog Id: (was: 373868) Time Spent: 12.5h (was: 12h 20m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=373869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373869 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 17/Jan/20 20:55 Start Date: 17/Jan/20 20:55 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-575791689 Thank you. Let's get integration tests in so that we can move this out of experimental :) Issue Time Tracking --- Worklog Id: (was: 373869) Time Spent: 12h 40m (was: 12.5h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=373866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373866 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 17/Jan/20 20:36 Start Date: 17/Jan/20 20:36 Worklog Time Spent: 10m Work Description: shehzaadn-vd commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-575785491 Thanks @chamikaramj for your support! @aaltay looks like the tests are passing. Would you be able to merge this? Issue Time Tracking --- Worklog Id: (was: 373866) Time Spent: 12h 20m (was: 12h 10m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=373821&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373821 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 17/Jan/20 19:38 Start Date: 17/Jan/20 19:38 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-575766434 LGTM. Thanks. We can get this in when tests pass. Issue Time Tracking --- Worklog Id: (was: 373821) Time Spent: 12h 10m (was: 12h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=373814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373814 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 17/Jan/20 19:29 Start Date: 17/Jan/20 19:29 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-575763166 Retest this please Issue Time Tracking --- Worklog Id: (was: 373814) Time Spent: 11h 50m (was: 11h 40m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=373815&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373815 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 17/Jan/20 19:29 Start Date: 17/Jan/20 19:29 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-575763362 Retest this please Issue Time Tracking --- Worklog Id: (was: 373815) Time Spent: 12h (was: 11h 50m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=373603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373603 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 17/Jan/20 13:41 Start Date: 17/Jan/20 13:41 Worklog Time Spent: 10m Work Description: iemejia commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-575630301 retest this please Issue Time Tracking --- Worklog Id: (was: 373603) Time Spent: 11.5h (was: 11h 20m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=373604&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373604 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 17/Jan/20 13:41 Start Date: 17/Jan/20 13:41 Worklog Time Spent: 10m Work Description: iemejia commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-575630131 retest this please Issue Time Tracking --- Worklog Id: (was: 373604) Time Spent: 11h 40m (was: 11.5h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=373602&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373602 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 17/Jan/20 13:41 Start Date: 17/Jan/20 13:41 Worklog Time Spent: 10m Work Description: iemejia commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-575630131 retest this please Issue Time Tracking --- Worklog Id: (was: 373602) Time Spent: 11h 20m (was: 11h 10m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=373547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373547 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 17/Jan/20 11:13 Start Date: 17/Jan/20 11:13 Worklog Time Spent: 10m Work Description: mszb commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-575584369 retest this please Issue Time Tracking --- Worklog Id: (was: 373547) Time Spent: 11h 10m (was: 11h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=373384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-373384 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 17/Jan/20 03:51 Start Date: 17/Jan/20 03:51 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-575452714 Any updates ? Issue Time Tracking --- Worklog Id: (was: 373384) Time Spent: 11h (was: 10h 50m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=367172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-367172 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 07/Jan/20 02:19 Start Date: 07/Jan/20 02:19 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-571402431 Thanks. Mostly looks good. Added few more comments. Issue Time Tracking --- Worklog Id: (was: 367172) Time Spent: 10h 50m (was: 10h 40m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=367170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-367170 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 07/Jan/20 02:19 Start Date: 07/Jan/20 02:19 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r363567042 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio_test.py ## @@ -0,0 +1,271 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from __future__ import absolute_import + +import datetime +import logging +import random +import string +import unittest + +import mock + +import apache_beam as beam +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to + +# Protect against environments where spanner library is not available. 
+# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports +try: + from google.cloud import spanner + from apache_beam.io.gcp.experimental.spannerio import (create_transaction, + ReadOperation, + ReadFromSpanner) # pylint: disable=unused-import + # disable=unused-import +except ImportError: + spanner = None +# pylint: enable=wrong-import-order, wrong-import-position, ungrouped-imports + + +MAX_DB_NAME_LENGTH = 30 +TEST_PROJECT_ID = 'apache-beam-testing' +TEST_INSTANCE_ID = 'beam-test' +TEST_DATABASE_PREFIX = 'spanner-testdb-' +# TEST_TABLE = 'users' +# TEST_COLUMNS = ['Key', 'Value'] +FAKE_ROWS = [[1, 'Alice'], [2, 'Bob'], [3, 'Carl'], [4, 'Dan'], [5, 'Evan'], + [6, 'Floyd']] + + +def _generate_database_name(): + mask = string.ascii_lowercase + string.digits + length = MAX_DB_NAME_LENGTH - 1 - len(TEST_DATABASE_PREFIX) + return TEST_DATABASE_PREFIX + ''.join(random.choice(mask) for i in range( + length)) + + +def _generate_test_data(): + mask = string.ascii_lowercase + string.digits + length = 100 + return [('users', ['Key', 'Value'], [(x, ''.join( + random.choice(mask) for _ in range(length))) for x in range(1, 5)])] + + +@unittest.skipIf(spanner is None, 'GCP dependencies are not installed.') +@mock.patch('apache_beam.io.gcp.experimental.spannerio.Client') +@mock.patch('apache_beam.io.gcp.experimental.spannerio.BatchSnapshot') +class SpannerReadTest(unittest.TestCase): + + def test_read_with_query_batch(self, mock_batch_snapshot_class, Review comment: How about runReadUsingIndex ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 367170) Time Spent: 10h 40m (was: 10.5h)
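The `_generate_database_name` helper in the test file quoted above sizes a random suffix so the full name stays within the 30-character limit the test sets via `MAX_DB_NAME_LENGTH`. A standalone version of the same logic (constants copied from the quoted test, not from the SDK):

```python
import random
import string

MAX_DB_NAME_LENGTH = 30                  # cap used by the quoted test
TEST_DATABASE_PREFIX = 'spanner-testdb-'

def generate_database_name():
    # Random lowercase/digit suffix sized so prefix + suffix stays under the cap.
    mask = string.ascii_lowercase + string.digits
    length = MAX_DB_NAME_LENGTH - 1 - len(TEST_DATABASE_PREFIX)
    return TEST_DATABASE_PREFIX + ''.join(
        random.choice(mask) for _ in range(length))
```

With a 15-character prefix, the generated name is always 29 characters, leaving one character of headroom under the limit.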
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=367169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-367169 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 07/Jan/20 02:18 Start Date: 07/Jan/20 02:18 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r363566806 ## File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py ## @@ -0,0 +1,565 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Google Cloud Spanner IO + +Experimental; no backwards-compatibility guarantees. + +This is an experimental module for reading and writing data from Google Cloud +Spanner. Visit: https://cloud.google.com/spanner for more details. + +To read from Cloud Spanner apply ReadFromSpanner transformation. It will +return a PCollection, where each element represents an individual row returned +from the read operation. Both Query and Read APIs are supported. + +ReadFromSpanner relies on the ReadOperation objects which is exposed by the +SpannerIO API. 
ReadOperation holds the immutable data needed to +execute batch and naive reads on Cloud Spanner. This is done for more +convenient programming. + +ReadFromSpanner reads from Cloud Spanner by providing either an 'sql' param +in the constructor or a 'table' name with 'columns' as a list. For example::: + + records = (pipeline +| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +sql='Select * from users')) + + records = (pipeline +| ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +table='users', columns=['id', 'name', 'email'])) + +You can also perform multiple reads by providing a list of ReadOperations +to the ReadFromSpanner transform constructor. ReadOperation exposes two static +methods. Use 'query' to perform SQL-based reads and 'table' to read from a +table. For example::: + + read_operations = [ + ReadOperation.table(table='customers', columns=['name', + 'email']), + ReadOperation.table(table='vendors', columns=['name', + 'email']), +] + all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +read_operations=read_operations) + + ...OR... + + read_operations = [ + ReadOperation.query(sql='Select name, email from + customers'), + ReadOperation.query( +sql='Select * from users where id <= @user_id', +params={'user_id': 100}, +params_type={'user_id': param_types.INT64} + ), +] + # `params_type` values are instances of `google.cloud.spanner.param_types` + all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +read_operations=read_operations) + +For more information, please review the docs on the ReadOperation class. + +Users can also provide the ReadOperation objects as a PCollection in the +pipeline. For example::: + + users = (pipeline + | beam.Create([ReadOperation...]) + | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME)) + +Users may also create a Cloud Spanner transaction with the `create_transaction` +transform, which is available in the SpannerIO API. 
+ +The transform is guaranteed to be executed on a consistent snapshot of data, +utilizing the power of read-only transactions. Staleness of data can be +controlled by providing the `read_timestamp` or `exact_staleness` param values +in the constructor. + +This transform requires the root of the pipeline (PBegin) and returns a PTransform +which is later passed to the `ReadFromSpanner` constructor. `ReadFromSpanner` +passes this transaction PTransform as a singleton side input to +`_NaiveSpannerReadDoFn`, containing 'session_id' and 'transaction_id'. +For example::: + + transaction = (pipeline | create_transaction(T
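The transaction example above is cut off in the archive. Independently of it, the ReadOperation objects the docstring describes are immutable bundles of read parameters. As a rough standalone model (the helper names `query_op`/`table_op` are made up for illustration; the real class in `apache_beam.io.gcp.experimental.spannerio` exposes static `query` and `table` methods, and its fields may differ), a namedtuple captures the same idea:

```python
from collections import namedtuple

# Hypothetical stand-in for ReadOperation; field names follow the docstring
# examples above, not the actual SDK implementation.
ReadOperationSketch = namedtuple(
    'ReadOperationSketch', ['sql', 'table', 'columns', 'params', 'params_type'])

def query_op(sql, params=None, params_type=None):
    """SQL-based read, mirroring ReadOperation.query."""
    return ReadOperationSketch(sql=sql, table=None, columns=None,
                               params=params, params_type=params_type)

def table_op(table, columns):
    """Table-based read, mirroring ReadOperation.table."""
    return ReadOperationSketch(sql=None, table=table, columns=columns,
                               params=None, params_type=None)

# Mixing table- and query-based reads in one list, as in the docstring.
read_operations = [
    table_op('customers', ['name', 'email']),
    query_op('SELECT * FROM users WHERE id <= @user_id',
             params={'user_id': 100}),
]
```

Because namedtuples are immutable, each operation can safely be shipped to workers or used as a PCollection element without defensive copying, which is the "immutable data" convenience the docstring alludes to.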
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=367157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-367157 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 07/Jan/20 01:58 Start Date: 07/Jan/20 01:58 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r363563133 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ## @@ -0,0 +1,558 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Google Cloud Spanner IO + +This is an experimental module for reading and writing data from Google Cloud +Spanner. Visit: https://cloud.google.com/spanner for more details. + +To read from Cloud Spanner apply _ReadFromSpanner transformation. It will +return a PCollection, where each element represents an individual row returned +from the read operation. Both Query and Read APIs are supported. + +_ReadFromSpanner relies on the _ReadOperation objects which is exposed by the +SpannerIO API. _ReadOperation holds the immutable data which is responsible to +execute batch and naive reads on Cloud Spanner. 
This is done for more +convenient programming. + +_ReadFromSpanner reads from Cloud Spanner by providing either an 'sql' param +in the constructor or 'table' name with 'columns' as list. For example::: + + records = (pipeline +| _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +sql='Select * from users')) + + records = (pipeline +| _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +table='users', columns=['id', 'name', 'email'])) + +You can also perform multiple reads by providing a list of _ReadOperations +to the _ReadFromSpanner transform constructor. _ReadOperation exposes two static +methods. Use 'query' to perform sql based reads, 'table' to perform read from +table name. For example::: + + read_operations = [ + _ReadOperation.table('customers', ['name', 'email']), + _ReadOperation.table('vendors', ['name', 'email']), +] + all_users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +read_operations=read_operations) + + ...OR... + + read_operations = [ + _ReadOperation.query('Select name, email from customers'), + _ReadOperation.query( +sql='Select * from users where id <= @user_id', +params={'user_id': 100}, +params_type={'user_id': param_types.INT64} + ), +] + # `params_types` are instance of `google.cloud.spanner_v1.param_types` + all_users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +read_operations=read_operations) + +For more information, please review the docs on class _ReadOperation. + +User can also able to provide the _ReadOperation in form of PCollection via +pipeline. For example::: + + users = (pipeline + | beam.Create([_ReadOperation...]) + | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME)) + +User may also create cloud spanner transaction from the transform called +`_create_transaction` which is available in the SpannerIO API. + +The transform is guaranteed to be executed on a consistent snapshot of data, +utilizing the power of read only transactions. 
Staleness of data can be +controlled by providing the `read_timestamp` or `exact_staleness` param values +in the constructor. + +This transform requires root of the pipeline (PBegin) and returns the dict +containing 'session_id' and 'transaction_id'. This `_create_transaction` +PTransform later passed to the constructor of _ReadFromSpanner. For example::: + + transaction = (pipeline | _create_transaction(TEST_PROJECT_ID, + TEST_INSTANCE_ID, + DB_NAME)) + + users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +sq
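The `read_timestamp` and `exact_staleness` options mentioned in the docstring above are two alternative ways to pin the read snapshot: a fixed point in time versus a maximum data age. A standalone sketch of that selection logic (a hypothetical helper written for illustration; the real transform takes these values directly as constructor params, and treating them as mutually exclusive is an assumption based on Cloud Spanner's read-only transaction options):

```python
import datetime

def snapshot_options(read_timestamp=None, exact_staleness=None):
    # Choose at most one snapshot-pinning option, mirroring the docstring.
    if read_timestamp is not None and exact_staleness is not None:
        raise ValueError(
            'read_timestamp and exact_staleness are mutually exclusive')
    if read_timestamp is not None:
        return {'read_timestamp': read_timestamp}    # read at a fixed point in time
    if exact_staleness is not None:
        return {'exact_staleness': exact_staleness}  # read data at least this old
    return {}  # no option given: a strong read of the latest data

opts = snapshot_options(exact_staleness=datetime.timedelta(seconds=15))
```

Bounded staleness can let Spanner serve reads from a closer replica, so a small `exact_staleness` is a common latency/freshness trade-off for batch reads.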
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=366843&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-366843 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 06/Jan/20 18:44 Start Date: 06/Jan/20 18:44 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-571259347 Still reviewing the latest round of updates. Thanks. Issue Time Tracking --- Worklog Id: (was: 366843) Time Spent: 10h 10m (was: 10h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=365803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365803 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 03/Jan/20 10:44 Start Date: 03/Jan/20 10:44 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361167119 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ## @@ -0,0 +1,559 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Google Cloud Spanner IO + +This is an experimental module for reading and writing data from Google Cloud +Spanner. Visit: https://cloud.google.com/spanner for more details. + +To read from Cloud Spanner apply ReadFromSpanner transformation. It will +return a PCollection, where each element represents an individual row returned +from the read operation. Both Query and Read APIs are supported. + +ReadFromSpanner relies on the ReadOperation objects which is exposed by the +SpannerIO API. ReadOperation holds the immutable data which is responsible to +execute batch and naive reads on Cloud Spanner. 
This is done for more Review comment: Yes, you are right. In Naive reads transform we do not use spanner partitioning query in the transform. Issue Time Tracking --- Worklog Id: (was: 365803) Time Spent: 10h (was: 9h 50m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=365802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365802 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 03/Jan/20 10:43 Start Date: 03/Jan/20 10:43 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361167119 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ## @@ -0,0 +1,559 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Google Cloud Spanner IO + +This is an experimental module for reading and writing data from Google Cloud +Spanner. Visit: https://cloud.google.com/spanner for more details. + +To read from Cloud Spanner apply ReadFromSpanner transformation. It will +return a PCollection, where each element represents an individual row returned +from the read operation. Both Query and Read APIs are supported. + +ReadFromSpanner relies on the ReadOperation objects which is exposed by the +SpannerIO API. ReadOperation holds the immutable data which is responsible to +execute batch and naive reads on Cloud Spanner. 
This is done for more Review comment: Yes, you are right. In Naive reads transform we do not use spanner partitioning query in the transform. Issue Time Tracking --- Worklog Id: (was: 365802) Time Spent: 9h 50m (was: 9h 40m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=365549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-365549 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 03/Jan/20 00:09 Start Date: 03/Jan/20 00:09 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#issuecomment-570409375 @chamikaramj is this ready to be merged? Are all the open comments resolved? Issue Time Tracking --- Worklog Id: (was: 365549) Time Spent: 9h 40m (was: 9.5h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363651&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363651 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 16:30 Start Date: 26/Dec/19 16:30 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361489358 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ## @@ -0,0 +1,558 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Google Cloud Spanner IO + +This is an experimental module for reading and writing data from Google Cloud +Spanner. Visit: https://cloud.google.com/spanner for more details. + +To read from Cloud Spanner apply _ReadFromSpanner transformation. It will +return a PCollection, where each element represents an individual row returned +from the read operation. Both Query and Read APIs are supported. + +_ReadFromSpanner relies on the _ReadOperation objects which is exposed by the +SpannerIO API. _ReadOperation holds the immutable data which is responsible to +execute batch and naive reads on Cloud Spanner. 
This is done for more +convenient programming. + +_ReadFromSpanner reads from Cloud Spanner by providing either an 'sql' param +in the constructor or 'table' name with 'columns' as list. For example::: + + records = (pipeline +| _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +sql='Select * from users')) + + records = (pipeline +| _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +table='users', columns=['id', 'name', 'email'])) + +You can also perform multiple reads by providing a list of _ReadOperations +to the _ReadFromSpanner transform constructor. _ReadOperation exposes two static +methods. Use 'query' to perform sql based reads, 'table' to perform read from +table name. For example::: + + read_operations = [ + _ReadOperation.table('customers', ['name', 'email']), + _ReadOperation.table('vendors', ['name', 'email']), +] + all_users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +read_operations=read_operations) + + ...OR... + + read_operations = [ + _ReadOperation.query('Select name, email from customers'), + _ReadOperation.query( +sql='Select * from users where id <= @user_id', +params={'user_id': 100}, +params_type={'user_id': param_types.INT64} + ), +] + # `params_types` are instance of `google.cloud.spanner_v1.param_types` + all_users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME, +read_operations=read_operations) + +For more information, please review the docs on class _ReadOperation. + +User can also able to provide the _ReadOperation in form of PCollection via +pipeline. For example::: + + users = (pipeline + | beam.Create([_ReadOperation...]) + | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME)) + +User may also create cloud spanner transaction from the transform called +`_create_transaction` which is available in the SpannerIO API. + +The transform is guaranteed to be executed on a consistent snapshot of data, +utilizing the power of read only transactions. 
Staleness of data can be +controlled by providing the `read_timestamp` or `exact_staleness` param values +in the constructor. + +This transform requires root of the pipeline (PBegin) and returns the dict +containing 'session_id' and 'transaction_id'. This `_create_transaction` Review comment: Yes, its doing the same thing... updated the docs with some more details.
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363650 ]

ASF GitHub Bot logged work on BEAM-7246:
Author: ASF GitHub Bot
Created on: 26/Dec/19 16:28
Start Date: 26/Dec/19 16:28
Worklog Time Spent: 10m
Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r361489188

## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##

@@ -0,0 +1,558 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Google Cloud Spanner IO
+
+This is an experimental module for reading and writing data from Google Cloud
+Spanner. Visit https://cloud.google.com/spanner for more details.
+
+To read from Cloud Spanner, apply the _ReadFromSpanner transform. It returns a
+PCollection in which each element represents an individual row returned by the
+read operation. Both the Query and Read APIs are supported.
+
+_ReadFromSpanner relies on _ReadOperation objects, which are exposed by the
+SpannerIO API. A _ReadOperation holds the immutable data that defines a batch
+or naive read on Cloud Spanner. This makes programming against the API more
+convenient.
+
+_ReadFromSpanner reads from Cloud Spanner given either an 'sql' param in the
+constructor or a 'table' name together with a 'columns' list. For example:::
+
+  records = (pipeline
+             | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+                                sql='Select * from users'))
+
+  records = (pipeline
+             | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+                                table='users', columns=['id', 'name', 'email']))
+
+You can also perform multiple reads by providing a list of _ReadOperations to
+the _ReadFromSpanner transform constructor. _ReadOperation exposes two static
+methods: use 'query' to perform SQL-based reads and 'table' to read from a
+named table. For example:::
+
+  read_operations = [
+      _ReadOperation.table('customers', ['name', 'email']),
+      _ReadOperation.table('vendors', ['name', 'email']),
+  ]
+  all_users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+                                          read_operations=read_operations)
+
+  ...OR...
+
+  read_operations = [
+      _ReadOperation.query('Select name, email from customers'),
+      _ReadOperation.query(
+          sql='Select * from users where id <= @user_id',
+          params={'user_id': 100},
+          params_type={'user_id': param_types.INT64}
+      ),
+  ]
+  # values in `params_type` are instances of `google.cloud.spanner.param_types`
+  all_users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+                                          read_operations=read_operations)
+
+For more information, please review the docs on the _ReadOperation class.
+
+Users can also provide _ReadOperations in the form of a PCollection. For
+example:::
+
+  users = (pipeline
+           | beam.Create([_ReadOperation...])
+           | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME))
+
+Users may also create a Cloud Spanner transaction with the
+`_create_transaction` transform, which is available in the SpannerIO API.
+
+The transform is guaranteed to execute on a consistent snapshot of the data,
+using the power of read-only transactions. Staleness of the data can be
+controlled by providing a `read_timestamp` or `exact_staleness` param value in
+the constructor.
+
+This transform requires the root of the pipeline (PBegin) and returns a dict
+containing 'session_id' and 'transaction_id'. The resulting
+`_create_transaction` PTransform is later passed to the constructor of
+_ReadFromSpanner. For example:::
+
+  transaction = (pipeline | _create_transaction(TEST_PROJECT_ID,
+                                                TEST_INSTANCE_ID,
+                                                DB_NAME))
+
+  users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+                                      sql='Sele
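The _ReadOperation value object the quoted docstring describes — an immutable record built through `query` and `table` constructors — can be pictured with a small pure-Python sketch. This is an illustration only, not Beam's implementation: the names `query_op` and `table_op` and the exact field set are assumptions.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Hypothetical stand-in for Beam's _ReadOperation: a frozen record that
# captures either a SQL-based read or a table/columns read. The real class
# in apache_beam.io.gcp.spannerio differs in detail.
@dataclass(frozen=True)
class ReadOperation:
    sql: Optional[str] = None
    table: Optional[str] = None
    columns: Optional[Tuple[str, ...]] = None
    params: Optional[Mapping] = None
    params_type: Optional[Mapping] = None

def query_op(sql, params=None, params_type=None):
    """SQL-based read, mirroring the docstring's _ReadOperation.query(...)."""
    return ReadOperation(sql=sql, params=params, params_type=params_type)

def table_op(table, columns):
    """Table-based read, mirroring the docstring's _ReadOperation.table(...)."""
    return ReadOperation(table=table, columns=tuple(columns))
```

Because the record is frozen, a read specification cannot be mutated after construction, which matches the docstring's claim that _ReadOperation "holds the immutable data" driving each batch or naive read.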
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363636 ]

ASF GitHub Bot logged work on BEAM-7246:
Author: ASF GitHub Bot
Created on: 26/Dec/19 16:07
Start Date: 26/Dec/19 16:07
Worklog Time Spent: 10m
Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r361485151

## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##

@@ -0,0 +1,558 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with

Review comment: Done.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 363636)
Time Spent: 9h 10m (was: 9h)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Reuven Lax
> Assignee: Shehzaad Nakhoda
> Priority: Major
> Time Spent: 9h 10m
> Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and
> manual testing.
> Integration and performance tests are a separate work item (not included
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add
> Google Cloud Spanner to the Database column for the Python/Batch row.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363634 ]

ASF GitHub Bot logged work on BEAM-7246:
Author: ASF GitHub Bot
Created on: 26/Dec/19 16:04
Start Date: 26/Dec/19 16:04
Worklog Time Spent: 10m
Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r361484541

## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##

+  # `params_types` are instance of `google.cloud.spanner_v1.param_types`

Review comment: I've updated the code. It now references `google.cloud.spanner.param_types`. But there is one import (`google.cloud.spanner_v1.database.BatchSnapshot`) which we need in our pipeline. Unfortunately, the Spanner SDK does not set an alias for it in the package, so the only option we have is to import it via the version-specific path.
https://github.com/googleapis/google-cloud-python/blob/master/spanner/google/cloud/spanner.py

Issue Time Tracking
---
Worklog Id: (was: 363634)
Time Spent: 9h (was: 8h 50m)
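Since `BatchSnapshot` is only reachable through the version-specific path discussed in the comment above, one common pattern is to guard the import so the module stays importable when the client library is absent. This is a hedged sketch, not Beam's actual code: the `HAVE_SPANNER` flag name is an assumption.

```python
# Version-specific import of BatchSnapshot, guarded so this module can still
# be imported when google-cloud-spanner is not installed. The HAVE_SPANNER
# flag is illustrative; Beam's spannerio module may guard its imports
# differently.
try:
    from google.cloud.spanner_v1.database import BatchSnapshot
    HAVE_SPANNER = True
except ImportError:
    BatchSnapshot = None
    HAVE_SPANNER = False
```

Callers can then check the flag (or re-raise a clearer error) before using any Spanner functionality.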
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363628 ]

ASF GitHub Bot logged work on BEAM-7246:
Author: ASF GitHub Bot
Created on: 26/Dec/19 15:51
Start Date: 26/Dec/19 15:51
Worklog Time Spent: 10m
Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r361482053

## File path: sdks/python/apache_beam/io/gcp/spannerio_test.py ##

@@ -0,0 +1,267 @@
+from __future__ import absolute_import
+
+import datetime
+import logging
+import random
+import string
+import unittest
+
+import mock
+
+import apache_beam as beam
+from apache_beam.testing.test_pipeline import TestPipeline
+from apache_beam.testing.util import assert_that

Review comment: Yes, I took the references from `org.apache.beam.sdk.io.gcp.spanner.SpannerIOReadTest` and implemented them in a Pythonic way.

Issue Time Tracking
---
Worklog Id: (was: 363628)
Time Spent: 8.5h (was: 8h 20m)
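The mock-driven test style the comment above refers to — patching the client layer and asserting on what the transform does with it — can be illustrated with a self-contained sketch. `SpannerReader` and `fetch_rows` here are hypothetical stand-ins; the real spannerio tests patch the Cloud Spanner client classes instead.

```python
import unittest
from unittest import mock

# Hypothetical reader used only to illustrate the mock-based test style;
# it delegates all I/O to an injected client, which tests replace with a Mock.
class SpannerReader:
    def __init__(self, client):
        self._client = client

    def fetch_rows(self, sql):
        # In tests, execute_sql is a Mock returning canned rows.
        return list(self._client.execute_sql(sql))

class SpannerReaderTest(unittest.TestCase):
    def test_fetch_rows_uses_client(self):
        fake_client = mock.Mock()
        fake_client.execute_sql.return_value = iter([(1, 'alice'), (2, 'bob')])
        reader = SpannerReader(fake_client)
        rows = reader.fetch_rows('Select id, name from users')
        # The reader returns exactly the rows the (mocked) client produced,
        # and issues exactly one query.
        self.assertEqual(rows, [(1, 'alice'), (2, 'bob')])
        fake_client.execute_sql.assert_called_once_with(
            'Select id, name from users')
```

This keeps the tests runnable under DirectRunner-style local execution with no live Spanner instance, which matches the issue's stated testing scope.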
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363629 ]

ASF GitHub Bot logged work on BEAM-7246:
Author: ASF GitHub Bot
Created on: 26/Dec/19 15:51
Start Date: 26/Dec/19 15:51
Worklog Time Spent: 10m
Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r361482073

## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363630 ]

ASF GitHub Bot logged work on BEAM-7246:
Author: ASF GitHub Bot
Created on: 26/Dec/19 15:51
Start Date: 26/Dec/19 15:51
Worklog Time Spent: 10m
Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r361482117

## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363627 ]

ASF GitHub Bot logged work on BEAM-7246:
Author: ASF GitHub Bot
Created on: 26/Dec/19 15:50
Start Date: 26/Dec/19 15:50
Worklog Time Spent: 10m
Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r361481918

## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363625 ]

ASF GitHub Bot logged work on BEAM-7246:
Author: ASF GitHub Bot
Created on: 26/Dec/19 15:46
Start Date: 26/Dec/19 15:46
Worklog Time Spent: 10m
Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r361481198

## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##

+  records = (pipeline
+             | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,

Review comment: Done.

Issue Time Tracking
---
Worklog Id: (was: 363625)
Time Spent: 8h (was: 7h 50m)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363626 ]

ASF GitHub Bot logged work on BEAM-7246:
Author: ASF GitHub Bot
Created on: 26/Dec/19 15:46
Start Date: 26/Dec/19 15:46
Worklog Time Spent: 10m
Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK
URL: https://github.com/apache/beam/pull/9606#discussion_r361481286

## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##

+  read_operations = [
+      _ReadOperation.table('customers', ['name', 'email']),
+      _ReadOperation.table('vendors', ['name', 'email']),

Review comment: Done

Issue Time Tracking
---
Worklog Id: (was: 363626)
Time Spent: 8h 10m (was: 8h)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363624 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 15:45 Start Date: 26/Dec/19 15:45 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361481076

## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##

@@ -0,0 +1,558 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Google Cloud Spanner IO
+
+This is an experimental module for reading and writing data from Google Cloud
+Spanner. Visit https://cloud.google.com/spanner for more details.
+
+To read from Cloud Spanner, apply the _ReadFromSpanner transform. It returns a
+PCollection in which each element represents an individual row returned by the
+read operation. Both the Query and Read APIs are supported.
+
+_ReadFromSpanner relies on _ReadOperation objects, which are exposed by the
+SpannerIO API. A _ReadOperation holds the immutable data needed to execute
+batch and naive reads on Cloud Spanner. This is done for more convenient
+programming.
+
+_ReadFromSpanner reads from Cloud Spanner given either an 'sql' param in the
+constructor, or a 'table' name together with a list of 'columns'. For
+example:::
+
+  records = (pipeline
+             | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+                                sql='Select * from users'))
+
+  records = (pipeline
+             | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+                                table='users', columns=['id', 'name', 'email']))
+
+You can also perform multiple reads by providing a list of _ReadOperations to
+the _ReadFromSpanner transform constructor. _ReadOperation exposes two static
+methods: use 'query' to perform SQL-based reads and 'table' to read named
+columns from a table. For example:::
+
+  read_operations = [
+      _ReadOperation.table('customers', ['name', 'email']),
+      _ReadOperation.table('vendors', ['name', 'email']),
+  ]
+  all_users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+                                          read_operations=read_operations)
+
+  ...OR...
+
+  read_operations = [
+      _ReadOperation.query('Select name, email from customers'),
+      _ReadOperation.query(
+          sql='Select * from users where id <= @user_id',
+          params={'user_id': 100},
+          params_type={'user_id': param_types.INT64}
+      ),
+  ]
+  # `params_type` values are instances of `google.cloud.spanner_v1.param_types`
+  all_users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+                                          read_operations=read_operations)
+
+For more information, please review the docs on the _ReadOperation class.
+
+Users can also provide the _ReadOperations in the form of a PCollection. For
+example:::
+
+  users = (pipeline
+           | beam.Create([_ReadOperation...])
+           | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME))
+
+Users may also create a Cloud Spanner transaction via the `_create_transaction`
+transform, which is available in the SpannerIO API.
+
+The transform is guaranteed to be executed on a consistent snapshot of data,
+utilizing the power of read-only transactions. Staleness of data can be
+controlled by providing the `read_timestamp` or `exact_staleness` param values
+in the constructor.
+
+This transform requires the root of the pipeline (PBegin) and returns a dict
+containing 'session_id' and 'transaction_id'. The result of
+`_create_transaction` is later passed to the constructor of _ReadFromSpanner.
+For example:::
+
+  transaction = (pipeline | _create_transaction(TEST_PROJECT_ID,
+                                                TEST_INSTANCE_ID,
+                                                DB_NAME))
+
+  users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
+                                      sql='Sele
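The `_create_transaction` contract described above — a PBegin-rooted transform yielding a dict with 'session_id' and 'transaction_id' — can be sanity-checked locally. The helpers below are a hypothetical sketch of that dict's shape, not part of the real module:

```python
def make_transaction_result(session_id, transaction_id):
    # Shape of the dict the docstring says _create_transaction returns.
    return {'session_id': session_id, 'transaction_id': transaction_id}

def validate_transaction(result):
    # Hypothetical guard: check both keys are present before passing the
    # dict on to the _ReadFromSpanner constructor.
    missing = {'session_id', 'transaction_id'} - set(result)
    if missing:
        raise ValueError('transaction dict missing keys: %s' % sorted(missing))
    return result

txn = validate_transaction(make_transaction_result('sess-1', 'txn-1'))
```

A guard like this catches a malformed transaction dict at pipeline-construction time rather than at read time on the workers.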
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363622 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 15:45 Start Date: 26/Dec/19 15:45 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361481011 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363621 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 15:45 Start Date: 26/Dec/19 15:45 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361480936 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363620 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 15:44 Start Date: 26/Dec/19 15:44 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361480838 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363618 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 15:43 Start Date: 26/Dec/19 15:43 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361480704 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363615 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 15:42 Start Date: 26/Dec/19 15:42 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361480373 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363616 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 15:42 Start Date: 26/Dec/19 15:42 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361480373 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363614 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 15:41 Start Date: 26/Dec/19 15:41 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361480318 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ## @@ -0,0 +1,558 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Google Cloud Spanner IO + +This is an experimental module for reading and writing data from Google Cloud +Spanner. Visit: https://cloud.google.com/spanner for more details. + +To read from Cloud Spanner apply _ReadFromSpanner transformation. It will +return a PCollection, where each element represents an individual row returned +from the read operation. Both Query and Read APIs are supported. + +_ReadFromSpanner relies on the _ReadOperation objects which is exposed by the +SpannerIO API. _ReadOperation holds the immutable data which is responsible to +execute batch and naive reads on Cloud Spanner. 
This is done for more convenient programming.

_ReadFromSpanner reads from Cloud Spanner given either an 'sql' param in the
constructor, or a 'table' name together with a 'columns' list. For example:::

  records = (pipeline
             | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
                                sql='Select * from users'))

  records = (pipeline
             | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
                                table='users', columns=['id', 'name', 'email']))

You can also perform multiple reads by providing a list of _ReadOperations to
the _ReadFromSpanner transform constructor. _ReadOperation exposes two static
methods: use 'query' to perform SQL-based reads and 'table' to read from a
table by name. For example:::

  read_operations = [
      _ReadOperation.table('customers', ['name', 'email']),
      _ReadOperation.table('vendors', ['name', 'email']),
  ]
  all_users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
                                          read_operations=read_operations)

  ...OR...

  read_operations = [
      _ReadOperation.query('Select name, email from customers'),
      _ReadOperation.query(
          sql='Select * from users where id <= @user_id',
          params={'user_id': 100},
          params_type={'user_id': param_types.INT64}
      ),
  ]
  # 'params_type' values are instances of 'google.cloud.spanner_v1.param_types'
  all_users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
                                          read_operations=read_operations)

For more information, please review the docs on the _ReadOperation class.

Users can also provide _ReadOperations in the form of a PCollection via the
pipeline. For example:::

  users = (pipeline
           | beam.Create([_ReadOperation...])
           | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME))

Users may also create a Cloud Spanner transaction with the
`_create_transaction` transform, which is available in the SpannerIO API.

The transform is guaranteed to be executed on a consistent snapshot of data,
utilizing the power of read-only transactions.
Staleness of data can be controlled by providing the `read_timestamp` or
`exact_staleness` param values in the constructor.

This transform requires the root of the pipeline (PBegin) and returns a dict
containing 'session_id' and 'transaction_id'. The `_create_transaction`
PTransform is later passed to the constructor of _ReadFromSpanner. For
example:::

  transaction = (pipeline | _create_transaction(TEST_PROJECT_ID,
                                                TEST_INSTANCE_ID,
                                                DB_NAME))

  users = pipeline | _ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
                                      sql='Sele
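The quoted docstring describes _ReadOperation as an immutable holder of read
parameters with 'query' and 'table' factory methods. As a rough, hypothetical
sketch of that shape only (not the actual Beam implementation; the class name,
field names, and validation rule here are all assumptions), such a container
could look like:

```python
from collections import namedtuple

# Hypothetical stand-in for apache_beam.io.gcp.spannerio._ReadOperation;
# the real class may use different fields and behavior.
_Op = namedtuple('_Op', ['is_sql', 'kwargs'])


class ReadOperationSketch(_Op):
  """Immutable holder for one Spanner read (query- or table-based)."""

  @classmethod
  def query(cls, sql, params=None, params_type=None):
    # SQL-based read; 'params'/'params_type' mirror the docstring's example.
    if params is not None and params_type is None:
      raise ValueError('params_type must be given when params is set')
    return cls(is_sql=True,
               kwargs={'sql': sql, 'params': params,
                       'param_types': params_type})

  @classmethod
  def table(cls, table, columns):
    # Table-based read by table name and column list.
    return cls(is_sql=False, kwargs={'table': table, 'columns': columns})


op = ReadOperationSketch.query(
    'Select * from users where id <= @user_id',
    params={'user_id': 100},
    params_type={'user_id': 'INT64'})
```

Because the container is a namedtuple, each operation stays immutable after
construction, which matches the docstring's claim that a _ReadOperation "holds
the immutable data" for a read.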
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363613 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 15:39 Start Date: 26/Dec/19 15:39 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361479748 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363612 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 15:37 Start Date: 26/Dec/19 15:37 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361479506 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ##
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=363610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-363610 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 26/Dec/19 15:36 Start Date: 26/Dec/19 15:36 Worklog Time Spent: 10m Work Description: mszb commented on pull request #9606: [BEAM-7246] Add Google Spanner IO Read on Python SDK URL: https://github.com/apache/beam/pull/9606#discussion_r361479242 ## File path: sdks/python/apache_beam/io/gcp/spannerio.py ## @@ -0,0 +1,531 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

"""Google Cloud Spanner IO

This is an experimental module for reading and writing data from Google Cloud
Spanner. Visit https://cloud.google.com/spanner for more details.

To read from Cloud Spanner, apply the ReadFromSpanner transform. It returns a
list, where each element represents an individual row returned from the read
operation. Both the Query and Read APIs are supported.

ReadFromSpanner relies on ReadOperation objects, which are exposed by the
SpannerIO API. A ReadOperation holds the immutable data needed to execute
batch and naive reads on Cloud Spanner.
This is done for more convenient programming.

ReadFromSpanner reads from Cloud Spanner given either an 'sql' param in the
constructor, or a 'table' name together with a 'columns' list. For example:::

  records = (pipeline
             | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
                               sql='Select * from users'))

  records = (pipeline
             | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
                               table='users', columns=['id', 'name', 'email']))

You can also perform multiple reads by providing a list of ReadOperations to
the ReadFromSpanner transform constructor. ReadOperation exposes two static
methods: use 'query' to perform SQL-based reads and 'table' to read from a
table by name. For example:::

  read_operations = [
      ReadOperation.table('customers', ['name', 'email']),
      ReadOperation.table('vendors', ['name', 'email']),
  ]
  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
                                         read_operations=read_operations)

  ...OR...

  read_operations = [
      ReadOperation.sql('Select name, email from customers'),
      ReadOperation.table('vendors', ['name', 'email']),
  ]
  all_users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
                                         read_operations=read_operations)

For more information, please review the docs on the ReadOperation class.

Users can also provide ReadOperations in the form of a PCollection via the
pipeline. For example:::

  users = (pipeline
           | beam.Create([ReadOperation...])
           | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME))

Users may also create a Cloud Spanner transaction with the
`create_transaction` transform, which is available in the SpannerIO API.

The transform is guaranteed to be executed on a consistent snapshot of data,
utilizing the power of read-only transactions. Staleness of data can be
controlled by providing the `read_timestamp` or `exact_staleness` param values
in the constructor.
This transform requires the root of the pipeline (PBegin) and returns a dict
containing 'session_id' and 'transaction_id'. The `create_transaction`
PTransform is later passed to the constructor of ReadFromSpanner. For
example:::

  transaction = (pipeline | create_transaction(TEST_PROJECT_ID,
                                               TEST_INSTANCE_ID,
                                               DB_NAME))

  users = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
                                     sql='Select * from users',
                                     transaction=transaction)

  tweets = pipeline | ReadFromSpanner(PROJECT_ID, INSTANCE_ID, DB_NAME,
                                      sql='Select * from tweets',
                                      transaction=transaction)

For further details of this transform, please review the docs on the
`create_transaction` method available in
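The transaction flow described above can be illustrated with a small mock
(`create_transaction_sketch` below is a hypothetical stand-in, not the Beam
API, and the fabricated ids are assumptions): the transform yields a dict with
'session_id' and 'transaction_id', and every read that receives the same dict
observes the same read-only snapshot.

```python
import datetime
import uuid


def create_transaction_sketch(project_id, instance_id, database_id,
                              read_timestamp=None, exact_staleness=None):
  # Stand-in for the real create_transaction PTransform. The real transform
  # would open a read-only snapshot on Cloud Spanner; here we only fabricate
  # ids to show the shape of the returned dict.
  if read_timestamp is not None and exact_staleness is not None:
    raise ValueError('read_timestamp and exact_staleness are exclusive')
  return {'session_id': uuid.uuid4().hex,
          'transaction_id': uuid.uuid4().hex}


txn = create_transaction_sketch(
    'test-project', 'test-instance', 'testdb',
    exact_staleness=datetime.timedelta(seconds=10))

# Both reads would receive the same dict, so they share one snapshot.
users_txn = txn
tweets_txn = txn
```

Sharing one transaction dict across the `users` and `tweets` reads is what
gives the consistency guarantee the docstring describes; creating a second
transaction would allow the two reads to see different snapshots.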