[ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=381710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-381710
 ]

ASF GitHub Bot logged work on BEAM-7246:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Feb/20 16:54
            Start Date: 04/Feb/20 16:54
    Worklog Time Spent: 10m 
      Work Description: nielm commented on pull request #10712: [BEAM-7246] 
Added Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712#discussion_r374795643
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/experimental/spannerio.py
 ##########
 @@ -109,20 +111,74 @@
 
 ReadFromSpanner takes this transform in the constructor and pass this to the
 read pipeline as the singleton side input.
+
+Writing Data to Cloud Spanner.
+
+The WriteToSpanner transform writes to Cloud Spanner by executing a
+collection of input rows (WriteMutation). The mutations are grouped into
+batches for efficiency.
+
+The WriteToSpanner transform relies on the WriteMutation objects, which are
+exposed by the SpannerIO API. WriteMutation has five static methods (insert,
+update, insert_or_update, replace, delete). These methods return an instance
+of the _Mutator object, which contains the mutation type and the Spanner
+Mutation object. For more details, review the docs of the class
+SpannerIO.WriteMutation. For example::
+
+  mutations = [
+                WriteMutation.insert(table='user', columns=('name', 'email'),
+                values=[('sara', 's...@dev.com')])
+              ]
+  _ = (p
+       | beam.Create(mutations)
+       | WriteToSpanner(
+          project_id=SPANNER_PROJECT_ID,
+          instance_id=SPANNER_INSTANCE_ID,
+          database_id=SPANNER_DATABASE_NAME)
+        )
+
+You can also create a WriteMutation by calling its constructor. For example::
+
+  mutations = [
+      WriteMutation(insert='users', columns=('name', 'email'),
+                    values=[('sara', 's...@example.com')])
+  ]
+
+For more information, review the docs available on WriteMutation class.
+
+The WriteToSpanner transform also takes a 'max_batch_size_bytes' param, which
+is set to 1MB (1048576 bytes) by default. This parameter is used to reduce the number of
 
 Review comment:
   There is one other important batching parameter: the maximum number of 
cells being mutated. Spanner has a hard 20K limit here, so a batch must have 
fewer than 20K mutated cells, including cells being mutated in indexes. 
   
   The Java version sets this to 5K by default. 
   
   A third parameter, max_number_rows, was also added recently to the Java 
version, limiting the total number of rows in a batch.
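
   As a rough sketch of how the three limits could interact when grouping 
mutations (this is not the Java implementation; `MutationInfo`, 
`batch_mutations`, and `max_number_cells` are hypothetical names, and the 
500-row default is only an illustrative value):

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class MutationInfo:
    """Hypothetical stand-in for a mutation with precomputed costs."""
    size_bytes: int  # serialized size of the mutation
    cells: int       # mutated cells (index cells would need to be counted too)
    rows: int = 1    # rows touched by the mutation


def batch_mutations(
    mutations: Iterable[MutationInfo],
    max_batch_size_bytes: int = 1048576,  # 1MB, as in the docstring under review
    max_number_cells: int = 5000,         # 5K, the Java default mentioned above
    max_number_rows: int = 500,           # assumption: illustrative value only
) -> Iterator[List[MutationInfo]]:
    """Yield batches that stay within all three limits.

    A single mutation that exceeds a limit on its own is emitted alone.
    """
    batch: List[MutationInfo] = []
    size = cells = rows = 0
    for m in mutations:
        would_overflow = (
            size + m.size_bytes > max_batch_size_bytes
            or cells + m.cells > max_number_cells
            or rows + m.rows > max_number_rows)
        # Close the current batch before adding a mutation that would overflow it.
        if batch and would_overflow:
            yield batch
            batch = []
            size = cells = rows = 0
        batch.append(m)
        size += m.size_bytes
        cells += m.cells
        rows += m.rows
    if batch:
        yield batch
```

   Whichever limit is reached first closes the batch, so with small rows the 
row or cell count is likely to bind well before the 1MB byte limit does.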
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 381710)
    Time Spent: 15h 50m  (was: 15h 40m)

> Create a Spanner IO for Python
> ------------------------------
>
>                 Key: BEAM-7246
>                 URL: https://issues.apache.org/jira/browse/BEAM-7246
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>            Reporter: Reuven Lax
>            Assignee: Shehzaad Nakhoda
>            Priority: Major
>          Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
