[ https://issues.apache.org/jira/browse/BEAM-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940462#comment-15940462 ]
Mike Lambert commented on BEAM-1800:
------------------------------------

Okay, got server-side errors (instead of client-side errors) when attempting to run client.put(obj) directly...so it actually was getting further along. The server-side errors were as follows, and due to my own code:
{noformat}
BadRequest: 400 list_value cannot contain a Value containing another list_value.
# needed to serialize my json into a string
BadRequest: 400 The value of property "top_people_json" is longer than 1500 bytes.
# needed to exclude the json-string field from indexing
{noformat}
Fixing them causes the client.put(obj) approach to run successfully. So there was something wrong with my WriteToDatastore batching (or my use of it).

> Can't save datastore objects
> ----------------------------
>
>                 Key: BEAM-1800
>                 URL: https://issues.apache.org/jira/browse/BEAM-1800
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Mike Lambert
>            Assignee: Ahmet Altay
>
> I can't seem to save my database objects using {{WriteToDatastore}}, as it
> errors out on a strange unicode issue when trying to write a batch.
> Stacktrace follows:
> {noformat}
>   File "apache_beam/runners/common.py", line 195, in apache_beam.runners.common.DoFnRunner.receive (apache_beam/runners/common.c:5142)
>     self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 267, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7201)
>     self.reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 279, in apache_beam.runners.common.DoFnRunner.reraise_augmented (apache_beam/runners/common.c:7590)
>     raise type(exn), args, sys.exc_info()[2]
>   File "apache_beam/runners/common.py", line 263, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7090)
>     self._dofn_simple_invoker(element)
>   File "apache_beam/runners/common.py", line 198, in apache_beam.runners.common.DoFnRunner._dofn_simple_invoker (apache_beam/runners/common.c:5262)
>     self._process_outputs(element, self.dofn_process(element.value))
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py", line 354, in process
>     self._flush_batch()
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py", line 363, in _flush_batch
>     helper.write_mutations(self._datastore, self._project, self._mutations)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py", line 187, in write_mutations
>     commit(commit_request)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 174, in wrapper
>     return fun(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py", line 185, in commit
>     datastore.commit(req)
>   File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 140, in commit
>     datastore_pb2.CommitResponse)
>   File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 199, in _call_method
>     method='POST', body=payload, headers=headers)
>   File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 631, in new_request
>     redirections, connection_type)
>   File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1609, in request
>     (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
>   File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1351, in _request
>     (response, content) = self._conn_request(conn, request_uri, method, body, headers)
>   File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1273, in _conn_request
>     conn.request(method, request_uri, body, headers)
>   File "/usr/lib/python2.7/httplib.py", line 1039, in request
>     self._send_request(method, url, body, headers)
>   File "/usr/lib/python2.7/httplib.py", line 1073, in _send_request
>     self.endheaders(body)
>   File "/usr/lib/python2.7/httplib.py", line 1035, in endheaders
>     self._send_output(message_body)
>   File "/usr/lib/python2.7/httplib.py", line 877, in _send_output
>     msg += message_body
> TypeError: must be str, not unicode [while running 'write to datastore/Convert to Mutation']
> {noformat}
> My code is basically:
> {noformat}
> | 'convert from entity' >> beam.Map(ConvertFromEntity)
> | 'write to datastore' >> WriteToDatastore(client.project)
> {noformat}
> Where {{ConvertFromEntity}} converts from a google.cloud.datastore object
> (which has a nice API/interface) into the underlying protobuf (which is what
> the beam gcp/datastore library expects):
> {noformat}
> from google.cloud.datastore import helpers
> def ConvertFromEntity(entity):
>     return helpers.entity_to_protobuf(entity)
> {noformat}
> I assume entity_to_protobuf works fine/normally, since it's also what is used
> by {{google/cloud/datastore/batch.py}} to write a bunch of
> {{entity_pb2.Entity}} objects into the
> {{datastore_pb2.CommitRequest.mutations[n].upsert}}:
> In batch.py: {{put() -> _assign_entity_to_pb() -> entity_to_protobuf()}}.
> In datastoreio.py:
> {{WriteToDatastore->DatastoreWriteFn.to_upsert_mutation->_Mutate.DatastoreWriteFn->helper.write_mutations}}
> Any idea what's going on here and why this doesn't work? Yes, I may have some
> unicode in my objects...but it works in my appengine DB/NDB usage. I will
> attempt to skip WriteToDatastore and just put unbatched entities using the
> datastore library and see if that goes any better for me...

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
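
For reference, the two server-side fixes mentioned in the comment (serializing the nested JSON into a string, and excluding that string property from indexing) might look roughly like the sketch below. It is not the reporter's actual code. {{prepare_properties}} and {{MAX_INDEXED_BYTES}} are hypothetical names, the 1500-byte threshold is taken from the quoted error message, and the sample data is made up. The sketch is written for Python 3 and uses only the standard library. The trailing comments show how the result could be handed to google-cloud-datastore.

```python
import json

# Threshold taken from the BadRequest message quoted in the comment.
MAX_INDEXED_BYTES = 1500


def prepare_properties(props):
    """Return (cleaned_props, unindexed_names) for a dict of entity properties.

    Hypothetical helper: not part of Beam or google-cloud-datastore.
    """
    cleaned = {}
    unindexed = set()
    for name, value in props.items():
        # A list containing another list triggers the "list_value cannot
        # contain a Value containing another list_value" error, so store
        # such values as a JSON string instead.
        if isinstance(value, list) and any(isinstance(v, list) for v in value):
            value = json.dumps(value)
        # String values over the limit are marked for exclusion from indexing.
        if isinstance(value, str) and len(value.encode("utf-8")) > MAX_INDEXED_BYTES:
            unindexed.add(name)
        cleaned[name] = value
    return cleaned, unindexed


# Made-up sample data with a nested list, similar in shape to the failing case.
props = {
    "name": "example",
    "top_people_json": [["person-%d" % i, i] for i in range(200)],
}
cleaned, unindexed = prepare_properties(props)

# Assuming google-cloud-datastore is installed and `client` / `key` exist,
# the result could then be written along these lines:
#   entity = datastore.Entity(key=key, exclude_from_indexes=tuple(unindexed))
#   entity.update(cleaned)
#   client.put(entity)
```

This only addresses the two BadRequest errors caused by the entity contents. It does not explain the {{TypeError: must be str, not unicode}} raised on the {{WriteToDatastore}} path.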