It sounds like you have some write transactions in P1 that never actually 
committed. Those then got backed up and restored to the second project. 
Is that possible? If so, then I'm sure your next question is how to detect 
the missed writes... to which I don't have a great answer. :(
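
One rough way to look for such missed writes: every entity carries a version counter that is also copied into its search doc, as in the setup described in the quoted message. Compare the two and flag entities whose search doc is ahead. This is a minimal sketch on plain in-memory data. In a real app, both mappings would have to be filled by paging through an NDB query and looking up each search doc by its URL-safe key, which is not shown here. `Snapshot` and `find_missed_writes` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical minimal view of one record: only the fields the check needs.
Snapshot = namedtuple("Snapshot", ["version", "status"])


def find_missed_writes(entities, search_docs):
    """Return (sorted) keys whose search doc is newer than the entity.

    A search doc with a higher version counter than its Datastore entity
    suggests the doc was written for a Datastore write that never committed.
    Keys without a search doc are skipped.
    """
    suspects = []
    for key, entity in entities.items():
        doc = search_docs.get(key)
        if doc is not None and doc.version > entity.version:
            suspects.append(key)
    return sorted(suspects)


if __name__ == "__main__":
    entities = {
        "key-a": Snapshot(version=8, status="executing"),
        "key-b": Snapshot(version=3, status="completed"),
    }
    search_docs = {
        "key-a": Snapshot(version=13, status="completed"),
        "key-b": Snapshot(version=3, status="completed"),
    }
    print(find_missed_writes(entities, search_docs))  # ['key-a']
```

This only catches lost writes that left a trace in the search index. A lost write whose search doc update also failed would look perfectly consistent.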

cfh

On Monday, April 11, 2016 at 9:54:04 AM UTC-7, Anastasios Hatzis wrote:
>
> Hi, please let me know if this question is better suited to the dedicated 
> Google Cloud Datastore group or some other online resource. That said, I use 
> Datastore in combination with GAE Python apps.
>
> This weekend I migrated one of my production apps, and I have since 
> received user reports about stale data in the datastore that they 
> discovered, while the corresponding documents in the Search API are 
> up to date. I'm still looking into the issue, but it seems that the 
> datastore somehow jumped back in time for a few entities of a specific 
> kind, maybe 0.5%. Most of them were originally created within the same 
> few weeks in late November/early December 2015, though not all entities 
> from that time span show this issue.
>
> Migration steps:
>
>    1. Created a new project P2 (in EU)
>    2. Deployed the Python code (version V1) to P2 with appcfg.py and 
>    waited until all Datastore indexes were shown as "serving"
>    3. Datastore backup in project P1 (in US), as of April 9th:
>       1. disabled writes for the datastore
>       2. created a backup (all namespaces, all kinds, including 
>       "_DeferredTaskEntity") using the Cloud Console's "Datastore Admin" 
>       page, stored as usual in my GCS bucket for backups
>    4. In project P2, again on the "Datastore Admin" page, right after the 
>    backup in P1 completed:
>       1. disabled writes on the almost empty datastore (none of its 
>       entities were of the kind that later showed the issue)
>       2. imported the same backup information from the backup bucket and 
>       restored it into P2's datastore, again: all kinds, all namespaces
>       3. when the restore tasks had completed, enabled datastore writes
>    5. Deployed the Python code (version V2) to P2 and ran a batch 
>    handler that changed a property value on all entities; for each entity, 
>    the version counter is incremented by 1, the updated timestamp changes 
>    automatically, and the corresponding search doc is updated, too.
>    6. For the Search API in P2: wiped all documents from all indexes 
>    (just in case); when the wipe tasks had completed, queried the datastore 
>    entities and wrote excerpts of them as search documents
>
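
Steps 5 and 6 imply a simple invariant. Whatever the batch handler reads, it writes back with the counter incremented, and the search doc is built from that same data, so entity and doc should agree afterwards. This is a minimal in-memory sketch of that invariant. `Record`, `migrate` and `to_search_doc` are hypothetical names, not the app's actual code. A real handler would do the read-modify-write inside an NDB transaction.

```python
import datetime
from collections import namedtuple

# Hypothetical stand-in for an entity of the affected kind; only the fields
# mentioned in the message are modelled.
Record = namedtuple("Record", ["version", "updated", "status", "migrated"])


def migrate(record, now):
    """Step 5: change one property, bump the version counter by 1 and
    refresh the update timestamp."""
    return record._replace(version=record.version + 1, updated=now, migrated=True)


def to_search_doc(record):
    """Search doc derived purely from the entity data that was just written."""
    return {
        "version": record.version,
        "updated": record.updated.isoformat(),
        "status": record.status,
    }


if __name__ == "__main__":
    before = Record(13, datetime.date(2016, 3, 15), "completed", False)
    after = migrate(before, datetime.date(2016, 4, 10))
    print(to_search_doc(after))
```

Starting from version 13 dated 2016-03-15, this yields a doc with version 14 dated 2016-04-10. Starting from version 8, it would yield 9, never 13.
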
> Interestingly, for the affected entities of that kind, the corresponding 
> search doc in the Search API has more recent data than the original entity 
> in the datastore.
>
> Datastore Entity in P1 and P2:
>
>    - version counter: *8*
>    - last update on: 2016-*02*-15
>    - status: '*executing*'
>
> Search document of this entity in P1 and P2:
> (the search doc ID is always the URL-safe encoded NDB key, and I can tell 
> from all other fields/properties that it is the correct search doc)
>
>    - version counter: *13*
>    - last update on: 2016-*03*-15
>    - status: '*completed*'
>
>
> In P1, I had expected the entity to have the same data as its search 
> doc, but it was in fact stale.
>
> In P2, I expected both the entity and the search doc to show:
>
>    - version counter: *14*
>    - last update on: 2016-*04-10*
>    - status: '*completed*'
>
> because the migration script updated one property on all entities of 
> this kind, which should also have triggered an update of the search doc.
>
> There are two observations:
>
>    1. *P1's entity already had stale data*, older than the search doc. 
>    In theory, this could be explained by an inconsistent or failed write 
>    to the datastore. The app uses transactions for reading and writing 
>    this kind. In _post_put_hook(), *if future.check_success() is None*, 
>    the search doc is written/updated. I can think of exotic situations 
>    where the search doc could be older than its original entity in the 
>    datastore, but since the datastore write happens in a transaction, and 
>    the search export happens only after a successful write operation, I 
>    can't explain how the datastore entity could lose the change (or revert 
>    to an older version). We are talking about five different types of 
>    changes over one month, all of which have been lost. There are also no 
>    deferred tasks that could write old entities back into the datastore.
>    2. *P2's entity again shows stale data,* older than the search doc. 
>    This is particularly confusing, because the search doc is only written 
>    with data read from the datastore. And since the search docs were not 
>    copied from P1, the only source was the data freshly restored from the 
>    P1 backup. Yet, as shown above, the data in the P1 datastore is already 
>    stale. So where did P2's datastore get the newer data from? It seems 
>    that while the batch handler was running, the datastore had the data of 
>    version counter 13, but at some point after the search doc was written, 
>    the datastore reverted the entity to version counter 8. However, all the 
>    datastore writes for this entity happened long ago in the original 
>    datastore of P1. So it looks to me as if the datastore in P2 somehow got 
>    both versions of this entity, version counter 8 *and* version counter 
>    13. Wouldn't this imply that the backup data could contain multiple 
>    versions of the same entity, or could there be another leak that works 
>    across projects? And for some reason, after the version counter 13 data 
>    was written to the search docs, the entity was reverted to version 
>    counter 8.
>
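
This is where the hypothesis in the reply above could fit. As far as I know, inside an NDB transaction a put's future can resolve before the transaction commits, so a `_post_put_hook()` may write the search doc for a write that is later rolled back. As far as I can tell, `future.check_success()` either returns None or raises, so the `is None` test only rules out puts that raised. The sketch below is a toy in-memory model, not NDB; `FakeTransaction` and `run` are hypothetical. It shows how a failed commit leaves the search doc ahead of the entity (8 vs. 13, the pattern reported here), and how deferring the side effect until commit avoids that. NDB's context offers `call_on_commit()` for this purpose. Check the NDB documentation for your SDK version before relying on it.

```python
class FakeTransaction(object):
    """Hypothetical toy transaction: writes are buffered and only reach
    `store` when commit() succeeds."""

    def __init__(self, store):
        self.store = store
        self.pending = {}
        self.on_commit = []

    def put(self, key, value, post_put_hook):
        # The put "succeeds" as soon as it is buffered, so the post-put hook
        # fires here, before the transaction has committed.
        self.pending[key] = value
        post_put_hook(key, value)

    def call_on_commit(self, callback):
        # Side effects registered here run only after a successful commit.
        self.on_commit.append(callback)

    def commit(self, fail=False):
        if fail:
            self.pending.clear()
            self.on_commit = []
            raise RuntimeError("commit failed; buffered writes discarded")
        self.store.update(self.pending)
        for callback in self.on_commit:
            callback()


def run(use_on_commit, fail):
    """Write version 13 over version 8; return (entity, search doc) versions."""
    store = {"k": 8}
    search_index = {"k": 8}

    def write_search_doc(key, value):
        search_index[key] = value

    txn = FakeTransaction(store)
    if use_on_commit:
        # Hook only registers the search update; it runs after commit.
        txn.put("k", 13, lambda key, value: txn.call_on_commit(
            lambda: write_search_doc(key, value)))
    else:
        # Hook writes the search doc immediately, as in the app described.
        txn.put("k", 13, write_search_doc)
    try:
        txn.commit(fail=fail)
    except RuntimeError:
        pass
    return store["k"], search_index["k"]


if __name__ == "__main__":
    print(run(use_on_commit=False, fail=True))  # (8, 13): doc ahead of entity
    print(run(use_on_commit=True, fail=True))   # (8, 8): both stay consistent
```

With a successful commit, both variants end at (13, 13). The difference only shows up when the commit fails after the put.
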
> I'm running out of possible explanations for this, other than that 
> Datastore can hold multiple versions of the same entity and that those 
> even end up in a backup.
>
> Paint me confused :) Maybe you have an idea what could cause 
> this.
>
> Ani
>

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.