_DeferredTaskEntity is used for storing the payload of tasks enqueued with 
deferred.defer() 
<https://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/ext/deferred/deferred.py#240> 
when that payload exceeds 100KB 
<https://cloud.google.com/appengine/articles/deferred>. The data is only 
loaded by the task handler that was created along with the entity, so there 
is no chance that loading these entities from a backup would spawn new tasks 
or otherwise cause data to be overwritten.
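
For illustration, a minimal sketch of how a payload can end up in that kind; 
the function and argument below are made up, and the mechanism is the one 
implemented in the deferred library linked above:

    from google.appengine.ext import deferred

    def process_report(payload):
        # Hypothetical worker function; it receives whatever was pickled
        # into the deferred task.
        pass

    # defer() pickles the callable and its arguments into the task payload.
    # When that pickle is too large for a regular push task, the library
    # stores it in a _DeferredTaskEntity instead and enqueues a small task
    # that only carries the entity's key; the built-in handler loads (and
    # deletes) the entity when the task executes.
    big_argument = 'x' * (200 * 1024)  # larger than the task size limit
    deferred.defer(process_report, big_argument)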

On Monday, April 11, 2016 at 3:54:51 PM UTC-4, Anastasios Hatzis wrote:
>
> Christian, good thinking. And yes, you've got pre-cog skills maxed out ;-) 
> That would be exactly my next question.
>
> I wonder whether *_DeferredTaskEntity* (which I included in my 
> backup/restore) really refers to deferred tasks in the task-queue sense; 
> all my task queues were empty at the time of the backup, yet Datastore 
> Admin showed 1 object of this kind with 704 KBytes. So maybe this kind 
> holds ancient pending transactions which, after the restore and the export 
> to search docs, were finally executed. That wouldn't sound as crazy as my 
> other ideas.
>
>
> On Monday, April 11, 2016 at 9:20:58 PM UTC+2, Christian F. Howes wrote:
>>
>> It sounds like you have some write transactions in P1 that never actually 
>> committed. Those then got backed up and restored to the second project. 
>> Is that possible? If so, then I'm sure your next question is how to detect 
>> the missed writes... to which I don't have a great answer. :(
>>
>> cfh
>>
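
One way such missed writes could be looked for, sketched here purely for 
illustration: since the search doc ID is the URL-safe NDB key and both the 
entity and its search doc carry a version counter (as described in the 
quoted message below), the two can be compared directly. The kind, index, 
and field names are assumptions, not the app's real names.

    from google.appengine.api import search
    from google.appengine.ext import ndb

    class Job(ndb.Model):  # hypothetical kind name
        version = ndb.IntegerProperty(default=0)

    def find_suspect_entities(index_name='jobs'):
        """Return keys whose search doc is newer than the stored entity."""
        index = search.Index(name=index_name)
        suspects = []
        for entity in Job.query():
            # The app keys its search docs by the URL-safe NDB key.
            doc = index.get(entity.key.urlsafe())
            if doc is None:
                continue
            doc_version = None
            for field in doc.fields:
                if field.name == 'version':
                    doc_version = int(field.value)
            # A search doc should never be newer than the entity it was
            # exported from, so this indicates a lost datastore write.
            if doc_version is not None and doc_version > entity.version:
                suspects.append(entity.key)
        return suspects
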
>> On Monday, April 11, 2016 at 9:54:04 AM UTC-7, Anastasios Hatzis wrote:
>>>
>>> Hi, please let me know if this question is better suited for the 
>>> dedicated Google Cloud Datastore group or some other online resource; 
>>> in any case, I use Datastore in combination with GAE Python apps.
>>>
>>> This weekend I migrated one of my production apps and have now received 
>>> user reports about stale data they discovered in the datastore, while 
>>> the corresponding documents in the Search API are up-to-date. I'm still 
>>> looking into the issue, but it seems that the datastore somehow jumped 
>>> back in time for a few entities of a specific kind, maybe 0.5%. Most of 
>>> them were originally created within the same time-span of a few weeks in 
>>> late November/early December 2015, though not all entities from that 
>>> time-span show this issue.
>>>
>>> Migration steps:
>>>
   1. Created a new project P2 (in EU).
   2. Deployed the Python code (version V1) to P2 with appcfg.py and 
   waited until all Datastore indexes were shown as "serving".
   3. Datastore backup in project P1 (in US), as of April 9th:
      1. disabled writes for the datastore
      2. created a backup (all namespaces, all kinds, including a 
      "_DeferredTaskEntity"), using Cloud Console's "Datastore Admin" 
      page, stored as usual in my GCS bucket for backups
   4. In project P2, again with the "Datastore Admin" page, right after 
   the backup in P1 had completed:
      1. disabled writes for the almost empty datastore (none of its 
      entities were of the kind that later showed the issue)
      2. imported the same backup information from the backup bucket and 
      restored it into P2's datastore, again: all kinds, all namespaces
      3. when the restore tasks were completed, I enabled datastore 
      writes
   5. Deployed the Python code (version V2) to P2 and ran a batch handler 
   that changed a property value on all entities; for each entity the 
   version counter is increased by 1, the updated timestamp changes 
   automatically, and the corresponding search doc is updated, too (a 
   sketch of such a handler follows this list).
   6. For the Search API of P2: wiped all documents from all indexes 
   (just in case); when the wipe tasks had completed, queried the 
   datastore entities and wrote excerpts of them as search documents.
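>>>
>>> A minimal sketch of what such a batch handler might look like; the kind 
>>> name and properties below are placeholders, not the app's real model:
>>>
    from google.appengine.ext import deferred, ndb

    class Job(ndb.Model):  # hypothetical kind name
        status = ndb.StringProperty()
        migrated = ndb.BooleanProperty(default=False)  # property changed by V2
        version = ndb.IntegerProperty(default=0)
        updated = ndb.DateTimeProperty(auto_now=True)

    def migrate_batch(cursor=None, batch_size=100):
        entities, next_cursor, more = Job.query().fetch_page(
            batch_size, start_cursor=cursor)
        for entity in entities:
            entity.migrated = True  # the changed property value
            entity.version += 1     # version counter increased by 1
            # 'updated' changes automatically via auto_now; the search doc
            # is refreshed from the model's _post_put_hook (see below).
            entity.put()
        if more:
            deferred.defer(migrate_batch, cursor=next_cursor)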
>>>
>>> Interestingly, for the affected entities of that kind, the corresponding 
>>> search doc in the Search API has more recent data than the original 
>>> entity in the datastore.
>>>
>>> Datastore Entity in P1 and P2:
>>>
>>>    - version counter: *8*
>>>    - last update on: 2016-*02*-15
>>>    - status: '*executing*'
>>>
>>> Search document of this entity in P1 and P2:
>>> (the search doc ID is always the URL-safe encoded NDB key, and I can tell 
>>> from all other fields/properties that it is the correct search doc)
>>>
>>>    - version counter: *13*
>>>    - last update on: 2016-*03*-15
>>>    - status: '*completed*'
>>>
>>>
>>> In P1 I had expected the entity to have the same data as its search doc, 
>>> but in fact it was stale.
>>>
>>> In P2, I had expected the following for both the entity and the search doc:
>>>
>>>    - version counter: *14*
>>>    - last update on: 2016-*04-10*
>>>    - status: '*completed*'
>>>
>>> because the migration script updated one property on all entities of this 
>>> kind and should also have triggered an update of the corresponding search 
>>> doc.
>>>
>>> There are two observations:
>>>
   1. *P1's entity already had stale data*, older than the search doc. In 
   theory this could be explained by an inconsistent / failed write to the 
   datastore. The app uses transactions for reading/writing entities of 
   this kind, and in _post_put_hook(), *if future.check_success() is None*, 
   the search doc is written/updated (a sketch of this pattern follows the 
   list). I can think of exotic situations where the search doc could be 
   older than its original entity in the datastore, but since the datastore 
   write happens in a transaction, and the search export happens only after 
   a successful write operation, I fail to explain how the entity in the 
   datastore could end up without the change (or reverted to an older 
   version). We are talking about 5 different types of changes over one 
   month that have all been lost. There are also no deferred tasks that 
   could write potentially old entities back into the datastore.
   2. *P2's entity again shows stale data*, older than the search doc. This 
   is particularly confusing, because the search doc is only ever written 
   with data read from the datastore, and since the search docs were not 
   copied from P1, the only source was the data freshly restored from the 
   P1 backup. Yet if I look into the P1 datastore, as shown above, the data 
   there is already stale. Where, then, did P2's datastore get the newer 
   data from? Apparently, while the batch handler was running, the 
   datastore held the data of version counter 13, but at some point after 
   the search doc was written, the datastore reverted the entity to version 
   counter 8. However, all the datastore writes for this entity happened a 
   long time ago, in the original datastore of P1. So it looks to me as if 
   the datastore in P2 somehow received both sets of data for this entity, 
   version counter 8 *and* version counter 13. Wouldn't this imply that the 
   backup data could contain multiple versions of the same entity, or could 
   there be another leak that works across projects? And for some reason, 
   after the version counter 13 data was written to the search docs, the 
   entity got reverted to version counter 8.
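>>>
>>> A minimal sketch of the hook pattern described in observation 1; the kind 
>>> name, index name, and fields are placeholders:
>>>
    from google.appengine.api import search
    from google.appengine.ext import ndb

    class Job(ndb.Model):  # hypothetical kind name
        status = ndb.StringProperty()
        version = ndb.IntegerProperty(default=0)
        updated = ndb.DateTimeProperty(auto_now=True)

        def _post_put_hook(self, future):
            # check_success() returns None when the put succeeded and raises
            # the stored exception otherwise, so the export below only runs
            # after a successful datastore write.
            if future.check_success() is None:
                doc = search.Document(
                    doc_id=self.key.urlsafe(),  # doc ID = URL-safe NDB key
                    fields=[
                        search.NumberField(name='version', value=self.version),
                        search.TextField(name='status', value=self.status),
                        search.DateField(name='updated',
                                         value=self.updated.date()),
                    ])
                search.Index(name='jobs').put(doc)

    @ndb.transactional
    def update_job(key, new_status):
        job = key.get()
        job.status = new_status
        job.version += 1
        job.put()  # the hook above exports the search doc after the write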
>>>
>>> I'm running out of possible explanations for this, other than that 
>>> Datastore can hold multiple versions of the same entity and that those 
>>> even end up in a backup.
>>>
>>> Color me confused :) Maybe you have an idea of what could cause this.
>>>
>>> Ani
>>>
>>
