[google-appengine] Re: Datastore: Problem updating entities

2017-08-17 Thread 'Shivam(Google Cloud Support)' via Google App Engine


You can view the Python example, which imports these datastore libraries. 
For further issues with the code, you may contact them directly.
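
The links above were stripped from this archive, so as a rough, hedged sketch of what the core of such a Python Dataflow update job might look like for this thread's use case: the property names `name` and `indexed` are assumptions taken from the discussion below, and the commented wiring only gestures at Beam's Datastore connectors, whose module paths vary across releases.

```python
def fix_note(props):
    """Per-entity update logic for the mass update discussed in this
    thread: strip a string field and backfill a missing default.
    `props` stands in for an entity's property dict."""
    fixed = dict(props)
    if isinstance(fixed.get("name"), str):
        fixed["name"] = fixed["name"].strip()
    fixed.setdefault("indexed", 0)
    return fixed

# In an Apache Beam (Dataflow) pipeline this logic would sit between the
# Datastore read and write transforms, roughly:
#
#   p | ReadFromDatastore(...) \
#     | beam.Map(<apply fix_note to each entity's properties>) \
#     | WriteToDatastore(...)
```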



[google-appengine] Re: Datastore: Problem updating entities

2017-08-16 Thread Filipe Caldas
Hi Shivam,

  Is it possible to use Python instead of Java on Dataflow to do the update 
on Datastore? If so, where can I find an example?

Best regards,


[google-appengine] Re: Datastore: Problem updating entities

2017-08-15 Thread 'Shivam(Google Cloud Support)' via Google App Engine
The job can be slow for that many entities when following the example 
here. The proper solution for Datastore MapReduce in the cloud would be 
Datastore I/O using Dataflow.


Dataflow SDKs provide an API for reading data from and writing data to a 
Google Cloud Datastore database. Its programming model is designed to 
simplify the mechanics of large-scale data processing. When you program 
with a Dataflow SDK, you are essentially creating a data processing job to 
be executed by one of the Cloud Dataflow runner services. This model lets 
you concentrate on the logical composition of your data processing job, 
rather than the physical orchestration of parallel processing. You can 
focus on what you need your job to do instead of exactly how that job gets 
executed.

If you choose to stick with MapReduce on App Engine, it is recommended to 
file any issues you experience directly with the engineering team on their 
Git repository.


-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-appengine+unsubscr...@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.

[google-appengine] Re: Datastore: Problem updating entities

2017-08-15 Thread Filipe Caldas
The job was actually doing slightly more than just setting a property to a 
default value: it was also calling .strip() on one of the fields, due to 
an error in our insert scripts. So in some cases there is a need to do a 
mass update on all entities; it definitely doesn't happen often, but we 
would rather not re-insert all the entities in the table.

The documented method of updating entities works fine, but as many other 
users have noticed, for any case where the number of rows is large (> 10M) 
it would take over a week to finish. It is also definitely much cheaper to 
run that MapReduce, but it takes too long.

The way we found to do it in a "safe" way (meaning we can be sure the task 
will finish, and within a bounded amount of time) was instead to use a VM 
that spawns about 5 threads and reads/updates the entities on Datastore in 
parallel (and even this is still taking about 2 days to finish for 12M 
entities).
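
As a rough illustration of the VM approach described above (all names are hypothetical; the in-memory `store` dict stands in for Datastore, and a real version would fetch and put key-range batches through the Datastore client):

```python
from concurrent.futures import ThreadPoolExecutor

def update_batch(store, keys):
    """Read and update one batch of entities. In the real setup each
    worker would fetch a batch from Datastore, fix it, and put it back."""
    for k in keys:
        entity = store[k]                 # stands in for a batch fetch
        entity["name"] = entity["name"].strip()
        entity.setdefault("indexed", 0)   # stands in for a batch put

def mass_update(store, num_threads=5, batch_size=1000):
    keys = list(store)
    batches = [keys[i:i + batch_size]
               for i in range(0, len(keys), batch_size)]
    # A small pool of workers processes whole batches in parallel,
    # like the ~5 threads on the VM.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(lambda b: update_batch(store, b), batches))

store = {i: {"name": f" note {i} "} for i in range(10)}
mass_update(store, num_threads=5, batch_size=3)
```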




[google-appengine] Re: Datastore: Problem updating entities

2017-08-11 Thread 'Shivam(Google Cloud Support)' via Google App Engine


There should be no actual need to mass-put a new property on all of your 
entities and set it to a default value, since Datastore supports entities 
both with and without a given property set (as you have noticed with the 
failed MapReduce job).

You can assume that if an entity does not have the property, it is equal 
to the default "indexed=0", and set this value directly in your 
application at read time: if the property exists, read it and use it; 
otherwise use a hard-coded default and set the value in your code (i.e. 
only when the entity is being read).
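
The read-time defaulting described above can be sketched as follows (the property name `indexed` is from this thread; the helper name is hypothetical, and `entity` stands in for an entity's property dict):

```python
DEFAULT_INDEXED = 0

def read_indexed(entity):
    """Return the property if the stored entity has it; otherwise fall
    back to the hard-coded default, applied only at read time."""
    if "indexed" in entity:
        return entity["indexed"]
    # Old entity written before the property existed: use the default,
    # and write it back so future reads find it set.
    entity["indexed"] = DEFAULT_INDEXED
    return DEFAULT_INDEXED
```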

Updating existing entities is documented here.

Without knowing what happened exactly, it is not possible to determine the 
reason for the 70M reads. However, I would recommend viewing this post, 
which might answer your question.


On Friday, August 11, 2017 at 9:02:53 AM UTC-4, Filipe Caldas wrote:
>
> Hi,
>
>   I am currently trying to update a kind in my database and add a field 
> (indexed=0); the table has more than 10M entities.
>
>   I tried to use MapReduce for App Engine and launched a fairly simple 
> job where the mapper only sets the property and yields an 
> operation.db.Put(). The only problem is that some of the shards failed, 
> so the job was stopped and automatically restarted.
>
>   The problem is, launching this job on 10M entities cost me about $100, 
> and the job did not finish (the retry was going slowly, so I don't think 
> they billed much for that).
>
> The extra annoying thing is that there is no other way that I know of to 
> update these properties "fast" enough (the MapReduce took over 7 hours 
> to fail on 10M). I know Beam/Dataflow is apparently the way to go, but 
> documentation on basic operations like updating Datastore entities is 
> still very poor (I'm not sure it can even be done).
>
>   So, my question is: is there a fast and *safe* way to update my 
> entities that does not consist of doing 10M fetches and puts in sequence?
>
>   Bonus question: does anyone know why I was billed 70M reads on only 
> 10M entities?
>
> Best regards,
>



[google-appengine] Re: datastore problem

2009-10-21 Thread Tonny

You need to tread gently with updates to your models.

When you are adding new properties, provide a default value.
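
Concretely, a stand-in sketch of that advice (with the legacy `google.appengine.ext.db` API this would be a property declared with `default=...` on the model; the class below only mimics that behavior so old records read back with a usable value instead of null):

```python
class Note:
    """Stand-in for a datastore model: records stored before the
    isVisible property was added come back with the default instead
    of null."""
    def __init__(self, stored):
        self.content = stored.get("content")
        # New property with a default, so pre-existing records that
        # lack it are still usable when read.
        self.isVisible = stored.get("isVisible", True)

old = Note({"content": "saved before the schema change"})
```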

Cheers
Tonny

On 20 Oct., 06:28, Tim pencil...@gmail.com wrote:
 First I created a Note entity (with some fields), deployed it to App 
 Engine, and added some records to it.

 But then I changed the structure of the Note entity (added an isVisible 
 field). When I deployed my app again, the pre-existing records have a 
 null value for isVisible, and that is where my problem comes from.

 How can I clear my datastore? When I try to view my datastore, it 
 doesn't work.

 Please help; thanks in advance.



[google-appengine] Re: Datastore problem?

2009-08-06 Thread Hrishikesh Bakshi
I have been having the same issue intermittently for the last hour, though 
not on all tables.

On Thu, Aug 6, 2009 at 12:06 PM, Jesse Grosjean je...@hogbaysoftware.com wrote:


 I'm seeing lots of Datastore timeouts right now that I don't usually
 see on www.writeroom.ws. Is there a known issue with the datastore
 now?

 Jesse



-- 
Hrishikesh Bakshi




[google-appengine] Re: Datastore problem?

2009-08-06 Thread Jesse Grosjean

The problem seems to be fixed now, for me anyway.