[google-appengine] Re: Best way to update 400,000 entities at once?

2014-03-05 Thread Wolfram Gürlich
Instead of iterating over all keys at once you could also use the hidden  
__scatter__ property to determine proper split points for your key range in 
advance. That is what the mapreduce library does.
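For anyone unfamiliar with the idea, here is a self-contained sketch of how scatter-style split points partition a key range. Everything here is illustrative: the real __scatter__ value is computed (and stripped) by the Datastore itself, so this sketch fakes it with an md5 hash, and all function names are invented.

```python
import hashlib

def scatter_value(key):
    # Stand-in for the Datastore's hidden __scatter__ hash (illustrative only).
    return hashlib.md5(key.encode()).hexdigest()

def split_points(keys, num_shards):
    # Take the first (num_shards - 1) keys in scatter order, then sort them
    # by key: these become the shard boundaries.
    by_scatter = sorted(keys, key=scatter_value)
    return sorted(by_scatter[:num_shards - 1])

def assign_shards(keys, boundaries):
    # Each key lands in the shard whose boundary range contains it.
    shards = [[] for _ in range(len(boundaries) + 1)]
    for k in keys:
        idx = sum(1 for b in boundaries if k >= b)
        shards[idx].append(k)
    return shards

keys = ["entity-%04d" % n for n in range(400)]
boundaries = split_points(keys, 4)
shards = assign_shards(keys, boundaries)
print([len(s) for s in shards])  # four roughly even buckets
```

On App Engine the real equivalent of split_points would be a keys-only query ordered by the '__scatter__' property with a small limit, which is what the mapreduce library does.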

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-appengine+unsubscr...@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [google-appengine] How to make endpoints working with endpoints old_dev_appserver.py and Dart?

2014-03-05 Thread Cezary Wagner
I do not know whether it is Dart or Endpoints that needs threading. I get a 
simple "connection refused" error with old_dev_appserver.py (the one that 
allows debugging); that looks like a typical problem with multiple 
connections - GAE refuses the second connection.

The API Explorer doesn't work with old_dev_appserver.py (the one that allows 
debugging): I see a random list of Google APIs instead of the API I requested.

The API Explorer does show my API with dev_appserver.py.

For both I use the same URL.

It does not work.

On Monday, March 3, 2014 at 19:41:41 UTC+1, Vinny P wrote:
>
> On Mon, Mar 3, 2014 at 7:02 AM, Cezary Wagner wrote:
>
>> http://stackoverflow.com/questions/22118196 
>>
>
>
> *From your SO post:* *I try to use old_dev_appserver.py - I can debug 
> Python, but Dart cannot connect to the GAE application (it looks like it 
> needs threading - no idea why?). The API Explorer does not work.*
>
> Which needs threading, the Python dev appserver or Dart? Do you have an 
> error message/traceback for the connection error?
>
> When you say the APIs Explorer doesn't work, how exactly does it not work? 
> What are you expecting from it, and what is the error message (if any) 
> coming from it? 
>   
> -
> -Vinny P
> Technology & Media Advisor
> Chicago, IL
>
> App Engine Code Samples: http://www.learntogoogleit.com
>   
>



Re: [google-appengine] Python27 + new dev_appservery.py + debugging

2014-03-05 Thread Cezary Wagner
Let's end this impractical discussion - your arguments are of poor quality.

With Eclipse I can see a whole object with one click or a hover, and step 
forward with one key.

With pdb -> http://docs.python.org/2/library/pdb.html
1. I need to add code I don't otherwise need -> import pdb; pdb.set_trace()
2. Get a variable's value -> print x
3. Get the next variable's value -> print x
4. Remember to remove -> import pdb; pdb.set_trace()
5. After each import pdb; pdb.set_trace() I need to continue (F8 in Eclipse)
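A minimal sketch of the workflow those steps describe, with a hypothetical function. The env-var guard is an addition of this sketch so the code still runs non-interactively; without it, pdb.set_trace() stops at a prompt:

```python
import os

def average(values):
    # Step 1: the breakpoint exists only because we edited the code.
    # Guarded by an env var (an assumption of this sketch) so the function
    # still runs when no debugger session is wanted.
    if os.environ.get("DEMO_PDB_BREAK"):
        import pdb
        pdb.set_trace()  # at the prompt: `p values` to inspect, `c` to continue
    return sum(values) / len(values)

print(average([2.0, 4.0]))  # runs normally unless DEMO_PDB_BREAK is set
```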

You are proposing hell, like a devil :)



On Monday, March 3, 2014 at 22:53:06 UTC+1, D X wrote:
>
> In my experience, stepping through code is much slower in Eclipse than 
> in pdb.
>
> I disagree with your other comments; pdb is fine for professional use on 
> huge projects.
>
>
>
>
>
> On Sunday, March 2, 2014 3:57:22 PM UTC-8, Cezary Wagner wrote:
>>
>> I found a combo like PyDev 3.3.3 + Eclipse 3.8.x (but no Django 
>> template support, or Aptana!).
>>
>> It allows fast debugging with slightly slower coding.
>>
>> On Monday, March 3, 2014 at 00:49:14 UTC+1, Cezary Wagner wrote:
>>>
>>> Thanks for the hints - they are not helpful.
>>>
>>> The discussion is very old and there is no clear conclusion.
>>>
>>> In my opinion pdb is good for hobbyists - placing breakpoints by editing 
>>> code is fine for 100-line programs or amateurs - sadomasochism :)
>>>
>>> It looks like I cannot use Dart or Endpoints because there is no usable 
>>> debugger for them :)
>>>
>>> On Sunday, March 2, 2014 at 17:50:44 UTC+1, Vinny P wrote:

 On Sat, Mar 1, 2014 at 9:59 AM, Cezary Wagner 
  wrote:

> What are the best tools to use dev_appserver.py with breakpoints and 
> debugging - what is the best choice?
>



 Hi Cezary,

 You might want to read this forums discussion here: 
 https://groups.google.com/forum/#!topic/google-appengine/ep5BWYKpQpU - 
 there's an interesting discussion on Python debugging with the 
 dev_appserver.

 If you're OK with debugging using pdb, you'll want to read this message 
 in particular: 
 https://groups.google.com/d/msg/google-appengine/ep5BWYKpQpU/41asdxKhuycJ
   
  
 -
 -Vinny P
 Technology & Media Advisor
 Chicago, IL

 App Engine Code Samples: http://www.learntogoogleit.com
  
>>>



Re: [google-appengine] Python27 + new dev_appservery.py + debugging

2014-03-05 Thread Cezary Wagner
Pdb is hell - you waste time and money.

In most cases a simple debugger is enough, plus some use of assert.

On Tuesday, March 4, 2014 at 04:01:57 UTC+1, timh wrote:
>
> +1  I have worked on numerous projects with very large code bases on 
> App Engine, and I prefer pdb, and don't use Eclipse.
>
> On Tuesday, March 4, 2014 5:53:06 AM UTC+8, D X wrote:
>>
>> In my experience, stepping through code is much slower in Eclipse than 
>> in pdb.
>>
>> I disagree with your other comments; pdb is fine for professional use on 
>> huge projects.
>>
>>
>>
>>
>>
>> On Sunday, March 2, 2014 3:57:22 PM UTC-8, Cezary Wagner wrote:
>>>
>>> I found a combo like PyDev 3.3.3 + Eclipse 3.8.x (but no Django 
>>> template support, or Aptana!).
>>>
>>> It allows fast debugging with slightly slower coding.
>>>
>>> On Monday, March 3, 2014 at 00:49:14 UTC+1, Cezary Wagner wrote:

 Thanks for the hints - they are not helpful.

 The discussion is very old and there is no clear conclusion.

 In my opinion pdb is good for hobbyists - placing breakpoints by editing 
 code is fine for 100-line programs or amateurs - sadomasochism :)

 It looks like I cannot use Dart or Endpoints because there is no usable 
 debugger for them :)

 On Sunday, March 2, 2014 at 17:50:44 UTC+1, Vinny P wrote:
>
> On Sat, Mar 1, 2014 at 9:59 AM, Cezary Wagner 
>  wrote:
>
>> What are the best tools to use dev_appserver.py with breakpoints and 
>> debugging - what is the best choice?
>>
>
>
>
> Hi Cezary,
>
> You might want to read this forums discussion here: 
> https://groups.google.com/forum/#!topic/google-appengine/ep5BWYKpQpU - 
> there's an interesting discussion on Python debugging with the 
> dev_appserver.
>
> If you're OK with debugging using pdb, you'll want to read this 
> message in particular: 
> https://groups.google.com/d/msg/google-appengine/ep5BWYKpQpU/41asdxKhuycJ
>   
>  
> -
> -Vinny P
> Technology & Media Advisor
> Chicago, IL
>
> App Engine Code Samples: http://www.learntogoogleit.com
>  




Re: [google-appengine] Python27 + new dev_appservery.py + debugging

2014-03-05 Thread Cezary Wagner
Thanks for the helpful hint - I will try it.

On Tuesday, March 4, 2014 at 16:59:06 UTC+1, Doug Anderson wrote:
>
> I've been using PyCharm and am very pleased with it.  I've also used 
> JetBrains' WebStorm with node.js projects and was quite pleased with that 
> as well.
>
> On Sunday, March 2, 2014 6:49:14 PM UTC-5, Cezary Wagner wrote:
>>
>> Thanks for the hints - they are not helpful.
>>
>> The discussion is very old and there is no clear conclusion.
>>
>> In my opinion pdb is good for hobbyists - placing breakpoints by editing 
>> code is fine for 100-line programs or amateurs - sadomasochism :)
>>
>> It looks like I cannot use Dart or Endpoints because there is no usable 
>> debugger for them :)
>>
>> On Sunday, March 2, 2014 at 17:50:44 UTC+1, Vinny P wrote:
>>>
>>> On Sat, Mar 1, 2014 at 9:59 AM, Cezary Wagner 
>>>  wrote:
>>>
 What are the best tools to use dev_appserver.py with breakpoints and 
 debugging - what is the best choice?

>>>
>>>
>>>
>>> Hi Cezary,
>>>
>>> You might want to read this forums discussion here: 
>>> https://groups.google.com/forum/#!topic/google-appengine/ep5BWYKpQpU - 
>>> there's an interesting discussion on Python debugging with the 
>>> dev_appserver.
>>>
>>> If you're OK with debugging using pdb, you'll want to read this message 
>>> in particular: 
>>> https://groups.google.com/d/msg/google-appengine/ep5BWYKpQpU/41asdxKhuycJ
>>>   
>>>  
>>> -
>>> -Vinny P
>>> Technology & Media Advisor
>>> Chicago, IL
>>>
>>> App Engine Code Samples: http://www.learntogoogleit.com
>>>  
>>



Re: [google-appengine] Python27 + new dev_appservery.py + debugging

2014-03-05 Thread Cezary Wagner
Debugging is for troubleshooting.

You should never skip unit testing, logging, asserts, or exceptions just 
because you have a debugger - a debugger is not a testing tool. I am not 
skipping them either, but I need a fast debugger - meaning I spend less time 
on this.

On Tuesday, March 4, 2014 at 17:16:35 UTC+1, PK wrote:
>
> What version of PyCharm do you use? My experience is that development 
> slows down enough to avoid it. If there were an option to attach to a 
> running instance when I really want to track something down, I might use it 
> once in a while. I am still looking for a solution that does not slow down 
> the dev cycle; I would love a "how to" blog post if somebody has mastered 
> that. 
>
> On the upside, not using a debugger has forced me to add many debug log 
> messages that prove useful in troubleshooting in production. 
>
> PK
> www.gae123.com
>
> On Mar 4, 2014, at 7:59 AM, Doug Anderson wrote:
>
> I've been using PyCharm and am very pleased with it.  I've also used 
> JetBrains' WebStorm with node.js projects and was quite pleased with that 
> as well.
>
> On Sunday, March 2, 2014 6:49:14 PM UTC-5, Cezary Wagner wrote:
>>
>> Thanks for the hints - they are not helpful.
>>
>> The discussion is very old and there is no clear conclusion.
>>
>> In my opinion pdb is good for hobbyists - placing breakpoints by editing 
>> code is fine for 100-line programs or amateurs - sadomasochism :)
>>
>> It looks like I cannot use Dart or Endpoints because there is no usable 
>> debugger for them :)
>>
>> On Sunday, March 2, 2014 at 17:50:44 UTC+1, Vinny P wrote:
>>>
>>> On Sat, Mar 1, 2014 at 9:59 AM, Cezary Wagner 
>>>  wrote:
>>>
 What are the best tools to use dev_appserver.py with breakpoints and 
 debugging - what is the best choice?

>>>
>>>
>>>
>>> Hi Cezary,
>>>
>>> You might want to read this forums discussion here: 
>>> https://groups.google.com/forum/#!topic/google-appengine/ep5BWYKpQpU - 
>>> there's an interesting discussion on Python debugging with the 
>>> dev_appserver.
>>>
>>> If you're OK with debugging using pdb, you'll want to read this message 
>>> in particular: 
>>> https://groups.google.com/d/msg/google-appengine/ep5BWYKpQpU/41asdxKhuycJ
>>>   
>>>  
>>> -
>>> -Vinny P
>>> Technology & Media Advisor
>>> Chicago, IL
>>>
>>> App Engine Code Samples: http://www.learntogoogleit.com
>>>  



Re: [google-appengine] Python27 + new dev_appservery.py + debugging

2014-03-05 Thread Cezary Wagner
I completely disagree that pdb is the best solution for projects of any size. 
I explained that in my first response and will not change my mind, since I 
have tested it.

Pdb can only be the best choice for really hardcore debugging, which I think 
most developers will never encounter. It is a waste of time and money to use 
it for simple debugging.

Tell me, can I debug in pdb with one click, or step forward with one key?

On Wednesday, March 5, 2014 at 01:21:53 UTC+1, Vinny P wrote:
>
> On Sun, Mar 2, 2014 at 5:49 PM, Cezary Wagner wrote:
>
>> In my opinion pdb is good for hobbyists
>>
>
>
> PDB is perfectly fine for projects of any size. 
>
> The choice of a debugger is - of course - largely a personal preference, 
> but I'd urge you to give pdb a second chance.
>   
>  
> -
> -Vinny P
> Technology & Media Advisor
> Chicago, IL
>
> App Engine Code Samples: http://www.learntogoogleit.com
>  
>



Re: [google-appengine] Python27 + new dev_appservery.py + debugging

2014-03-05 Thread Cezary Wagner
Why do you prefer pdb - can you debug basic problems with one or two clicks 
plus step forward in pdb?

On Tuesday, March 4, 2014 at 04:01:57 UTC+1, timh wrote:
>
> +1  I have worked on numerous projects with very large code bases on 
> App Engine, and I prefer pdb, and don't use Eclipse.
>
> On Tuesday, March 4, 2014 5:53:06 AM UTC+8, D X wrote:
>>
>> In my experience, stepping through code is much slower in Eclipse than 
>> in pdb.
>>
>> I disagree with your other comments; pdb is fine for professional use on 
>> huge projects.
>>
>>
>>
>>
>>
>> On Sunday, March 2, 2014 3:57:22 PM UTC-8, Cezary Wagner wrote:
>>>
>>> I found a combo like PyDev 3.3.3 + Eclipse 3.8.x (but no Django 
>>> template support, or Aptana!).
>>>
>>> It allows fast debugging with slightly slower coding.
>>>
>>> On Monday, March 3, 2014 at 00:49:14 UTC+1, Cezary Wagner wrote:

 Thanks for the hints - they are not helpful.

 The discussion is very old and there is no clear conclusion.

 In my opinion pdb is good for hobbyists - placing breakpoints by editing 
 code is fine for 100-line programs or amateurs - sadomasochism :)

 It looks like I cannot use Dart or Endpoints because there is no usable 
 debugger for them :)

 On Sunday, March 2, 2014 at 17:50:44 UTC+1, Vinny P wrote:
>
> On Sat, Mar 1, 2014 at 9:59 AM, Cezary Wagner 
>  wrote:
>
>> What are the best tools to use dev_appserver.py with breakpoints and 
>> debugging - what is the best choice?
>>
>
>
>
> Hi Cezary,
>
> You might want to read this forums discussion here: 
> https://groups.google.com/forum/#!topic/google-appengine/ep5BWYKpQpU - 
> there's an interesting discussion on Python debugging with the 
> dev_appserver.
>
> If you're OK with debugging using pdb, you'll want to read this 
> message in particular: 
> https://groups.google.com/d/msg/google-appengine/ep5BWYKpQpU/41asdxKhuycJ
>   
>  
> -
> -Vinny P
> Technology & Media Advisor
> Chicago, IL
>
> App Engine Code Samples: http://www.learntogoogleit.com
>  




Re: [google-appengine] Re: Best way to update 400,000 entities at once?

2014-03-05 Thread Lorenzo Bugiani
I haven't understood what you can do with the __scatter__ property.
As the MapReduce docs say, "*We do not allow retrieving this value directly
(it's stripped from the entity before it's returned to the application by
the Datastore)*"...

Also, I haven't understood what's wrong with using tasks and cursors.
A task can simply iterate over the data, fetching N entities at a time (in
this case 1000), then launch a task that updates them (by itself or by
splitting the work again). At most, if 10 minutes aren't enough to iterate
over all the data, this "main" task can fetch only a piece of the data,
start an update task, then start itself on the next chunk of data...

Obviously, if it is possible to split the work using only key information
(for example, keys are numbers from 1 to 400,000) this works better, because
fetching data by key is always better than querying for the same data...
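The chunk-and-chain pattern described above can be sketched in plain Python. All names here are hypothetical stand-ins: fetch_page for a cursored datastore query, update_task for the worker task, and the recursive main_task call for taskqueue.add:

```python
DATA = list(range(10))  # stand-in for the 400,000 entities
processed = []

def fetch_page(cursor, limit):
    # Stand-in for a datastore query resumed from a cursor.
    page = DATA[cursor:cursor + limit]
    next_cursor = cursor + len(page)
    return page, (next_cursor if next_cursor < len(DATA) else None)

def update_task(entities):
    # Stand-in for the task that actually updates one batch.
    processed.extend(e * 2 for e in entities)

def main_task(cursor=0, limit=3):
    # Fetch a chunk, hand it to an update task, then "re-enqueue" itself
    # with the next cursor (on App Engine: taskqueue.add with the cursor).
    page, next_cursor = fetch_page(cursor, limit)
    update_task(page)
    if next_cursor is not None:
        main_task(next_cursor, limit)

main_task()
print(processed)  # every entity visited exactly once, in order
```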


2014-03-05 13:18 GMT+01:00 Wolfram Gürlich :

> Instead of iterating over all keys at once you could also use the hidden
> __scatter__ property to determine proper split points for your key range
> in advance. That is what the mapreduce library does.
>
>



Re: [google-appengine] Re: Best way to update 400,000 entities at once?

2014-03-05 Thread Barry Hunter
On 5 March 2014 14:24, Lorenzo Bugiani  wrote:

> I haven't understood what you can do with the __scatter__ property.
> As the MapReduce docs say, "*We do not allow retrieving this value directly
> (it's stripped from the entity before it's returned to the application by
> the Datastore)*"...
>

Suppose you want to break the whole dataset into N shards.

In theory, if you just take the first N keys from a keys-only query *sorted*
by __scatter__, you get N keys evenly spread throughout the whole dataset.

Each of those keys can then be used as the 'first' key in a standard datastore
query to get all entities in that shard (using a greater-than __key__
filter, with the results in __key__ order). It works sort of like how a
cursor actually works under the hood.
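A toy version of that second step, with plain strings standing in for entity keys. range_query is an invented stand-in for a keys-ordered datastore query with a greater-than __key__ filter, not a real API:

```python
def range_query(keys, start, end):
    # Equivalent of: WHERE __key__ > start AND __key__ <= end ORDER BY __key__
    return sorted(k for k in keys
                  if (start is None or k > start) and (end is None or k <= end))

all_keys = {"a", "c", "e", "g", "i"}
splits = ["c", "g"]  # pretend these came from the __scatter__-sorted query
bounds = [None] + splits + [None]
shards = [range_query(all_keys, bounds[i], bounds[i + 1])
          for i in range(len(bounds) - 1)]
print(shards)  # [['a', 'c'], ['e', 'g'], ['i']]
```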



>
> Also, I haven't understood what's wrong with using tasks and cursors.
>

The main 'advantage' of using __scatter__ is that you can get all the data to
set up ALL the tasks at once. Say you want 200 shards: one query retrieving
200 keys, and you can immediately create all 200 tasks*. You can then begin
processing those tasks in any order, even concurrently. (Although if you
have lots of shards to create, you may use cursors for 'looping' that initial
query!)

With a 'while loop' and cursors, you have to run each query and get all
the data (keys only or the actual entities) just to get the cursor to begin
the next task. Even if you do keys-only queries to get all the cursors
(and add the cursors to the tasks, so they can get all the entities for
real), you are downloading much more data than you need.


So with the 'nibble away' approach, you either have to be very inefficient
in creating all the initial tasks (so you can get a progress report), or you
have to create the 'next task' only after running the for-real query (i.e.
getting the next cursor) - in which case you can't parallelize.


Both approaches have their pros and cons. One is much simpler and easier
to understand; the other is more complex, but should be more efficient.




* Actually the mapreduce lib oversamples to improve the results, so it's
not quite that efficient.


> A task can simply iterate over the data, fetching N entities at a time (in
> this case 1000), then launch a task that updates them (by itself or by
> splitting the work again). At most, if 10 minutes aren't enough to iterate
> over all the data, this "main" task can fetch only a piece of the data,
> start an update task, then start itself on the next chunk of data...
>

Yes, for small batch runs it's not going to make a big difference, but the
bigger the dataset to be iterated, the more any efficiency gains have an
effect.



>
> Obviously, if it is possible to split the work using only key information
> (for example, keys are numbers from 1 to 400,000) this works better, because
> fetching data by key is always better than querying for the same data...
>

That's what __scatter__ allows :)



[google-appengine] Re: Python27 + new dev_appservery.py + debugging

2014-03-05 Thread Renzo Nuccitelli
+1 for PyCharm. I don't know why it would slow down development; for me it is 
great: debugging, testing, and even the git integration are great. Most of 
the time I don't need another tool for development.

On Saturday, March 1, 2014 12:59:09 PM UTC-3, Cezary Wagner wrote:
>
> What are the best tools to use dev_appserver.py with breakpoints and 
> debugging - what is the best choice?
>



Re: [google-appengine] Re: Best way to update 400,000 entities at once?

2014-03-05 Thread Lorenzo Bugiani
Ok, thanks!

I've just heard of __scatter__ for the first time! :D


2014-03-05 16:14 GMT+01:00 Barry Hunter :

>
>
>
> On 5 March 2014 14:24, Lorenzo Bugiani  wrote:
>
>> I haven't understood what you can do with the __scatter__ property.
>> As the MapReduce docs say, "*We do not allow retrieving this value directly
>> (it's stripped from the entity before it's returned to the application by
>> the Datastore)*"...
>>
>
> Suppose you want to break the whole dataset into N shards.
>
> In theory, if you just take the first N keys from a keys-only query *sorted*
> by __scatter__, you get N keys evenly spread throughout the whole dataset.
>
> Each of those keys can then be used as the 'first' key in a standard datastore
> query to get all entities in that shard (using a greater-than __key__
> filter, with the results in __key__ order). It works sort of like how a
> cursor actually works under the hood.
>
>> Also, I haven't understood what's wrong with using tasks and cursors.
>>
>
> The main 'advantage' of using __scatter__ is that you can get all the data to
> set up ALL the tasks at once. Say you want 200 shards: one query retrieving
> 200 keys, and you can immediately create all 200 tasks*. You can then begin
> processing those tasks in any order, even concurrently. (Although if you
> have lots of shards to create, you may use cursors for 'looping' that initial
> query!)
>
> With a 'while loop' and cursors, you have to run each query and get all
> the data (keys only or the actual entities) just to get the cursor to begin
> the next task. Even if you do keys-only queries to get all the cursors
> (and add the cursors to the tasks, so they can get all the entities for
> real), you are downloading much more data than you need.
>
> So with the 'nibble away' approach, you either have to be very inefficient
> in creating all the initial tasks (so you can get a progress report), or you
> have to create the 'next task' only after running the for-real query (i.e.
> getting the next cursor) - in which case you can't parallelize.
>
> Both approaches have their pros and cons. One is much simpler and easier
> to understand; the other is more complex, but should be more efficient.
>
> * Actually the mapreduce lib oversamples to improve the results, so it's
> not quite that efficient.
>
>> A task can simply iterate over the data, fetching N entities at a time (in
>> this case 1000), then launch a task that updates them (by itself or by
>> splitting the work again). At most, if 10 minutes aren't enough to iterate
>> over all the data, this "main" task can fetch only a piece of the data,
>> start an update task, then start itself on the next chunk of data...
>>
>
> Yes, for small batch runs it's not going to make a big difference, but the
> bigger the dataset to be iterated, the more any efficiency gains have an
> effect.
>
>> Obviously, if it is possible to split the work using only key information
>> (for example, keys are numbers from 1 to 400,000) this works better, because
>> fetching data by key is always better than querying for the same data...
>>
>
> That's what __scatter__ allows :)
>
>
>



Re: [google-appengine] Google App Engine Search API, searching in multiple indexes

2014-03-05 Thread Vinny P
On Wed, Mar 5, 2014 at 1:26 AM,  wrote:

> Thanks for the reply, Vinny.
> I could not find searchAsync() in the Python SDK; it is present in Java.
> Could you please share any Python documentation for searchAsync()?
>


It looks like async searches are not supported in Python just yet. You
might want to star this issue tracking it:
https://code.google.com/p/googleappengine/issues/detail?id=9606


-
-Vinny P
Technology & Media Advisor
Chicago, IL

App Engine Code Samples: http://www.learntogoogleit.com



Re: [google-appengine] Re: Best way to update 400,000 entities at once?

2014-03-05 Thread Jeff Schnitzer
Wow, I had no idea this existed. It's brilliant!

This seems to be the only formal documentation:

https://code.google.com/p/appengine-mapreduce/wiki/ScatterPropertyImplementation

Perhaps this should be added to the main GAE documentation? I never
would have found it.

Jeff

On Wed, Mar 5, 2014 at 7:32 AM, Lorenzo Bugiani  wrote:
> Ok, thanks!
>
> I've just heard of __scatter__ for the first time! :D
>
>
> 2014-03-05 16:14 GMT+01:00 Barry Hunter :
>
>>
>>
>>
>> On 5 March 2014 14:24, Lorenzo Bugiani  wrote:
>>>
>>> I haven't understand what you can do with __scatter__ property.
>>> As MapReduce docs says, "We do not allow retrieving this value directly
>>> (it's stripped from the entity before it's returned to the application by
>>> the Datastore)"...
>>
>>
>> Suppose you want to break the whole dataset into N number of shards.
>>
>> In theory, if you just take the first N keys from a keys only query sorted
>> by __scatter__, you get N keys evenly spread out thoughtout the whole
>> dataset.
>>
>> Each of those keys can then be used as 'first' key in a standard datastore
>> query, to get all documents in that shard. (using a greater than __key__
>> filter, with the results in __key__ order). Works sot of similar to how a
>> cursor actully works under the hood.
>>
>>
>>>
>>>
>>> Also, I haven't understood what's wrong with using tasks and cursors.
>>
>>
>> The main 'advantage' of using the __scatter__ is, can get all the data to
>> setup ALL the tasks at once. Say you want 200 shards. One query retrieving
>> 200 keys, and you can immidiately create all 200 tasks*. You can then begin
>> processing those tasks in any order, even concurrently. (althouh if you have
>> lots of shards to create, may use cursors for 'looping' that initial query!)
>>
>> Using a 'while loop' and cursors. You have to run each query, and get all
>> the data (keys only or the actual documents), to get the cursor to begin the
>> next task. Even if you do keys only queries, to get all the cursors (and add
>> the cursors to the task, so they can get all the documents for real), you
>> downloading much more data than you need.
>>
>>
>> So with the 'nibble away' approach, you either have to be very inefficient
>> in creating all the initial tasks (so can get a progress report), or just
>> have to create the 'next task' after running the for-real query (ie get the
>> next cursor in the query) - in which case cant parallelize.
>>
>>
>> Both approaches have their pro's and cons. One is much simpler and easier
>> to understand, the other is more complex, but should be more efficient.
>>
>>
>>
>>
>> * actully the mapreduce lib, does oversample to improve the results, so
>> its not quite that efficient.
>>
>>>
>>> A task can simply iterate the data, fetching N entities at a time (in this
>>> case 1000), then launch a task that can update them (by itself or by
>>> splitting the work again). At worst, if 10 minutes isn't enough to iterate
>>> over all the data, this "main" task can fetch only a piece of data, start
>>> an update task, then restart itself on the next chunk of data...
>>
>>
>> Yes, for small batch runs it's not going to make a big difference, but the
>> bigger the whole dataset to be iterated, the more of an effect any
>> efficiency gains have.
>>
>>
>>>
>>>
>>> Obviously, if it is possible to split the work using only key information
>>> (for example, keys numbered from 1 to 400,000) this works better, because
>>> fetching data by key is always better than querying for the same data...
>>
>>
>> That's exactly what __scatter__ allows :)
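As a sketch, the N split keys from the __scatter__ query can be turned into per-shard key ranges like this (plain strings stand in for real datastore keys; the function name is illustrative, not a library API):

```python
def build_key_ranges(split_keys):
    """Turn sorted split keys (e.g. from a keys-only query ordered by
    __scatter__) into half-open (start, end] ranges. start=None means
    'from the beginning of the kind', end=None means 'to the end'.
    Each range would back one shard's
    __key__ > start AND __key__ <= end query, in __key__ order."""
    bounds = [None] + sorted(split_keys) + [None]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
```

Note that N split keys yield N+1 ranges; the two open-ended ranges cover everything before the first and after the last split point.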
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Google App Engine" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to google-appengine+unsubscr...@googlegroups.com.
>> To post to this group, send email to google-appengine@googlegroups.com.
>> Visit this group at http://groups.google.com/group/google-appengine.
>> For more options, visit https://groups.google.com/groups/opt_out.



Re: [google-appengine] Re: Best way to update 400,000 entities at once?

2014-03-05 Thread Kaan Soral
3-4 years ago I also shared your enthusiasm, but it's much more logical to
just implement a scatter manually: the pre-set probability results in a
diluted scatter pool after a while. The idea of scatters is obviously nice,
though.
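A manually implemented scatter could look roughly like this on write (a pure-Python sketch; the sample rate, per-id seeding, and names are assumptions, not the built-in __scatter__ rules):

```python
import random

def manual_scatter_value(entity_id, sample_rate=0.01):
    """Sketch of a manual scatter property: on write, roughly
    sample_rate of entities get a random value stored on an indexed
    property; the rest store nothing. Seeding from the entity id
    keeps the choice deterministic per entity (an assumption made
    here so the sketch is repeatable)."""
    rng = random.Random(entity_id)
    if rng.random() < sample_rate:
        return rng.getrandbits(16)  # value to order split queries by
    return None
```

Ordering a keys-only query by such a property would then yield split candidates much like the built-in mechanism, but with a rate you control, which sidesteps the dilution problem of the pre-set probability.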

On Wednesday, March 5, 2014 10:27:46 PM UTC+2, Jeff Schnitzer wrote:
>
> Wow, I had no idea this exists. It's brilliant! 
>
> This seems to be the only formal documentation: 
>
>
> https://code.google.com/p/appengine-mapreduce/wiki/ScatterPropertyImplementation
>  
>
> Perhaps this should be added to the main GAE documentation? I never 
> would have found it. 
>
> Jeff 
>
> On Wed, Mar 5, 2014 at 7:32 AM, Lorenzo Bugiani 
> > 
> wrote: 
> > Ok thanks! 
> > 
> > I've heard of __scatter__ now for the first time! :D 
> > 
> > 

Re: [google-appengine] Re: Best way to update 400,000 entities at once?

2014-03-05 Thread Lorenzo Bugiani
Well, the documentation says that __scatter__ is intended only for
MapReduce and could change in the future, so I think it isn't a good idea
to write production software based on it now.

Maybe when the MapReduce library is GA, we will be able to use __scatter__
in a safe way...



Re: [google-appengine] Re: Python27 + new dev_appservery.py + debugging

2014-03-05 Thread PK
I just checked out the latest PyCharm to see whether debugging is still slow,
but I did not even get to it. I use symbolic links from the file tree I deploy
to my source code, to accommodate generated files, multiple modules, etc. The
PyCharm debugger cannot deal with symbolic links, so it is still not an option
for me.

It is otherwise a great IDE that I use every day… Maybe one day they will
remove this limitation, and if performance has improved as well, it will be
great.

PK
http://www.gae123.com

On March 5, 2014 at 7:32:08 AM, Renzo Nuccitelli (ren...@gmail.com) wrote:

+1 for PyCharm. I don't know why it slows down development for you; for me it
is great: debugging, testing, and even git integration are great. Most of the
time I don't need to use another tool for development.

On Saturday, March 1, 2014 12:59:09 PM UTC-3, Cezary Wagner wrote:
What are the best tools to use dev_appserver.py with breakpoints and
debugging - what is the best choice?



[google-appengine] How does App Engine scale to large applications?

2014-03-05 Thread ThePiachu
I am currently looking into developing a large application either on App 
Engine or on virtual machines like Amazon EC2. I am wondering how well App 
Engine scales in comparison to Amazon.

The application I have in mind needs to store over 20GB of indexed data 
consisting of small real-time records (usually 10-100kB), to retrieve 
that information quickly, and to scale up to a large amount of 
traffic.

Here is an example of a service similar to what we are aiming to develop 
- https://blockchain.info/ .

Since the project is still in early development, there is no preference for 
one platform over the other - we are still hiring developers, so we can 
hire ones that can develop for App Engine or Amazon. I am more concerned 
about the long-term advantages and disadvantages of choosing one option 
over the other.



Re: [google-appengine] How does App Engine scale to large applications?

2014-03-05 Thread Rafael
Are you planning on using websockets like blockchain?

https://cloud.google.com/developers/articles/real-time-gaming-with-node-js-websocket-on-gcp

If yes, you would be better off with GCE/EC2, or with a combo of GCE and App
Engine if you really want to use App Engine.

good luck.
rafa







Re: [google-appengine] Re: Python27 + new dev_appservery.py + debugging

2014-03-05 Thread Renzo Nuccitelli
2014-03-05 19:48 GMT-03:00 PK :

> I use symbolic links from the file tree I deploy to my source code to
> accommodate generated files, multiple modules etc


If you are trying to debug Python code, I don't see how PyCharm is the
problem. What I do is set up a virtualenv in some folder inside the project,
configure it in PyCharm so it can autocomplete based on the installed libs,
and use a symlink to the venv's site-packages so the GAE SDK can upload the
libs during the deployment process.

You can see the whole setup here (it is assumed that virtualenv is
installed):

https://github.com/renzon/tekton/blob/master/project_template/backend/venv/venv.sh

Once the symlink is there, you have to add it to the path through Python
code, which you can see on line 4:

https://github.com/renzon/tekton/blob/master/project_template/backend/src/convention.py
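The path setup that line 4 performs can be sketched like this (the `lib` directory name and function name are assumptions for illustration, not taken from the linked repo):

```python
import os
import sys

def add_lib_to_path(base_dir, lib_dirname='lib'):
    """Prepend the symlinked site-packages directory to sys.path so
    the vendored libs are importable. Idempotent: adding the same
    path twice has no effect."""
    lib_path = os.path.join(base_dir, lib_dirname)
    if lib_path not in sys.path:
        sys.path.insert(0, lib_path)
    return lib_path
```

Calling this before any third-party import makes the symlinked packages resolvable both under dev_appserver.py and after deployment.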

With this configuration I have no performance issues. But I know PyCharm
itself is very heavy, so depending on your hardware, it can be very slow.

I hope it helps you.


-- 
  Renzo Nuccitelli
  www.python.pro.br
