[google-appengine] Re: Early Christmas Present from Google?

2010-11-07 Thread Marc Provost
+1

Thanks googlers! This is awesome.

On Nov 7, 4:58 pm, nickmilon  wrote:
> +1
> Impressive performance gains - congratulations to Google and the App
> Engine team.
> Let's hope the current performance will be a benchmark for the future.
>
> On Nov 7, 12:17 am, Greg  wrote:
>
> > Check out the datastore stats after today's maintenance...
>
> > http://code.google.com/status/appengine/detail/datastore/2010/11/06#a...

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: Cron jobs fail with error "Request was aborted after waiting too long to attempt to service your request."

2010-01-19 Thread Marc Provost
This issue is driving me crazy! I have cron jobs that must run
overnight, and I started to see this pattern about 3-4 weeks ago. I
had to run the jobs multiple times, and I still have to check them
every morning, because sometimes all of them fail (with the same
duration as you guys report). I also get this error at a 4-5% rate for
client requests.

Google, please let us know what is going on.


On Jan 15, 7:48 pm, Peter Liu  wrote:
> Also, the duration is always between 10s and 11s. CPU is always 0ms.
>
> On Jan 15, 4:46 pm, Peter Liu  wrote:
>
> > 01-15 04:36PM 28.187 /p/tempClean 500 10081ms 0cpu_ms 0kb
>
> > Error in last 17 hours: 51      5.4%
>
> > The job runs every minute; it queries a kind and deletes the entries.
> > Currently there are no entries, so the task just does a simple query.
> > There's no other traffic either.
>
> > On Dec 14 2009, 4:11 am, Abhi  wrote:
>
> > > Sometimes my cron jobs fail with an HTTP 500 error and this message:
>
> > > Request was aborted after waiting too long to attempt to service your
> > > request. Most likely, this indicates that you have reached your
> > > simultaneous dynamic request limit. This is almost always due to
> > > excessively high latency in your app. Please see
> > > http://code.google.com/appengine/docs/quotas.html for more details.
>
> > > When this happens, the logs show that the job took about 10086ms of
> > > CPU time. The cron is the only job running in the application, and
> > > it fires one request every 5 minutes. I don't see why the
> > > simultaneous request quota should be exceeded by this
> > > one-request-every-5-minutes application. There is nothing else this
> > > application is doing.
>
> > > If I access the same page (which has admin-only permissions, so I am
> > > sure no one else can access it) from a browser, it never fails.
>
> > > Can someone help me with this?
-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.




[google-appengine] Re: Cron jobs fail with error "Request was aborted after waiting too long to attempt to service your request."

2010-01-19 Thread Marc Provost
Hi everybody,

Sorry for the double post! I wanted to let you know that an issue
already exists for this problem:
http://code.google.com/p/googleappengine/issues/detail?id=2396.
I'm still unsure whether it's because we are using the API incorrectly
or whether this is really a bug.

In the last few days, I've been trying to reduce the running time of
my tasks by splitting them even more. I realized that some of them did
time out occasionally. They would eventually succeed, because failed
tasks are re-executed by the engine, but strangely those URLs failed
very often with "Request was aborted..." (>40% of all requests failed
like this). I started to wonder if they were somehow marked as "bad"
by the engine because they are long-running. Other, faster URLs would
occasionally fail too, but at a much lower rate. Taking a look at
these URLs (data taken directly from my dashboard):

Task A: "Request was aborted..." error rate 45%  (running time > 15s;
will sometimes fail due to timeouts)
Task B: "Request was aborted..." error rate 4.8% (running time < 200ms)
Task C: "Request was aborted..." error rate 18%  (running time 4-5s)

I'm still experimenting with this; I will let you know if I'm able to
reduce the error rate of Task A. I'm currently splitting all my tasks
so that each of them writes to only one entity (before, I would split
them into groups of 30-40).

Jason (Google), what are your thoughts on this?

Thanks!


On Jan 18, 10:05 am, Marc Provost  wrote:
> This issue is driving me crazy! I have cron jobs that must run
> overnight, and I started to see this pattern about 3-4 weeks ago. I
> had to run the jobs multiple times, and I still have to check them
> every morning, because sometimes all of them fail (with the same
> duration as you guys report). I also get this error at a 4-5% rate
> for client requests.
>
> Google, please let us know what is going on.
>
> On Jan 15, 7:48 pm, Peter Liu  wrote:
>
> > Also, the duration is always between 10s and 11s. CPU is always 0ms.
>
> > On Jan 15, 4:46 pm, Peter Liu  wrote:
>
> > > 01-15 04:36PM 28.187 /p/tempClean 500 10081ms 0cpu_ms 0kb
>
> > > Error in last 17 hours: 51      5.4%
>
> > > The job runs every minute; it queries a kind and deletes the
> > > entries. Currently there are no entries, so the task just does a
> > > simple query. There's no other traffic either.
>
> > > On Dec 14 2009, 4:11 am, Abhi  wrote:
>
> > > > Sometimes my cron jobs fail with an HTTP 500 error and this message:
>
> > > > Request was aborted after waiting too long to attempt to service your
> > > > request. Most likely, this indicates that you have reached your
> > > > simultaneous dynamic request limit. This is almost always due to
> > > > excessively high latency in your app. Please see
> > > > http://code.google.com/appengine/docs/quotas.html for more details.
>
> > > > When this happens, the logs show that the job took about 10086ms
> > > > of CPU time. The cron is the only job running in the application,
> > > > and it fires one request every 5 minutes. I don't see why the
> > > > simultaneous request quota should be exceeded by this
> > > > one-request-every-5-minutes application. There is nothing else
> > > > this application is doing.
>
> > > > If I access the same page (which has admin-only permissions, so I
> > > > am sure no one else can access it) from a browser, it never fails.
>
> > > > Can someone help me with this?
-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.




[google-appengine] Uncatchable severe error "Operation commit failed on resource" logged

2010-02-09 Thread Marc Provost
Hi everybody!

I am using the Java implementation and seeing the following error
logged sporadically, both in the development server and live. Note
that it does not seem to be a "real" error, as the commit always goes
through and my data looks perfectly fine. Could it be a low-level
exception that is not converted back to a JDOException? Is it a real
error?

org.datanucleus.transaction.Transaction commit: Operation commit
failed on resource:
org.datanucleus.store.appengine.datastorexaresou...@608d41, error code
UNKNOWN and transaction: [DataNucleus Transaction, ID=Xid=

I think the root cause of this problem is that I'm reading entities
from one entity group, caching them, and then opening a transaction on
another entity group. In short, I spawn tasks that first read 30 or so
entities and copy a subset of their content into memory. Then, I open
a transaction and cache this content to a "global" entity (just a
wrapper around a Blob) for later use. My goal is to go over all the
entities of a kind (1000s) and cache a subset of their data that I
often need. The cache and the other entities are not in the same
entity group. If I only perform the transaction, without reading the
other entities first, the error does not occur.

Note that I'm doing this very carefully -- I perform all my datastore
operations inside a try/catch and a for loop, to retry if necessary.
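
For reference, the retry loop I mean looks roughly like this -- a
minimal sketch, where CacheEntity and PMF are hypothetical names for
my wrapper entity and the usual PersistenceManagerFactory holder:

import javax.jdo.JDOException;
import javax.jdo.PersistenceManager;
import javax.jdo.Transaction;

// Minimal sketch of the retry loop described above. CacheEntity and
// PMF are placeholder names, not my real code.
public void writeCacheWithRetry(CacheEntity cache) {
    final int maxRetries = 3;
    for (int attempt = 0; attempt < maxRetries; attempt++) {
        PersistenceManager pm = PMF.get().getPersistenceManager();
        Transaction tx = pm.currentTransaction();
        try {
            tx.begin();
            pm.makePersistent(cache); // single write, single entity group
            tx.commit();
            return;                   // success -- stop retrying
        } catch (JDOException e) {
            // commit failed; loop around and try again
        } finally {
            if (tx.isActive()) {
                tx.rollback();
            }
            pm.close();
        }
    }
}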

Thanks for any help!
Marc

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: JDO: Owned 1 to Many relationship between the objects of the same class

2010-02-12 Thread Marc Provost
Hi Alex,

This is a known issue. Check out this thread for discussion +
workarounds: 
http://groups.google.com/group/google-appengine-java/browse_thread/thread/3affdf1441f864b6
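
For what it's worth, the usual way around owned-relationship problems
like this (a sketch of my own, not necessarily the exact fix that
thread recommends) is to make the relationship unowned by storing the
child Keys and fetching the children yourself:

// Inside CommentEntity: replace the owned List<CommentEntity> with an
// unowned list of child keys (com.google.appengine.api.datastore.Key),
// then fetch the children by key when rendering the thread.
@Persistent
private List<Key> childKeys = new ArrayList<Key>();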

Marc

On Feb 12, 3:55 am, Alexander Arendar 
wrote:
> Hi guys,
>
> yesterday I was trying to model a simple forum comments. Just a
> comments which you can add more comments to.
> So, the approximation of the entity is like this:
> ---
> @PersistenceCapable (detachable = "true", identityType =
> IdentityType.APPLICATION)
> public class CommentEntity {
>
>         @PrimaryKey
>         @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
>         private Key key;
>
>         @Persistent
>         private String category;
>
>         @Persistent
>         private String commentDate;
>
>         @Persistent
>         private String userName;
>
>         @Persistent
>         private String commentBody;
>
>         @Persistent
>         private List<CommentEntity> children = new
> ArrayList<CommentEntity>();
>
>         getters/setters/etc.
>
> }
>
> DataNucleus enhancement goes OK, no errors in the console.
> The call to pm.makePersistent() completes without any exceptions for
> such an entity.
> BUT IT IS NOT PERSISTED.
>
> I found out that the problem is in the "children" property. And the
> problem is that it's a list of objects of the same class as the parent
> entity. If I comment out that property declaration, the entity is
> persisted. Also, if I change the type of the child entities to some new
> class (not extending CommentEntity), it also gets persisted.
>
> So my suspicion is that JDO (or the GAE JDO impl) does not allow child
> entities to be of the same class as the parent. Is that correct? Maybe
> I'm missing something essential? Please advise.
>
> Sincerely,
> Alex

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: Task Queue oddness

2010-02-22 Thread Marc Provost
This is a wild guess, but maybe they need to prevent people from
"externally" scheduling cron jobs at a faster rate than what is
allowed. Currently, cron jobs can be executed at most once per minute;
letting a reload bypass that limit would allow an external server to
schedule cron jobs at a faster rate.

On Feb 21, 6:54 pm, "Ben W."  wrote:
> I have been testing the task queue with mixed success. Currently I am
> using the default queue, with default settings, etc.
>
> I have a test URL set up which inserts about 8 tasks into the queue.
> In short order, all 8 are completed properly. So far so good.
>
> The problem comes up when I reload that URL twice within, say, a
> minute. Watching the task queue, all the tasks are added properly, but
> it seems only the first batch executes. Yet the "Run in Last Minute"
> number shows the right number of tasks being run.
>
> The request logs tell a different story. They show only the first set
> of 8 running, but all task creation URLs working successfully.
>
> The odd thing is that if I wait, say, a minute between the task
> creation URL requests, it works fine.
>
> Oddly enough, changing the bucket_size or execution rate does not seem
> to help. Only the first batch is executed. I have also reduced the
> number of requests all the way down to 2, and still found only the
> first 2 execute. Any others added display the same issue as above.
>
> Any suggestions?
>
> Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: how to improve performance

2010-02-22 Thread Marc Provost
Hi AJ,

Here are a few tips from my experience:

* If your application currently does not have much traffic, most of
your requests will be "loading requests". This could explain the
randomness you are seeing. App Engine needs to load and prepare your
application to handle a request. In my application (Java), a loading
request takes around 2-6 seconds. Once your application is loaded, it
will stay in memory for some time and subsequent requests will be
"normal requests". See
http://googleappengine.blogspot.com/2009/12/request-performance-in-java.html
for more info. Also, make sure you are using the most recent SDK
version so that precompilation is enabled by default. I have not tried
the Python runtime yet, but I hear it processes loading requests
faster.

* Once my application is loaded, fetching the first 20 results of a
simple query takes at most 1-2 seconds, most of the time less than a
second. I use memcache to cache the derived product (usually an HTML
page) of common requests. Once a request is cached, it takes 100-200ms
to serve.
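
As a rough sketch of what I mean (the servlet, key scheme, and render
step are placeholders, not my actual code):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class TopicPageServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
        String key = "page:" + req.getRequestURI(); // placeholder key scheme
        String html = (String) cache.get(key);
        if (html == null) {
            html = renderFromDatastore(req);        // run the queries once
            cache.put(key, html, Expiration.byDeltaSeconds(600));
        }
        resp.setContentType("text/html");
        resp.getWriter().write(html);
    }

    private String renderFromDatastore(HttpServletRequest req) {
        // hypothetical: query the datastore and build the page here
        return "<html>...</html>";
    }
}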

* When you list the topics, make sure you are using cursors:
http://code.google.com/appengine/docs/java/datastore/queriesandindexes.html#Query_Cursors
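
The basic pattern with the low-level datastore API looks like this
(the "Conversation" kind and the rendering are assumptions on my
part):

import com.google.appengine.api.datastore.Cursor;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.QueryResultList;

// Fetch one 20-item page and return a web-safe cursor for the next page.
public String listPage(String webSafeCursor, StringBuilder out) {
    DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
    PreparedQuery pq = ds.prepare(new Query("Conversation")); // assumed kind
    FetchOptions opts = FetchOptions.Builder.withLimit(20);
    if (webSafeCursor != null) {
        opts.startCursor(Cursor.fromWebSafeString(webSafeCursor));
    }
    QueryResultList<Entity> page = pq.asQueryResultList(opts);
    for (Entity e : page) {
        out.append(e.getKey()).append('\n'); // render the row
    }
    return page.getCursor().toWebSafeString(); // embed in the "next" link
}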

* When you fetch feeds, you will need to split the work into very
small chunks because of the 30-second limit. When you say "fails
randomly", are you getting
"com.google.apphosting.runtime.HardDeadlineExceededError"? In my
experience, App Engine behaves much better when each of your tasks
writes to very few entities. In my application, I'm being very strict:
each task writes to at most one entity. For example, I also need to
parse an external feed and then update 1000 entities with the data
from that feed. To achieve that efficiently, I spawn a thousand tasks,
each one updating only one entity.
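
A minimal sketch of that fan-out (the URL, parameter names, and key
list are made up, and the task queue package has moved between SDK
versions):

import java.util.List;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// One task per entity: each task carries only the data it needs and
// writes to exactly one entity when it runs.
public void fanOut(List<String> entityKeys, String feedData) {
    Queue queue = QueueFactory.getDefaultQueue();
    for (String key : entityKeys) {
        queue.add(TaskOptions.Builder.withUrl("/tasks/updateEntity")
                .param("key", key)
                .param("data", feedData));
    }
}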

Hope it helps,
Marc


On Feb 21, 7:02 pm, AJ Chen  wrote:
> I like the potential of App Engine as a cloud computing platform. Using
> Eclipse, I can write code and then one-click deploy the changes to
> production. It's awesome!  It's also an exciting experience to learn the
> new programming patterns GAE requires, such as the task queue, the
> object data store, and memcache. All this new stuff is fine as long as
> it delivers performance in the end. After running my app
> http://realmon9.appspot.com on production for a while, I found the
> response time is very often too long, 5-20 sec, in the unusable range.
> Of course, the performance depends on how complicated it is to generate
> the response per request. I'm going to give a very brief description of
> a typical request, and I'd appreciate your suggestions for improving
> the performance.
>
> My app "realmon9" is a social media monitoring application designed as
> a component in the Google cloud so that it can be connected to an
> enterprise CRM like Salesforce. It basically allows organizations to
> monitor a large number of topics on social media and brings the
> relevant conversations/leads to the CRM for
> marketing/PR/support/research purposes. The topics, Twitter
> conversations, and blogs are stored in the datastore, and the operation
> is quite simple and straightforward: for example, listing topics or
> listing conversations (20 per page) for a topic. I expect this type of
> viewing request to take <1 sec to respond. It requires querying 2-4
> kinds of data per request, and there is only a small amount of data at
> this initial stage. But very often it takes 10 seconds to respond to a
> simple request.  I'm using Java and JDO to query the datastore. I have
> not done anything to customize the index configuration yet. Where
> should I look for performance optimization?
>
> I also use the task queue to fetch feeds in the background. Because the
> response is slow, a large percentage of simple feed fetch tasks fail
> randomly.
>
> One observation: viewing the same page (e.g. listing topics) sometimes
> takes no time, but sometimes takes 10 seconds. It's all random, which
> is probably due to the distributed nature of GAE. It may be hard to
> figure out what to improve on the app side when GAE varies wildly in
> terms of response time. Does anybody know the expected response time or
> behavior from GAE?
>
> This Google app is ported from my server application on
> http://web2express.org. I can make the response on a regular Tomcat
> server fast, but GAE is uncharted territory. I'm still learning and
> looking for best-practice ideas.
>
> thanks,
> -aj
> --
> AJ Chen, PhD
> Chair, Semantic Web SIG, sdforum.org - http://web2express.org
> @web2express on twitter
> Palo Alto, CA, USA
> 650-283-4091
> *Monitoring social media in real time*

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.

[google-appengine] My max requests / seconds is 3. My cron jobs fail with: "Request was aborted after waiting too long.."

2010-02-26 Thread Marc Provost
Ok, here's my situation:

* I use the Java implementation, and my app id is poolfana.
* I have a bunch of cron jobs scheduled at night (Eastern Time).
* They are all very much parallelized. I am being very strict: they
spawn tasks that only write to one entity each. Each task executes in
a few hundred ms.
* A given cron job and its spawned tasks terminate in a few minutes at
most.
* I have scheduled the cron jobs at least 10 minutes apart, so they do
not overlap (see the cron.xml sketch below).
* In my dashboard, my max requests per second is 3. The max limit is
supposed to be 30.
* My problem? The cron jobs fail sporadically (marked as "failed" in
the dashboard) with this error:

"Request was aborted after waiting too long to attempt to service your
request. Most likely, this indicates that you have reached your
simultaneous dynamic request limit. This is almost always due to
excessively high latency in your app. Please see
http://code.google.com/appengine/docs/quotas.html for more details."
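
For context, the schedule looks roughly like this in my cron.xml (the
URLs and times here are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
  <cron>
    <url>/cron/updateSourceA</url>
    <schedule>every day 03:00</schedule>
    <timezone>America/New_York</timezone>
  </cron>
  <cron>
    <url>/cron/updateSourceB</url>
    <schedule>every day 03:10</schedule>
    <timezone>America/New_York</timezone>
  </cron>
</cronentries>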

There is an issue for this problem: 
http://code.google.com/p/googleappengine/issues/detail?id=2396

It has been starred 50+ times, but it has not yet been acknowledged by
the Google team. I'm writing this post to discuss potential
workarounds and potential misuses of the API with the Google team, or
with other people who might have solved this problem. What else can I
do? Is it a problem on the Google side, or am I doing something wrong?
Right now, I need to re-execute the cron jobs manually every day...

Thank you!
Marc

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: My max requests / seconds is 3. My cron jobs fail with: "Request was aborted after waiting too long.."

2010-02-26 Thread Marc Provost
Thanks for your quick reply!

I parse several external sources (around 8), and from each source I
need to update the same 1000 entities (most of them already exist;
creation of new entities is rare). For each data source, I schedule a
cron job which spawns 1000 tasks (with attached data), and each of
them updates a single entity. I found by trial and error that App
Engine behaves better the shorter the tasks are. So, when I say very
much parallelized, I mean spawning as many tasks as I can for each
cron job, each of them as small as possible. Since I have more
independent tasks running in parallel, my cron jobs execute faster. In
addition, I schedule my cron jobs apart so that they don't overlap,
but this should not matter, as I use the same queue, which is limited
to 5 tasks per second.
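
(For reference, that rate is just the queue.xml setting -- something
like this:)

<queue-entries>
  <queue>
    <name>default</name>
    <rate>5/s</rate>
  </queue>
</queue-entries>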

So, in summary, I have 8 cron jobs and each cron job spawns 1000
tasks. A given cron job and its children tasks terminates in 3-4
minutes at most. The cron jobs are separated so that 2 crons jobs
never execute together.

Marc





On Feb 26, 3:47 pm, Eli Jones  wrote:
> How many is "a bunch"?  Also, you say "they are all very much
> parallelized", but then you say that you've scheduled them 10 minutes
> apart and they don't overlap.. those two statements are contradictory.
> Please explain more clearly your cron/task queue setup, how it works,
> and what exactly it is doing.
>
> When you say that the cron jobs "spawn tasks that write to one entity
> each".. what do you mean?  The cron job is there to fire off the
> initial task.. and that task runs once, putting one entity, and that's
> it?
>
> If so, why are you having these tasks only put one entity at a time..
> instead of creating multiple entities and putting them in batches?
> Does each task put() new entities? Or are they sometimes putting an
> entity that may already exist?
>
> More info is better for help.
>
> On Fri, Feb 26, 2010 at 3:35 PM, Marc Provost  wrote:
> > Ok, here's my situation:
>
> > * I use the java implementation and my app id is poolfana.
> > * I have a bunch of cron jobs scheduled at night (Eastern Time)
> > * They are all very much parallelized. I am being very strict: they
> > spawn tasks that only write to one entity each. Each tasks will
> > execute in a few hundred ms.
> > * A given cron job and its spawned tasks will terminate in a few
> > minutes at most.
> > * I have scheduled each cron job at least 10 minutes apart, so they do
> > not overlap.
> > * In my dashboard, my max request per second is 3. The max limit is
> > supposed to be 30.
> > * My problem? The cron jobs fail sporadically (marked as "failed" in
> > the dashboard) with this error:
>
> > "Request was aborted after waiting too long to attempt to service your
> > request. Most likely, this indicates that you have reached your
> > simultaneous dynamic request limit. This is almost always due to
> > excessively high latency in your app. Please see
> > http://code.google.com/appengine/docs/quotas.html for more details."
>
> > There is an issue for this problem:
> >http://code.google.com/p/googleappengine/issues/detail?id=2396
>
> > It was starred 50+ times, but it was not acknowledged yet by the
> > google team. I'm writing this post to discuss potential workarounds,
> > potential misuses of the API with the google team or other people that
> > might have solved this problem. What else can I do? Is it a problem on
> > the google side or I'm I doing something wrong? Right now, I need to
> > re-execute the cron jobs manually everyday...
>
> > Thank you!
> > Marc
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Google App Engine" group.
> > To post to this group, send email to google-appeng...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > google-appengine+unsubscr...@googlegroups.com
> > .
> > For more options, visit this group at
> >http://groups.google.com/group/google-appengine?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: My max requests / seconds is 3. My cron jobs fail with: "Request was aborted after waiting too long.."

2010-02-27 Thread Marc Provost
Thanks for your help Eli! I didn't know about the unique name feature.
I will use that trick, at least for the most important crons. Still, I
wonder why I see this failure if my requests per second never go
higher than 3 in my dashboard. I mean, if I saw spikes close to 30, at
least I could start debugging. And the failure rate is much higher
than 0.1%. It's more like 5%. And it's weird: sometimes the first cron
will fail when there is no activity at all... why? It always fails
after 10 seconds. Sometimes the first cron fails, sometimes the 5th,
etc. And my app itself has almost no traffic.

A few notes:

* Yeah, I use chaining -- I am not adding the 1000s of tasks in one
shot (see the sketch below). Sorry for not specifying that. None of my
crons/queues comes close to the 30-second limit. I haven't seen a
30-second timeout error in my logs for a long time.
* I use a queue with a rate of 5 tasks per second, so approximately
300 tasks go through per minute, and 1200 tasks in 4 minutes. Hence my
approximation. I also manually ran the cron jobs and saw them complete
in about 4 minutes.
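
To clarify what I mean by chaining, here is a minimal sketch (the
URLs, parameter names, and batch size are made up):

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

// Each link in the chain enqueues one batch of work, then enqueues the
// next link if anything remains -- never 1000s of tasks in one shot.
public void enqueueBatch(int offset, int total) {
    Queue queue = QueueFactory.getDefaultQueue();
    final int batchSize = 100;
    int end = Math.min(offset + batchSize, total);
    for (int i = offset; i < end; i++) {
        queue.add(TaskOptions.Builder.withUrl("/tasks/updateEntity")
                .param("index", Integer.toString(i)));
    }
    if (end < total) {
        queue.add(TaskOptions.Builder.withUrl("/tasks/enqueueBatch")
                .param("offset", Integer.toString(end))
                .param("total", Integer.toString(total)));
    }
}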

Basically, the question I'm trying to answer here is:

Why am I seeing the "simultaneous dynamic request limit" error if:

1) In my dashboard, the max rate of requests I see is 3. Far from 30.
2) The rate of my queue is 5/second. Again, far from 30.
3) This error occurs even for the first cron job, when no other tasks
or crons are running (and my app has almost no traffic).

Thanks again for your tips.
Marc




On Feb 26, 8:16 pm, Wooble  wrote:
> On Feb 26, 7:46 pm, Locke  wrote:
>
> > I have also seen this timeout error when trying to add to the task
> > queue. What is interesting to me is that it kills my process after 10
> > seconds, instead of the thirty seconds we supposedly are allowed.
>
> The 30 seconds are for a request that actually runs.  This message
> indicates your request handler didn't get run at all, because too many
> instances of your application were already running.

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: The server encountered an error and could not complete your request.

2010-03-02 Thread Marc Provost
Hi Waleed,

You are not seeing any errors/warnings in the logs? There are a few
possibilities here.

If the request hits your app:

* You could be hitting the 30-second limit (maybe because the input is
too large or something). You should see an error in the logs.
* Your code could also be raising an exception that you do not catch.
You should see an error in the logs.
* It is also possible that the return status of the request is set to
500 somehow. In this case, your application could be silently catching
an exception without logging anything, but you should be able to debug
it by logging info messages in your request handler (see the sketch
below).

If your request does not hit your app:

* You could be hitting the simultaneous requests limit. That's a
warning in the logs, but it shouldn't be reproducible every time.
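
If it's the silent-exception case, bracketing the handler with log
statements should narrow it down quickly. A minimal sketch (the
servlet name is made up):

import java.io.IOException;
import java.util.logging.Logger;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FeedUploadServlet extends HttpServlet {
    private static final Logger log =
            Logger.getLogger(FeedUploadServlet.class.getName());

    @Override
    public void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        log.info("entered, content length=" + req.getContentLength());
        try {
            // ... the actual feed processing goes here ...
            resp.setStatus(200);
        } catch (RuntimeException e) {
            log.severe("handler failed: " + e); // make silent failures visible
            throw e;
        }
        log.info("finished normally");
    }
}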

Hope it helps,
Marc

On Mar 2, 3:09 am, Waleed  wrote:
> I'm getting this error for some requests:
>
> The server encountered an error and could not complete your
> request.If the problem persists, please report your problem and
> mention this error message and the query that caused it.
>
> It's repeatable and the error happens every time for that specific
> request I'm sending. I tested with another AE app and I get the same
> problem. The request doesn't seem to be hitting my app, but fails
> before that, so I can't do anything about it. When I submit the same
> request with different data in the POST body, it goes through and
> works well.
>
> My request is a simple POST with a blog feed in the body of the post.
> Nothing particularly unique about it. And as I mentioned earlier, it
> works for most feeds, except for a few where it breaks 100% of the
> time.
>
> How can I debug this? Can anyone shed some light? App id is:
> networkedblogs

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: simultaneous dynamic request limit

2010-03-02 Thread Marc Provost
Hi David,

I don't have a precise answer, but I think you are going in the right
direction. The idea is to minimize the response time of your most
popular requests using memcache. Try to cache the HTML pages derived
from the datastore queries. It is easy to drop the request time to a
few hundred ms or even less that way (depending on how much you can
cache and whether you use Java or Python). This approach works well if
your users are interested in the same entities. For example, a news
site can easily cache its most popular articles that way. For
entities/queries that are specific to each user but might be reused in
many pages, cache them as soon as possible!
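
For the per-user case, the sketch differs only in the cache key (the
names here are made up):

import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

// Cache a per-user HTML fragment under a key that includes the user
// id, so every page that needs the fragment reuses the same entry.
public String getUserSidebar(String userId) {
    MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
    String key = "sidebar:" + userId;
    String html = (String) cache.get(key);
    if (html == null) {
        html = renderSidebarFromDatastore(userId); // hypothetical
        cache.put(key, html, Expiration.byDeltaSeconds(300));
    }
    return html;
}

private String renderSidebarFromDatastore(String userId) {
    // hypothetical: query the user's entities and build the fragment
    return "<div>...</div>";
}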

Hope it helps,
Marc


On Mar 2, 3:21 am, Waleed Abdulla  wrote:
> I got the same errors today on my dev app, which I'm the only user of. So it
> doesn't seem to be related to how much load the app has!! I've been noticing
> them on my production app as well on and off.
>
> Waleed
>
> On Mon, Mar 1, 2010 at 7:22 PM, Satoshi  wrote:
> > I've got the same warnings several time today too. The peak access
> > rate was only 3.00 requests/sec, and the CPU time usage over the last
> > 24 hours is 6% (1.08 CPU hours) out of 18.50 CPU hours (I am a paying
> > customer).
>
> > Satoshi
>
> > On Mar 1, 6:51 pm, David  wrote:
> > > I am losing sleep over this, so any help would be greatly appreciated!
>
> > > APP ID: conit-app01
>
> > > Since our app released about a week ago, it has been getting an
> > > average of about 60 requests/second.  On February 27, our app suddenly
> > > crashed and was down for several hours, with thousands of these errors
> > > appearing in the logs:
>
> > > Request was aborted after waiting too long to attempt to service your
> > > request. Most likely, this indicates that you have reached your
> > > simultaneous dynamic request limit. This is almost always due to
> > > excessively high latency in your app. Please see
> > > http://code.google.com/appengine/docs/quotas.html for more details.
>
> > > Since getting this error, I filled out a request to increase this
> > > limit at:
> >http://code.google.com/support/bin/request.py?contact_type=AppEngineC...
>
> > > This request was denied, because, "your app has been using, over the
> > > past 24 hours, on average 60 QPS with a peak of ~135 QPS; thus you're
> > > well under the 500 QPS limit described above."
>
> > > Since this crash, I've also been working to decrease calls to the
> > > datastore, and I think our average CPU time has decreased around 30%.
> > > In the dashboard, one of our pages still appears "yellow" under the
> > > column "Average CPU (API)", with a speed of about 1100.  This page is
> > > about 6% of the volume of our app.  The other pages don't have any
> > > warnings.  We are well within the limits of our billing.
>
> > > I would feel much better if I could understand the math/metrics that
> > > go into producing this error, so it doesn't happen again.  How can I
> > > know if my page request times are low enough?  If I add a new page
> > > with a higher CPU time, how can I know if it would make the app crash?
>
> > > Any help or references to details on this error would be appreciated.
>
> > > Thank you in advance.
> > > -David
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Google App Engine" group.
> > To post to this group, send email to google-appeng...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > google-appengine+unsubscr...@googlegroups.com
> > .
> > For more options, visit this group at
> >http://groups.google.com/group/google-appengine?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appeng...@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: Post-mortem for February 24th, 2010 outage

2010-03-05 Thread Marc Provost
Wow, I second lennysan. Awesome postmortem! Thank you so much for
sharing it with us.

Marc

On Mar 5, 12:25 pm, lennysan  wrote:
> I've been working on a Guideline for Postmortem Communication, and ran
> this post through the guideline:
> http://www.transparentuptime.com/2010/03/google-app-engine-downtime-p...
>
> Overall, this may be the single most impressive postmortem I've seen
> yet. The amount of time and thought put into this post is staggering,
> and the takeaways are useful to every organization. I'm especially
> impressed with the proposed new functionality that turns this event
> into a long-term positive, which is really all you can ask for after
> an incident.
>
> On Mar 4, 3:22 pm, App Engine Team 
> wrote:
>
> > Post-Mortem Summary
>
> > This document details the cause and events occurring immediately after
> > App Engine's outage on February 24th, 2010, as well as the steps we
> > are taking to mitigate the impact of future outages like this one in
> > the future.
>
> > On February 24th, 2010, all Google App Engine applications were in
> > varying degraded states of operation for a period of two hours and
> > twenty minutes from 7:48 AM to 10:09 AM PT | 15:48 to 18:09 GMT.  The
> > underlying cause of the outage was a power failure in our primary
> > datacenter. While the Google App Engine infrastructure is designed to
> > quickly recover from this sort of failure, this type of rare
> > problem, combined with internal procedural issues, extended the time
> > required to restore the service.
>
> > <>
>
> > What did we do wrong?
>
> > Though the team had planned for this sort of failure, our response had
> > a few important issues:
>
> > - Although we had procedures ready for this sort of outage, the oncall
> > staff was unfamiliar with them and had not trained sufficiently with
> > the specific recovery procedure for this type of failure.
>
> > - Recent work to migrate the datastore for better multihoming changed
> > and improved the procedure for handling these failures significantly.
> > However, some documentation detailing the procedure to support the
> > datastore during failover incorrectly referred to the old
> > configuration. This led to confusion during the event.
>
> > - The production team had not agreed on a policy that clearly
> > indicates when, and in what situations, our oncall staff should take
> > aggressive user-facing actions, such as an unscheduled failover.  This
> > led to a bad call of returning to a partially working datacenter.
>
> > - We failed to plan for the case of a power outage that might affect
> > some, but not all, of our machines in a datacenter (in this case,
> > about 25%). In particular, this led to incorrect analysis of the
> > serving state of the failed datacenter and when it might recover.
>
> > - Though we were able to eventually migrate traffic to the backup
> > datacenter, a small number of Datastore entity groups, belonging to
> > approximately 25 applications in total,  became stuck in an
> > inconsistent state as a result of the failover procedure. This
> > represented considerably less than 0.2% of data stored in the
> > Datastore.
>
> > Ultimately, although significant work had been done over the past year
> > to improve our handling of these types of outages, issues with
> > procedures reduced their impact.
>
> > What are we doing to fix it?
>
> > As a result, we have instituted the following procedures going
> > forward:
>
> > - Introduce regular drills by all oncall staff of all of our
> > production procedures. This will include the rare and complicated
> > procedures, and all members of the team will be required to complete
> > the drills before joining the oncall rotation.
>
> > - Implement a regular bi-monthly audit of our operations docs to
> > ensure that all needed procedures are properly findable, and all out-
> > of-date docs are properly marked "Deprecated."
>
> > - Establish a clear policy framework to assist oncall staff to quickly
> > and decisively make decisions about taking intrusive, user-facing
> > actions during failures. This will allow them to act confidently and
> > without delay in emergency situations.
>
> > We believe that with these new procedures in place, last week's outage
> > would have been reduced in impact from about 2 hours of total
> > unavailability to about 10 to 20 minutes of partial unavailability.
>
> > In response to this outage, we have also decided to make a major
> > infrastructural change in App Engine. Currently, App Engine provides a
> > one-size-fits-all Datastore, that provides low write latency combined
> > with strong consistency, in exchange for lower availability in
> > situations of unexpected failure in one of our serving datacenters. In
> > response to this outage, and feedback from our users, we have begun
> > work on providing two different Datastore configurations:
>
> > - The current option of low-latency, strong consistency, and lower
> > availability during unexpected fai

[google-appengine] App engine is blocking our server (... Our systems have detected unusual traffic from your computer network ...)

2011-02-22 Thread Marc Provost
This issue has been covered in other posts -- with no clear answer.
(see: 
http://groups.google.com/group/google-appengine/browse_thread/thread/88cde75072a08fc2/56b80e93e40ce3dd
and 
http://groups.google.com/group/google-appengine/browse_thread/thread/1c8dd575aa6804c2/e84294dc934a1500
for reference)

We are releasing a web version of our app today, and we expect a lot
of traffic! We use App Engine as a service, and our main server calls
it a lot. It seems Google thinks our server is a bot:

*

Our systems have detected unusual traffic from your computer network.
Please try your request again later. Why did this happen?

This page appears when Google automatically detects requests coming
from your computer network which appear to be in violation of the
Terms of Service. The block will expire shortly after those requests
stop.

This traffic may have been sent by malicious software, a browser plug-
in, or a script that sends automated requests. If you share your
network connection, ask your administrator for help — a different
computer using the same IP address may be responsible. Learn more

Sometimes you may see this page if you are using advanced terms that
robots are known to use, or sending requests very quickly.

IP address: 74.59.65.112
Time: 2011-02-22T14:57:38Z

**

How can we avoid this problem? Is it possible to whitelist an IP? We
currently do not use authenticated URLs on App Engine -- could that
solve the issue?

Thank you!
Marc







-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.