Very nice.  Thanks a ton for your input.  We're in a slightly
different mode vis-à-vis your poller: any event that changes data will
(if necessary) queue the affected items for (re-)sending, but I guess
the result is the same (consistency).  Did you ever intentionally
exceed the 5-query/sec rule?  If so, does GB generate an intelligible
error condition?  This harks back to the start of this thread: the
response doesn't say "TOO BIG"; it would say "BAD ITEM", but doesn't
even do that because of the (alleged) bug.
I would like to make ~18,000 maximally sized queries per hour and
would like to properly detect the occasional oversized load.  We'll
see what happens.
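For what it's worth, here's the rough shape of the size guard I have in mind, as a sketch in plain Java (nothing from the GData client library): pack pre-serialized <entry> fragments greedily so each batch's UTF-8 size stays under the 1 MB limit. The envelope-overhead figure is my own guess, not anything from the docs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: group pre-serialized <entry> fragments into batches that stay
// under Google Base's documented 1 MB request limit.  ENVELOPE_OVERHEAD
// is an assumed allowance for the surrounding <feed> wrapper, not a
// documented value.
class BatchPacker {
    static final int MAX_PAYLOAD = 1_000_000;   // documented 1 MB limit
    static final int ENVELOPE_OVERHEAD = 2_000; // guessed <feed> wrapper cost

    // Greedy packing: start a new batch whenever adding the next entry
    // would push the current batch past the budget.
    static List<List<String>> pack(List<String> entries) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int size = ENVELOPE_OVERHEAD;
        for (String entry : entries) {
            int len = entry.getBytes(StandardCharsets.UTF_8).length;
            if (!current.isEmpty() && size + len > MAX_PAYLOAD) {
                batches.add(current);
                current = new ArrayList<>();
                size = ENVELOPE_OVERHEAD;
            }
            current.add(entry);
            size += len;
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }
}
```

A single entry bigger than the budget would still go out on its own (and fail server-side), so a real version would reject those up front.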

Thanks again.

On Dec 17, 7:17 am, Tom Wilson <[email protected]> wrote:
> 1. No, before we send. The file is normally deleted after being
> successfully submitted, as I stated.
>
> 2. Yes, deletes are done in batches; since all you need to pass is the
> id, I worked out exactly how many can be sent without hitting the size
> limits.
>
> 3. Yep. Since there's a 5-query-per-second limit (~18,000 per hour),
> the system spreads the loads.
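That budget works out to one request every 200 ms; a minimal pacer along those lines (a sketch, not the actual system described above) could look like:

```java
// Minimal pacer for an assumed 5-requests-per-second quota: make sure
// consecutive sends are at least 200 ms apart.
class Pacer {
    static final long MIN_INTERVAL_MS = 200; // 1000 ms / 5 requests
    private long lastSend = 0;

    // Block just long enough to respect the minimum spacing, then
    // record the send time.
    synchronized void acquire() {
        long wait = lastSend + MIN_INTERVAL_MS - System.currentTimeMillis();
        if (wait > 0) {
            try {
                Thread.sleep(wait);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        lastSend = System.currentTimeMillis();
    }
}
```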
>
> You're not labouring the point at all; you sound to me to be in the
> same mindset I was in when I first started using the API.
> One of the best parts about the system is the poller, which checks for
> consistency between my database and Google Base.
>
> Matching prices is an important element, but it checks a number of
> things for matches.
>
> For example, the system handled the recent changes to product_types
> quite well; it had to update all items with the new structure.
> It's one of those non-critical changes, so items can go unupdated for
> days; the poller was updated to scan and match product types, and if
> an item didn't have one, it's queued for processing as a low
> priority.
>
> Low-priority items are done after everything else, then updates to
> expiration dates.
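The priority ordering described above could be sketched with a plain PriorityQueue; the tiers here are illustrative, not the real system's:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch of a tiered work queue: item movements drain first, then
// backfill work (e.g. the product_type rewrite), then expiration
// touches.  Tier values are assumed for illustration.
class WorkQueue {
    static final int MOVEMENT = 0;   // adds / updates / deletes
    static final int BACKFILL = 1;   // low-priority structure fixes
    static final int EXPIRATION = 2; // 30-day expiration touches

    static class Item {
        final String id;
        final int priority;
        Item(String id, int priority) {
            this.id = id;
            this.priority = priority;
        }
    }

    private final PriorityQueue<Item> queue =
        new PriorityQueue<>(Comparator.comparingInt(i -> i.priority));

    void enqueue(Item item) { queue.add(item); }

    // Returns the most urgent pending item, or null if the queue is empty.
    Item next() { return queue.poll(); }
}
```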
>
> On Dec 17, 12:46 am, icebackhaz <[email protected]> wrote:
>
> > Do you grab the xml after the send?  I.e., from the HttpRequest, or
> > do you write it out yourself?
>
> > Hate to belabour the point: are you only using batch() for deletes?
> > The rest are singletons (1000 movements + ~500,000/30 resends per day)?
>
> > On Dec 15, 3:19 pm, Tom Wilson <[email protected]> wrote:
>
> > > Timing and load are critical. This set-up is used to keep 500,000 or
> > > so items up to date, of which there are roughly 1000 item movements a
> > > day (update/delete/addition); after that, the items are simply touched
> > > once in thirty days to update the expiration date.
>
> > > Deletions are handled in bulk because all this requires is the
> > > Google Base itemID; updates are handled next, and then it continues
> > > on, updating items to extend the expiration dates. So it's basically a
> > > rolling process, remembering though that static items can only be
> > > resubmitted once in a thirty-day period.
>
> > > The xml is saved to check size and keep a record, then deleted once
> > > submitted successfully; if there's a problem, it's stored for
> > > problem-solving purposes. If there's a high number of failures, it
> > > takes the required action to alert and halts processing until
> > > resolved. That said, each action runs separately (deletes, additions,
> > > etc...); there's also a clean-up process that runs in the background
> > > and checks items against the Google Base database for problems.
>
> > > On Dec 15, 5:01 pm, icebackhaz <[email protected]> wrote:
>
> > > > Let me see if I'm following along.
>
> > > > 1. You save the xml for every transmission?  At what point do you
> > > > hit write-the-file?  (And delete the file?)
>
> > > > 2. You put only one item in a submission?  The communication overhead
> > > > is of no concern to you? How thick is your pipe! :)
>
> > > > On Dec 14, 5:44 pm, Tom Wilson <[email protected]> wrote:
>
> > > > > Writing the xml message to a temp file, checking its size, and
> > > > > then deciding from there whether to send it was the default
> > > > > precaution I took, as I do with other projects that fire messages
> > > > > between servers. Then you have a handy reference of failed messages.
>
> > > > > I stick to one item per submittal, and it's never failed me yet.
> > > > > Even with over 500,000 items, if it's spread well and the system
> > > > > that drives it is efficient, it's always been more than enough.
>
> > > > > On Dec 12, 4:33 pm, icebackhaz <[email protected]> wrote:
>
> > > > > > Good question.  Yes, it is clear (and provable :) ) that one may
> > > > > > only put 1 MB in the xml payload, but it is not at all clear to me
> > > > > > how one either a) checks how full the payload is or b) detects
> > > > > > after the fact that the problem was an overly large payload.
>
> > > > > > We have a lot to send and want to do it efficiently and correctly.
> > > > > > Not checking beforehand means we would need the issues (742, 921)
> > > > > > taken care of, or we will not know the exact problem.  If I could
> > > > > > check beforehand, I would never encounter the overhead of the
> > > > > > double failure (921).
>
> > > > > > Alternatives abound but don't really appeal.  Continuously
> > > > > > calculate the xml size based on known overhead and data lengths:
> > > > > > fraught with miscalculation errors and seriously exposed to Google
> > > > > > api/xml changes.  Send ultra-conservative batch sizes (30 items
> > > > > > might be the upper limit for perfectly full (1000-char) values, if
> > > > > > I've done the arithmetic correctly): seriously under-uses the
> > > > > > payload and increases the number of submissions and the traffic
> > > > > > overhead.  Send less conservatively (perhaps aggressively) sized
> > > > > > batches and on failure assume the payload was too large and split
> > > > > > it (recursively): many possible other reasons for failure.
>
> > > > > > So yes, I'm trying to solidify our end, and both of these, um, ah,
> > > > > > features of the API get in the way.  Btw, what we've decided to do
> > > > > > on failure is to generate the xml with our own Writer, test the
> > > > > > size of the generated xml, and react accordingly.  Seems a
> > > > > > reasonable compromise, no?
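A minimal sketch of that own-Writer check, using only javax.xml.stream; the element names are placeholders, not the real Google Base schema:

```java
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Sketch of "generate with our own Writer and measure": serialize a
// stand-in batch feed locally and check its UTF-8 byte count against
// the documented 1 MB limit before posting.
class FeedSizer {
    static final int MAX_PAYLOAD = 1_000_000; // documented 1 MB limit

    // Serialize a placeholder <feed> of <entry><title> elements and
    // return its size in bytes.
    static int measure(List<String> titles) {
        try {
            StringWriter sw = new StringWriter();
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("feed");
            for (String title : titles) {
                w.writeStartElement("entry");
                w.writeStartElement("title");
                w.writeCharacters(title);
                w.writeEndElement();
                w.writeEndElement();
            }
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
            return sw.toString().getBytes(StandardCharsets.UTF_8).length;
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    static boolean fits(List<String> titles) {
        return measure(titles) <= MAX_PAYLOAD;
    }
}
```

Measuring the serialized output directly sidesteps the miscalculation risk of estimating per-field overhead by hand.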
>
> > > > > > Keep in mind, I'm not sure 921 will be accepted as a bug.  I do
> > > > > > believe the java.io.IOException is more the result of
> > > > > > mis-communication (using a closed connection) than anything else.
> > > > > > Pretty sure they thought they would be sending back a
> > > > > > ServiceException, an InvalidEntryException really, and that's also
> > > > > > wrong.  HTTP_BAD_REQUEST is the http error, and in this case it's
> > > > > > the result of a payload > 1 MB, not that a particular entry is
> > > > > > malformed.
>
> > > > > > On Dec 11, 5:37 pm, Tom Wilson <[email protected]> 
> > > > > > wrote:
>
> > > > > > > Can I ask why exactly you're looking at this?
>
> > > > > > > The documentation states that it accepts batch requests up to
> > > > > > > 1 MB, so why are you checking the size before posting?
>
> > > > > > > There's only so much the API will do for you, but building a
> > > > > > > solid system/app relies on checks on both ends.
>
> > > > > > > Tom Wilson
> > > > > > > Freelance Google Base Developer and Consultant
> > > > > > > www.tomthedeveloper.com

> > > > > > > Google Base Tools - http://dev.tomthedeveloper.com/googlebase
> > > > > > > Featured Project :
> > > > > > > http://google-code-featured.blogspot.com/2008/02/google-base-competit...
>
> > > > > > > On Dec 11, 11:28 pm, icebackhaz <[email protected]> wrote:
>
> > > > > > > > Trying to see what happens when one stuffs too much into the xml
> > > > > > > > payload, we have discovered that there is no facility for 
> > > > > > > > detecting
> > > > > > > > the exact problem either before or after the send.
>
> > > > > > > > The response from the server is
> > > > > > > > HttpURLConnection.HTTP_BAD_REQUEST.
> > > > > > > > It looks like the intention was to generate an
> > > > > > > > InvalidEntryException, and that looks bogus too, but at least
> > > > > > > > the getReason() might be useful.
>
> > > > > > > > Unfortunately, the built-in second attempt to send the payload
> > > > > > > > chokes big time; we get a java.io.IOException from deep in
> > > > > > > > Sun's code, and the 400 floats off into the ether.
>
> > > > > > > > I've sent this to the bugs forum, but I'm not sure it will be
> > > > > > > > recognized as a bug:
> > > > > > > > http://code.google.com/p/gdata-issues/issues/detail?id=921
>
> > > > > > > > This is further compounded by the fact that, even if we hadn't
> > > > > > > > been blown out of the water, there is no access to the
> > > > > > > > HttpRequest, which has the all-important header:
> > > > > > > > "Content-Length".  (This has been recognized as a bug:
> > > > > > > > http://code.google.com/p/gdata-issues/issues/detail?id=742&sort=-id&c...)
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google Base Data API" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/Google-Base-data-API?hl=en
-~----------~----~----~----~------~----~------~--~---
