Hi, thanks for the detailed information.

Before we analyze your issue, I would like to let you know that IMAP migration and gdata-library-based migration (EMAPI) are two completely different interfaces to Gmail.

From the data you have shared, I suspect this is a case of migrating too fast. EMAPI is rate limited, and the best strategy is to back off exponentially when a migration fails. An HTTP status code of 503 indicates that the migration failed.

To track the status of individual entries in a batch request, you can check the <batch:status> element in the returned Feed for each entry:
http://code.google.com/apis/gdata/batch.html#Writing_Feed

A '201' indicates a successful insert. Other possible status codes are listed here:
http://code.google.com/apis/gdata/batch.html#Handling_Errors

-Anirudh

On Mar 31, 4:40 am, Theo Van Dinter <[email protected]> wrote:
> Hey folks,
>
> I've been in the process of converting over my mail server to gafyd.
> I moved a lot mail via the IMAP migration tool, which worked pretty
> well ... but I have several thousand/GB of messages in a "not
> conducive for IMAP server" format which I also want to move over. I
> thought it'd be pretty straightforward, so I wrote up a python script
> to handle this format and also formats supported by the mailbox
> module. Messages are batched up and then submitted as you would
> expect using gdata.
>
> The issue is that not all the messages are making it into gmail. I've
> spent a bit of time trying to debug this, and I seem to not be able to
> figure out what's going on. In short:
>
> Submitting batch of 2038 messages
> Submitting batch of 74 messages
> Processed a total of 2112 message(s), 7.22M.
> Submitting batch of 1372 messages
> Submitting batch of 70 messages
> Processed a total of 1442 message(s), 7.68M.
>
> so that's 3554 messages total.
> But after letting gmail process it,
> ie: waiting a couple of hours until "All Mail" stops changing, I end
> up only having 2089 messages, from across the 4 submissions (ie: it's
> not just from a single batch upload). Since the source messages are
> broken up on-disk by date, I tried uploading several single months in
> individual batches, and ended up with:
>
> (source msg count, month/file, imported msg count)
>
>  31 2002.07.csv 15
>  70 2002.08.csv 53
> 105 2002.09.csv 66
> 111 2002.10.csv 67
> 101 2002.11.csv 62
> 103 2002.12.csv 63
>
> so clearly, I'm usually getting under 2/3 of the messages I upload.
> In this per-month mode, if I go ahead and try to batch upload a whole
> month again, the missing messages all show up. This tells me that it
> is not my script or the source messages themselves which are the
> issue.
>
> Based on the documentation, I was expecting an exception in the case
> of an upload error, but none were raised.
> I then was going to look at the return value from
> gdata.apps.migration.service.SubmitBatch() to see if there's anything
> useful there, but noticed two things:
>
> a) The pydoc says:
> Returns:
> A HTTPResponse from the web service call.
>
> which is great, except it's not actually HTTPResponse. it's
> gdata.apps.migration.BatchMailEventFeed. :(
>
> b) as far as I can tell, there's no useful response values in
> gdata.apps.migration.BatchMailEventFeed, all methods seem to be about
> getting URLs.
>
> Based on some other messages I've read, I thought maybe there were too
> many/frequent submissions, so after each submit I added a sleep(O(log
> n)), which helped out somewhat but still didn't completely solve the
> problem. I haven't tried making the sleep time larger, or trying a
> single large batch to see what happens.
>
> Any thoughts? Is there a way to get better result information?
> Is there a way to look at the gmail import process and see errors (I'm
> assuming API migration is the same as IMAP migration, in that there's
> a "submit into queue" portion and a "import from queue" portion)?
>
> Thanks. :)
>
> PS: I'm counting/checking the imported messages using an IMAP client
> against gmail, so the difference isn't a message count vs conversation
> count issue. Also, the batches are limited by message count and size
> (sum of the return values from AddBatchEntry()), so I shouldn't be
> anywhere near the documented 32MB limit.
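
For reference, the exponential back-off recommended in the reply at the top of this message could look roughly like the sketch below. It is only an illustration under stated assumptions: `submit` is a hypothetical callable (not part of gdata) that sends a list of entries and returns one integer status code per entry, which you would have to build yourself from the <batch:status> elements of the returned feed. The function name, the delay values and the jitter are likewise illustrative choices, not documented EMAPI limits.

```python
import random
import time


def submit_with_backoff(submit, entries, max_attempts=6, base_delay=1.0,
                        max_delay=300.0, sleep=time.sleep):
    """Resubmit failed entries, pausing longer after each failed attempt.

    `submit` is a HYPOTHETICAL callable supplied by the caller: it takes a
    list of entries and returns one integer status code per entry, in the
    same order (201 meaning the insert succeeded).
    Returns the entries that still failed after the last attempt.
    """
    pending = list(entries)
    for attempt in range(max_attempts):
        if not pending:
            break
        statuses = submit(pending)
        # Keep only the entries that were not inserted successfully.
        pending = [e for e, s in zip(pending, statuses) if s != 201]
        if pending and attempt < max_attempts - 1:
            # The base delay doubles on every failed attempt (capped at
            # max_delay); random jitter of up to 50% is added on top.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return pending
```

Anything the function returns is what still failed after `max_attempts` tries, so it can be logged or queued for a later run instead of being dropped silently. Retrying only the entries whose status is not 201 avoids re-uploading messages that were already accepted.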
