John Nagle wrote:
> I'm reading the PhishTank XML file of active phishing sites,
> at "http://data.phishtank.com/data/online-valid/". This changes
> frequently, and it's big (about 10MB right now) and on a busy server.
> So once in a while I get a bogus copy of the file because the file
> was rewritten while being sent by the server.
>
> Any good way to deal with this, short of reading it twice
> and comparing?

Getting them to fix the obvious bug on their side would be best, of course.

Apart from that, the only thing you could try is to run a SAX parser on the input stream immediately, so that if the XML is not well-formed because of the way they serve it, you find out as soon as possible. But it will only shave off a few moments.

Diez
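
A rough sketch of the SAX idea, using the incremental feed()/close() interface of the standard xml.sax parser. The handler class, the function name, the chunk size and the "entry" element name are all made up for illustration; I have not checked them against the real PhishTank schema. Note that this only catches copies that are no longer well-formed XML. A file that was rewritten mid-transfer but still happens to parse would slip through.

```python
import io
import xml.sax


class EntryCounter(xml.sax.ContentHandler):
    """Counts <entry> elements (hypothetical element name)."""

    def __init__(self):
        super().__init__()
        self.entries = 0

    def startElement(self, name, attrs):
        if name == "entry":
            self.entries += 1


def parse_stream(stream, chunk_size=64 * 1024):
    """Feed a binary stream to a SAX parser chunk by chunk.

    Returns the number of entries seen, or None as soon as the
    data turns out not to be well-formed XML.
    """
    handler = EntryCounter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            # Raises SAXParseException at the first malformed chunk.
            parser.feed(chunk)
        # Raises SAXParseException if the document was cut short.
        parser.close()
    except xml.sax.SAXParseException:
        return None
    return handler.entries
```

To use it on the live feed, something like parse_stream(urllib.request.urlopen(url)) should work, since the response object has a read() method; retry the download when the result is None.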