On February 1, 2010, Matt Lee wrote:
> On 02/01/2010 11:29 PM, Seth Woodworth wrote:
> > I would suggest, when possible, using the Html5lib parser and
> > using the traverser from BeautifulSoup. The author himself
> > suggests[1] this in any case of BS-3.1.0 or 3.0.8 behaving
> > poorly.
> >
> > I have been doing work with python, BeautifulSoup and
> > Html5Lib lately, and I've been collecting and slowly
> > improving python scripts (like this) to liberate data from
> > websites like Reddit or the Ubuntu forums. I would love to
> > get involved with the lastscrape.py script.
>
> http://bugs.libre.fm/wiki/LastToLibre is the new way to do
> this.
>
> Last.fm has an API now, for people like us ;)

Well, it took about 6 hours to download, with probably half a dozen
restarts along the way. I decided to overlap the pages: if it failed
at page N, I restarted at page N-1. During those 6 hours, the total
number of pages went up by 1 (to 6905 pages).

So, between the overlapping restarts and the shifting page count, I
guess I am going to have to clean this up a little. (Not today.)

Do you require me to upload this to libre.fm in pieces, or can it be
just one big file?

Gord
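
P.S. For anyone wanting to try the same overlap-and-clean-up approach, here is a rough Python sketch of the idea. It is only an illustration and not how lastscrape.py works: `fetch_page` is a hypothetical callable standing in for whatever actually downloads one page, and the rows are assumed to be (artist, track, timestamp) tuples, so an exact repeat can only come from a re-fetched page.

```python
def download_with_overlap(fetch_page, first_page, last_page, max_restarts=10):
    """Fetch pages first_page..last_page in order.

    fetch_page is a hypothetical callable: it takes a page number and
    returns a list of rows, raising IOError when the download fails.
    After a failure at page N, the loop resumes at page N-1.
    """
    rows = []
    page = first_page
    restarts = 0
    while page <= last_page:
        try:
            rows.extend(fetch_page(page))
            page += 1
        except IOError:
            restarts += 1
            if restarts > max_restarts:
                raise
            # Overlap: step back one page so nothing at the boundary is missed.
            page = max(first_page, page - 1)
    return rows


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order.

    Rows are assumed to be hashable tuples such as (artist, track, timestamp).
    """
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique
```

Running `drop_duplicates(download_with_overlap(...))` would then take care of the rows that were fetched twice because of a restart. It would not detect rows missed because the page count changed mid-download; that would need a separate check.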
