brooklineTom wrote: > On Dec 31 2008, 9:56 am, John Machin <sjmac...@lexicon.net> wrote: >> On Dec 31 2008, 4:02 pm, brooklineTom <brookline...@gmail.com> wrote: >> >>> andyh...@gmail.com wrote: >>>> Hi, >>>> Can anybody recommend an approach for loading and parsing Excel >>>> spreadsheets in Python. Any well known/recommended libraries for this? >>>> The only thing I found in a brief search >>>> washttp://www.lexicon.net/sjmachin/xlrd.htm, >>>> but I'd rather get some more input before going with something I don't >>>> know. >>>> Thanks, >>>> Andy. >>> I save the spreadsheets (in Excel) in xml format. >> Which means that you need to be on a Windows box with a licensed copy >> of Excel. I presume you talking about using Excel 2003 and saving as >> "XML Spreadsheet (*.xml)". Do you save the files manually, or using a >> COM script? What is the largest xls file that you've saved as xml, how >> big was the xml file, and how long did it take to parse the xml file? >> Do you extract formatting information or just cell contents? > > 1. The OP requested Excel files, by construction those must be > generated with a licensed copy of Excel. I did the actual processing > on both linux and windoze platforms.
Well, even if Andy meant "Excel files" rather than "Excel-formatted files" there are many ways to come by these without having a licensed copy of Excel. FTP and email attachment come to mind most readily. How then to convert those to XML without Excel? [...]> I looked, briefly, at xlrd. I found and scanned a few alternatives, > though I don't remember what the others were. I needed something I > could incorporate into my own application framework, and I knew I > didn't need most of the formatting information. I'm not in any way > criticizing xlrd, it's simply that, based on its API summary, it seems > focused on problems I didn't have to solve. I knew that I needed only > a small subset of the xlrd behavior, and I concluded (perhaps > incorrectly) that it would be easier to roll my own parser than find, > extract, and then port (to my own framework) the corresponding parts > of xlrd. > My own case was similar, in that I only needed the value data. The approach I took was to install xlrd and use it. Job done. > I needed to extract the content of each row, cell by cell, and build > data objects (in my framework) with the content of various cells. I > also needed to build an "exception file" containing malformed entries > that I could re-open with Excel after my code finished, so that the > bogus entries could be manually corrected. What I mean by "malformed" > entry is, for example, an address field that fails to correctly > geocode or comment fields with confused utf8/unicode contents. My > focus was on data content, as opposed to presentation. I needed to > crack the cells into things like "string", "boolean", "float", and so > on. > > Most importantly, I needed to do this one entry at a time -- I did > *not* want to load the entire spreadsheet at once. > My data files weren't that large (IIRC the largest spreadsheet was about 6MB), so I was quite happy to load the whole thing in memory, iterate over it and then write the results to the database as they were extracted. > I'm not saying that this couldn't be done with xlrd; only that I chose > to roll my own and had minimal difficulty doing so. > Given the constraints of your problem it seems like an intelligent approach. > I hope this helps! I'm sure it will. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ -- http://mail.python.org/mailman/listinfo/python-list