[EMAIL PROTECTED] wrote:
>> FWIW, it works here on 2.5.1 without errors or warnings. Output is:
>> 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)]
>> 0.6.1
>
> I guess it's a version issue then...
I say again: Don't guess.

> I forgot about sorted! Yes, that would make sense!
>
> Thanks for the input.
>
> On Apr 2, 4:23 pm, [EMAIL PROTECTED] wrote:
>> Still no luck:
>>
>> Traceback (most recent call last):
>>   File "C:\Python24\Lib\site-packages\pythonwin\pywin\framework
>> \scriptutils.py", line 310, in RunScript
>>     exec codeObject in __main__.__dict__
>>   File "C:\text analysis\pickle_test2.py", line 13, in ?
>>     cPickle.dump(Data_sheet, pickle_file, -1)
>> PicklingError: Can't pickle <type 'module'>: attribute lookup
>> __builtin__.module failed

I didn't notice that the exception had changed, from the original:
    "TypeError: can't pickle file objects" (with protocol=0)
to:
    "TypeError: can't pickle module objects" (pickling an xlrd.Book
    object with protocol=-1)
and now to:
    "PicklingError: Can't pickle <type 'module'>: attribute lookup
    __builtin__.module failed" (pickling an xlrd.Sheet object with
    protocol=-1)

I'm wondering if this is some unfortunate side effect of running the
script in the pywin IDE ("exec codeObject in __main__.__dict__"). Can
you reproduce the problem by running the script in the Command Prompt
window? What version of pywin32 are you using?

>> My code remains the same, except I added 'wb' and the -1 following
>> your suggestions:
>>
>> import cPickle, xlrd, sys
>>
>> print sys.version
>> print xlrd.__VERSION__
>>
>> data_path = """C:\\test\\test.xls"""
>> pickle_path = """C:\\test\\pickle.pickle"""
>>
>> book = xlrd.open_workbook(data_path)
>> Data_sheet = book.sheet_by_index(0)
>>
>> pickle_file = open(pickle_path, 'wb')
>> cPickle.dump(Data_sheet, pickle_file, -1)
>> pickle_file.close()
>>
>> To begin with (I forgot to mention this before) I get this error:
>> WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-
>> zero

"WARNING" != "error". If that's the only message you get, ignore it; it
means that your XLS file was created by the perl XLS-writing package or
a copier thereof.

>> I'm not sure what this means.
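[The failure mode above can be shown without xlrd at all: Book and Sheet
objects hold references to module objects (and, at protocol 0, file
objects), neither of which pickle. A minimal Python 3 sketch — using
`pickle` rather than the thread's 2.x `cPickle`, with made-up sample
rows — of the error and the usual workaround of pickling the extracted
cell values instead:]

```python
import pickle
import sys

# A module object -- like the ones an xlrd Book/Sheet holds references
# to -- cannot be pickled; this is the same failure the traceback shows.
try:
    pickle.dumps(sys)
    picklable = True
except (TypeError, pickle.PicklingError):
    picklable = False
print("module picklable?", picklable)  # module picklable? False

# Plain built-in containers pickle fine, so extracting the sheet's cell
# values (e.g. via Sheet.row_values) into a list of lists and pickling
# that sidesteps the problem entirely.
rows = [["book", "non-fiction", "biography"],
        ["book", "fiction", "literature"]]
blob = pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL)
assert pickle.loads(blob) == rows
print("round-trip OK")
```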
>>> What do you describe as "simple manipulations"? Please describe your
>>> computer, including how much memory it has.
>
>> I have a 1.8GHz HP dv6000 with 2GB of RAM, which should be speedy
>> enough for my programming projects. However, when I try to print out
>> the rows in the Excel file, my computer gets very slow and choppy,
>> which makes experimenting slow and frustrating.

Just printing the rows is VERY UNLIKELY to cause this. Demonstrate this
to yourself by using xlrd's supplied runxlrd script:

    command_prompt> c:\python24\scripts\runxlrd.py show yourfile.xls

>> Maybe cPickle won't solve this problem at all!

99.9% chance, not "maybe".

>> For this first part, I am trying to make ID numbers for the different
>> permutations of categories, topics, and sub_topics. So I will have
>> [book,non-fiction,biography], [book,non-fiction,history-general],
>> [book,fiction,literature], etc., so I want the combination of
>> [book,non-fiction,biography] = 1
>> [book,non-fiction,history-general] = 2
>> [book,fiction,literature] = 3
>> etc...
>>
>> My code does this, except sort returns None, which is strange.

list.sort() returns None by definition; it sorts the list object's
contents in situ.

>> I just want an alphabetical sort of the first option, which sort
>> should do automatically. When I do a test like
>> >>> nest_list = [['bbc', 'cds'], ['jim', 'ex'], ['abc', 'sd']]
>> >>> nest_list.sort()
>> [['abc', 'sd'], ['bbc', 'cds'], ['jim', 'ex']]
>> it works fine, but not for my rows.

Why are you sorting?

>> Here's the code (unpickled/unsorted):
>>
>> import xlrd, pyExcelerator
>>
>> path_file = "C:\\text_analysis\\test.xls"
>> book = xlrd.open_workbook(path_file)
>> ProcFT_QC = book.sheet_by_index(0)
>> log_path = "C:\\text_analysis\\ID_Log.log"
>> logfile = open(log_path, 'wb')
>>
>> set_rows = []

The test "x in y" where y is a sequence needs to compare with half of
the existing items on average. You are doing that test N times.
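[The sort() confusion above takes two lines to demonstrate: list.sort()
mutates the list in place and returns None, while the sorted() builtin —
the one the OP "forgot about" — returns a new sorted list. A quick
Python 3 sketch:]

```python
nest_list = [['bbc', 'cds'], ['jim', 'ex'], ['abc', 'sd']]

# list.sort() sorts in place and returns None by definition,
# so "print rows.sort()" will always print None.
result = nest_list.sort()
print(result)     # None
print(nest_list)  # [['abc', 'sd'], ['bbc', 'cds'], ['jim', 'ex']]

# sorted() leaves the original list alone and returns a new sorted list.
orig = [['jim', 'ex'], ['abc', 'sd']]
print(sorted(orig))  # [['abc', 'sd'], ['jim', 'ex']]
print(orig)          # [['jim', 'ex'], ['abc', 'sd']]  -- unchanged
```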
If the number of unique rows is U, it will do about N*U/4 comparisons.
You said N is about 50,000. The changes below make y a set;
consequently x needs to be a tuple instead of a list.

    set_rows = set()

>> rows = []
>> db = {}
>> n = 0
>> while n < ProcFT_QC.nrows:
>>     rows.append(ProcFT_QC.row_values(n, 6, 9))

        rows.append(tuple(ProcFT_QC.row_values(n, 6, 9)))

>>     n += 1
>> print rows.sort() # Outputs None
>> ID = 1
>> for row in rows:
>>     if row not in set_rows:
>>         set_rows.append(row)

            set_rows.add(row)

>>         db[ID] = row
>>         entry = str(ID) + '|' + str(row).strip('u[]') + '\r\n'

Presuming your data is actually ASCII, you could save time and memory
by converting it once as you extract it from the spreadsheet.

            entry = str(ID) + '|' + str(row).strip('u()') + '\r\n'

>>         logfile.write(entry)
>>         ID += 1
>> logfile.close()

HTH,
John
--
http://mail.python.org/mailman/listinfo/python-list
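[John's corrections assemble into the following self-contained Python 3
sketch of the dedup-and-ID loop. The hard-coded sample rows stand in for
the spreadsheet's ProcFT_QC.row_values(n, 6, 9) calls; they are tuples,
not lists, precisely so they are hashable and can live in a set:]

```python
# Hypothetical sample data standing in for ProcFT_QC.row_values(n, 6, 9).
rows = [
    ("book", "non-fiction", "biography"),
    ("book", "non-fiction", "history-general"),
    ("book", "fiction", "literature"),
    ("book", "non-fiction", "biography"),   # duplicate row
]

seen = set()   # set membership is O(1), vs O(len) for "x in some_list"
db = {}
next_id = 1
for row in rows:
    if row not in seen:
        seen.add(row)       # sets grow with .add(), not .append()
        db[next_id] = row
        next_id += 1

# Pipe-delimited log lines, one per unique row.
for row_id, row in sorted(db.items()):
    print(str(row_id) + "|" + "|".join(row))
# 1|book|non-fiction|biography
# 2|book|non-fiction|history-general
# 3|book|fiction|literature
```

With 50,000 input rows and U unique ones, the list version does on the
order of N*U/4 comparisons; the set version does N constant-time hash
lookups.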