On Dec 18, 11:52 pm, MRAB <goo...@mrabarnett.plus.com> wrote: > James Mills wrote: > > On Fri, Dec 19, 2008 at 8:18 AM, Martin Manns <mma...@gmx.net> wrote: > >> Hi: > > > Hi, > > >> I am writing a spreadsheet application in Python > > > What's wrong with pyspread ? > > > [ ... snip ... ] > > >> The dict that I tried out is of the type: > > >> {(1,2,3): "2323", (1,2,545): "2324234", ... } > > >> It is too slow for my application when it grows. One slicing operation > >> with list comprehensions takes about 1/2 s on my computer for 1E6 > >> elements. > > > Let me get this straight. > > It's taking 0.5s to slice your matrix > > of 1E7 (10000000.0 rows/columns) > > > Are you mad ? This is TEN Millions and you > > required it faster than 0.5s ? > > No, 1E6, or ONLY 1 million. :-)
what does Internet explorer 6 & 7 have to do with that ? ;-) > > > Am I missing something here ? > > Would a matrix of matrices work better, ie a supermatrix where each > element is either a submatrix of cells or None if the submatrix is > empty? How much memory it would save would depend on how sparse the > matrix is. sparse matrix efficiency is all about guessing the probability for a cell to be empty. My guess is that the probability for a cell (i,j) to be empty grows with norm(i,j) then, you should store your data in a list, and the indexes in another "list", then use a BTree like algorithm, or keep you data ordered by norm(i,j) growing. something like: data = [] # kernel =[] # full of couples (i,j) def add(i,j,value): data.append(value) # or put in the "right place" to keep it sorted kernel.append(i,j) # same comment def get(i,j): k = kernel_of(i,j) return data[k] def set(i,j,value): if is_probably_empty(): k = kernel_of(i,j) if k: data[k]= value else: add(i,j,value) else: add(i,j,value) where the 'kernel_of method" can take advantage of the sorted order (i*j or max(i,j) should be as good) to make a fast dichotomy. The set method is about finding if the cell is 'empty or not' and then either perform an indexof, or a add. to efficiently know if a cell is empty (the is_probably_empty method) or not, try to use the bloom filter using i*j%m as hash function (for instance) here arrays have the size of your actual data (maybe thousands of cell for a huge spreadsheet), that's MUCH better than the size of the maxN*maxM if my memory is correct, there is room in numpy for your own sparse matrix implementation. You should try it with your own sparse matrix implementation. -- Eric http://codeslash.blogspot.com -- http://mail.python.org/mailman/listinfo/python-list