Re: fastest data structure for retrieving objects identified by (x,y) tuple?

2012-10-04 Thread Steven D'Aprano
On Thu, 04 Oct 2012 18:11:28 +0200, Thomas Rachel wrote:

> Am 04.10.2012 03:58 schrieb Steven D'Aprano:
>> alist = [[None]*2400 for i in range(2400)] from random import randrange
>> for i in range(1000):
>>  x = randrange(2400)
>>  y = randrange(2400)
>>  adict[(x, y)] = "something"
>>  alist[x][y] = "something"
> 
>> The actual sizes printed will depend on how sparse the matrices are,
>> but for the same above (approximately half full),
> 
> I wouldn't consider 1000 of 576 "half full"...

Doh! I obviously can't multiply...

I mean, well done, that was a deliberate test to see who was paying 
attention, and you passed! 

:)


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: fastest data structure for retrieving objects identified by (x, y) tuple?

2012-10-04 Thread Thomas Rachel

Am 04.10.2012 03:58 schrieb Steven D'Aprano:

alist = [[None]*2400 for i in range(2400)]
from random import randrange
for i in range(1000):
 x = randrange(2400)
 y = randrange(2400)
 adict[(x, y)] = "something"
 alist[x][y] = "something"



The actual sizes printed will depend on how sparse the matrices are, but
for the same above (approximately half full),


I wouldn't consider 1000 of 576 "half full"...


Thomas
--
http://mail.python.org/mailman/listinfo/python-list


Re: fastest data structure for retrieving objects identified by (x, y) tuple?

2012-10-04 Thread Steven D'Aprano
On Thu, 04 Oct 2012 08:21:13 -0400, Benjamin Jessup wrote:

> On 10/4/2012 12:20 AM, python-list-requ...@python.org wrote:
>> How do you know that?
>>
>> No offence, but if you can't even work out whether lookups in a dict or
>> a list are faster, I can't imagine why you think you can intuit what
>> the fastest way to retrieve the nearest neighbours would be.
> 
> Whats wrong with the test below?
[snip code]


I don't know. Is this a trick question? Is the answer, "nothing is wrong"?

It doesn't seem to be very useful code, but since I don't know what you 
think you are testing, I can't tell you whether you are doing it wrong or 
not.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: fastest data structure for retrieving objects identified by (x, y) tuple?

2012-10-04 Thread Benjamin Jessup

On 10/4/2012 12:20 AM, python-list-requ...@python.org wrote:

How do you know that?

No offence, but if you can't even work out whether lookups in a dict or a
list are faster, I can't imagine why you think you can intuit what the
fastest way to retrieve the nearest neighbours would be.


Whats wrong with the test below?

# randomly select matrix coordinates to look-up
from random import randrange
test_coords = []
for i in range(1000):
x = randrange(2400);  y = randrange(2400); test_coords.append((x, y))

# build objects
class Object():pass
obj1 = Object(); obj2 = Object(); obj1.up = obj2

# build some test code
from timeit import Timer
setup = "from __main__ import test_coords, obj1, obj2"
t = Timer("for p in test_coords: obj = obj1.up", setup)

# run the test code
print(min(t.repeat(number=1, repeat=7)))
import platform
print(platform.python_version())

On my system, I get:
0.719622326348
2.7.1


--
http://mail.python.org/mailman/listinfo/python-list


Re: fastest data structure for retrieving objects identified by (x, y) tuple?

2012-10-04 Thread Oscar Benjamin
On Oct 4, 2012 3:02 AM, "Steven D'Aprano" <
steve+comp.lang.pyt...@pearwood.info> wrote:
> # populate a random matrix using both dict and list
> adict = {}
> alist = [[None]*2400 for i in range(2400)]
> from random import randrange
> for i in range(1000):
> x = randrange(2400)
> y = randrange(2400)
> adict[(x, y)] = "something"
> alist[x][y] = "something"
>
> import sys
> print(sys.getsizeof(adict))
> print(sys.getsizeof(alist) + sum(sys.getsizeof(L) for L in alist))
>
>
> The actual sizes printed will depend on how sparse the matrices are, but
> for the same above (approximately half full), using Python 2.7, the
> figures I get are:
>
> adict: 24712
> alist: 23127324

I make it 0.02% full. If it was half full the dict might not have a memory
advantage.

Oscar
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: fastest data structure for retrieving objects identified by (x,y) tuple?

2012-10-03 Thread Steven D'Aprano
On Thu, 04 Oct 2012 01:58:16 +, Steven D'Aprano wrote:

> adict: 24712
> alist: 23127324
[...]
> So in this situation, a list of lists uses about 100 times
> more memory than a dict, but look-ups are about 6% faster.

Correction: about 1000 times more memory. Sorry for the typo.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: fastest data structure for retrieving objects identified by (x,y) tuple?

2012-10-03 Thread Steven D'Aprano
On Wed, 03 Oct 2012 18:30:24 -0400, Benjamin Jessup wrote:

> I have a group of objects identified by unique (x,y) pairs and I want to
> find out an object's "neighbors" in a matrix of size 2400 x 2400.
[...]
> There is either a neighbor, or a null value. I always know the (x,y)
> pair to check the neighbors of, so is doing,
>  >> obj = grid[x][y] #lists, doesn't scale with num of objects
> or,
>  >> obj = grid.get((x,y),None) #dictionary, scales with num of objects
> the fastest? I can't seem to find a conclusion by testing each alone,
> then in the full environment. Is it that, depending on the number of
> objects, each has an advantage?

Almost certainly not. Since you don't show how you test each, I have no 
idea why you can't find a conclusion.

Let's start off with the question you don't ask: which uses less memory? 
The answer is, the dict solution by far. Here is a test:

# populate a random matrix using both dict and list
adict = {}
alist = [[None]*2400 for i in range(2400)]
from random import randrange
for i in range(1000):
x = randrange(2400)
y = randrange(2400)
adict[(x, y)] = "something"
alist[x][y] = "something"

import sys
print(sys.getsizeof(adict))
print(sys.getsizeof(alist) + sum(sys.getsizeof(L) for L in alist))


The actual sizes printed will depend on how sparse the matrices are, but 
for the same above (approximately half full), using Python 2.7, the 
figures I get are:

adict: 24712
alist: 23127324


Now let's test the speed:

# randomly select matrix coordinates to look-up
test_coords = []
for i in range(1000):
x = randrange(2400)
y = randrange(2400)
test_coords.append((x, y))

# build some test code
from timeit import Timer
setup = "from __main__ import adict, alist, test_coords"
t1 = Timer("for p in test_coords: obj = adict.get(p)", setup)
t2 = Timer("for p in test_coords: obj = alist[p[0]][p[1]]", setup)

# run the test code
print(min(t1.repeat(number=1, repeat=7)))
print(min(t2.repeat(number=1, repeat=7)))


Again, on my system using Python 2.7, I get:
3.13823986053
2.97539305687

which shows that the two are very close, but the list solution is about 
6% faster. So in this situation, a list of lists uses about 100 times 
more memory than a dict, but look-ups are about 6% faster.

I would be very surprised if the timing results depended on the number of 
objects in the matrices.

In case you are not familiar with timeit, let me explain what I have done:

* I pre-select 1000 random coordinates.
* I write some test code inside a Timer object that look up each of 
  those coordinates, plus some setup code.
* timeit runs the setup code once.
* Then it runs my test code 1 times as a single trial, timing
  it as accurately as possible.
* It does seven trials, and I report the best of the seven.

The time printed is measured in seconds. In this case, I get 3 seconds 
per trial, or 3e-7 seconds = 0.3 microseconds per lookup.


> I know the fastest way to retrieve them would be to have them store
> pointers to their neighbors, then use those for retrieval.

How do you know that?

No offence, but if you can't even work out whether lookups in a dict or a 
list are faster, I can't imagine why you think you can intuit what the 
fastest way to retrieve the nearest neighbours would be.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: fastest data structure for retrieving objects identified by (x, y) tuple?

2012-10-03 Thread Joshua Landau
On 3 October 2012 23:30, Benjamin Jessup  wrote:

> I have a group of objects identified by unique (x,y) pairs and I want to
> find out an object's "neighbors" in a matrix of size 2400 x 2400.
>#
>#obj#   #   #
>#
>#   #   #obj#  3 x 3 Example
>#
>#   #   #   #
>#
> There is either a neighbor, or a null value. I always know the (x,y) pair
> to check the neighbors of, so is doing,
> >> obj = grid[x][y] #lists, doesn't scale with num of objects
> or,
> >> obj = grid.get((x,y),None) #dictionary, scales with num of objects
> the fastest? I can't seem to find a conclusion by testing each alone, then
> in the full environment. Is it that, depending on the number of objects,
> each has an advantage?
>
> I know the fastest way to retrieve them would be to have them store
> pointers to their neighbors, then use those for retrieval. When large
> numbers of objects are changing their (x,y) pairs, rebuilding the pointers
> is too slow


You really are asking the wrong question.

If most of the data is None, then a sparse matrix is best. Otherwise, lists
make the most sense.
*However, *what you really want is... a matrix. Look for a sparse matrix
type (
http://stackoverflow.com/questions/1053928/python-numpy-very-large-matrices)
if you need sparse, and otherwise Numpy's matrix should do fine.

The thing is this:
"I can't seem to find a conclusion by testing each alone, then in the full
environment. Is it that, depending on the number of objects, each has an
advantage?"

If you can't tell, don't bother. Dictionaries will have a memory advantage
with sparse matrices, lists in the other cases. That's the important part,
as look-up is fast for both*. If you need faster than these builtins, use
Numpy or SciPy.

If you really need optimization help, profile and ask the *right* question
to this list.

* To actually answer the question:
They both have O(1) look-up time, although dictionaries have a theoretical *
worst* case of O(N). Being simpler, it's faster to index a list. However,
you're doing that twice, so it's probably around even.
-- 
http://mail.python.org/mailman/listinfo/python-list


fastest data structure for retrieving objects identified by (x,y) tuple?

2012-10-03 Thread Benjamin Jessup
I have a group of objects identified by unique (x,y) pairs and I want to 
find out an object's "neighbors" in a matrix of size 2400 x 2400.

   #
   #obj#   #   #
   #
   #   #   #obj#  3 x 3 Example
   #
   #   #   #   #
   #
There is either a neighbor, or a null value. I always know the (x,y) 
pair to check the neighbors of, so is doing,

>> obj = grid[x][y] #lists, doesn't scale with num of objects
or,
>> obj = grid.get((x,y),None) #dictionary, scales with num of objects
the fastest? I can't seem to find a conclusion by testing each alone, 
then in the full environment. Is it that, depending on the number of 
objects, each has an advantage?


I know the fastest way to retrieve them would be to have them store 
pointers to their neighbors, then use those for retrieval. When large 
numbers of objects are changing their (x,y) pairs, rebuilding the 
pointers is too slow.

--
http://mail.python.org/mailman/listinfo/python-list