Re: [Tutor] Conditional attribute access / key access

2010-08-31 Thread Knacktus



The sizes given are in bytes. So 200,000 instances of this class, plus
the list to hold them, would take approximately 34 megabytes. An entry
level PC these days has 1000 megabytes of memory. "Huge"? Not even
close.


The items hold a lot of metadata, which I didn't provide in my example.
Depending on the source, there are up to 30 additional attributes per
item, mainly strings. And I will have several sources.



So far
there're no restrictions about how to model the items. They can be
dicts, objects of a custom class (preferably with __slots__) or
namedtuple.

Those items have references to each other using ids.


That approach sounds slow and ponderous to me. Why don't you just give
items direct references to each other, instead of indirectly via ids?



Unfortunately I have to be able to use a relational database later on.
Currently I'm using a document database for development. That's where
the ids come from, and you're right: they are a pain ... ;-)




If you *need* indirection, say because you are keeping the data in a
database and you want to only lazily load it when needed, rather than
all at once, then the right approach is probably a proxy object:

class PartProxy(object):
    def __init__(self, database_id):
        self._info = None
        self.database_id = database_id

    @property
    def info(self):
        if self._info is None:
            self._info = get_from_database(self.database_id)
        return self._info



That's it! Exactly what I was looking for! That eases the id-pain. Thanks!!

Jan

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Conditional attribute access / key access

2010-08-31 Thread Steven D'Aprano
On Tue, 31 Aug 2010 12:44:08 am Knacktus wrote:
> Hey everyone,
>
> I have a huge number of data items coming from a database. 

Huge?

Later in this thread, you mentioned 200,000 items overall. That might 
be "huge" to you, but it isn't to Python. Here's an example:

class K(object):
    def __init__(self):
        self.info = {"id": id(self),
                     "name": "root " + str(id(self)),
                     "children_ids": [2*id(self), 3*id(self)+1]}


And the size:

>>> import sys
>>> k = K()
>>> sys.getsizeof(k)
28
>>> sys.getsizeof(k.info)
136
>>> L = [K() for _ in xrange(200000)]
>>> sys.getsizeof(L)
835896

The sizes given are in bytes. So 200,000 instances of this class, plus 
the list to hold them, would take approximately 34 megabytes. An entry 
level PC these days has 1000 megabytes of memory. "Huge"? Not even 
close.

Optimizing with __slots__ is premature. Perhaps if you had 1000 times 
that many instances, then it might be worthwhile.



> So far
> there're no restrictions about how to model the items. They can be
> dicts, objects of a custom class (preferably with __slots__) or
> namedtuple.
>
> Those items have references to each other using ids.

That approach sounds slow and ponderous to me. Why don't you just give 
items direct references to each other, instead of indirectly via ids?

I presume you're doing something like this:

ids = {0: None}  # Map IDs to objects.
a = Part(0)
ids[1] = a
b = Part(1)  # b is linked to a via its ID 1.
ids[2] = b
c = Part(2)  # c is linked to b via its ID 2.
ids[3] = c

(only presumably less painfully).


If that's what you're doing, you should dump the ids and just do this:

a = Part(None)
b = Part(a)
c = Part(b)
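[A minimal Part class along those lines might look like the sketch below;
the single "parent" attribute is illustrative, the real class would carry
the item's metadata too:]

```python
class Part(object):
    def __init__(self, parent=None):
        # Hold a direct reference to the parent object (or None for the
        # root), instead of an integer ID looked up in a registry dict.
        self.parent = parent

a = Part(None)
b = Part(a)
c = Part(b)
# Walking up the chain needs no lookup table at all.
print(c.parent.parent is a)
```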

Storing references to objects in Python is cheap -- it's only a pointer. 
Using indirection via an ID you manage yourself is a pessimization, not 
an optimization: it requires more code, runs slower, and uses more 
memory too (because the integer IDs themselves are pointers to 12-byte 
objects, not 4-byte ints).

If you *need* indirection, say because you are keeping the data in a 
database and you want to only lazily load it when needed, rather than 
all at once, then the right approach is probably a proxy object:

class PartProxy(object):
    def __init__(self, database_id):
        self._info = None
        self.database_id = database_id

    @property
    def info(self):
        if self._info is None:
            self._info = get_from_database(self.database_id)
        return self._info
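[To illustrate how the proxy behaves, here is the same class with a
stubbed-out get_from_database; the real function would issue a database
query, the call log is only there to show that the load happens once:]

```python
calls = []

def get_from_database(database_id):
    # Stand-in for a real database query; records each call made.
    calls.append(database_id)
    return {"id": database_id, "name": "part %d" % database_id}

class PartProxy(object):
    def __init__(self, database_id):
        self._info = None
        self.database_id = database_id

    @property
    def info(self):
        if self._info is None:
            self._info = get_from_database(self.database_id)
        return self._info

p = PartProxy(42)
print(len(calls))      # nothing has been loaded yet
print(p.info["name"])  # first access triggers the query
print(p.info["id"])    # later accesses hit the cached dict
print(len(calls))      # still only one database call
```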




-- 
Steven D'Aprano


Re: [Tutor] Conditional attribute access / key access

2010-08-30 Thread Knacktus

Am 30.08.2010 17:53, schrieb Francesco Loffredo:


Two questions and one doubt for you:
1- How many "generations" do you want to keep in a single item (call it
dictionary or list, or record, whatever)? I mean, what if some children
have children too, and some of those have more children, etc ?
There's always one level (generation) of children in an item. An item 
can have zero or more direct children, and a lot of grandchildren, 
great-grandchildren and so on. The item structure represents an assembly 
hierarchy of the parts of a car. So overall a structure can be up to 
about 20 levels "deep" and consist of up to 200,000 items overall, and 
the application needs to handle several structures.


2- Are you SURE that there are no circular references in your database?
In your example, what if item_3 was
item_3 = {"id": 3, "name": "child_2", "children_ids": [6, 1, 8]}? Isn't
it possible that those recursion limit problems you had could come from
some circular reference in your data?
That's a good hint. But the recursion limit doesn't come from that (the 
test data actually had no children; I used a single instance of my dict).


d- If the number of data items is really huge, are you sure that you
want to keep the whole family in memory at the same time? It depends on
the answer you gave to my question #1, of course, but if retrieving an
item from your database is as quick as it should be, you could use a query
to resolve the references on demand, and you wouldn't need a special
structure to hold "the rest of the family". If the retrieval is slow or
difficult, then the creation of your structure could take a significant
amount of time.
One thing is that I have to do some additional calculations when 
resolving the structure. The items will get some kind of 
labels/conditions and versions; furthermore, when resolving the 
structure, a set of rules for those conditions is given. For my first 
shot I'll have to do those calculations in the Python code (even if it 
would be very wicked to do stuff like that with SQL). So I will always 
have a large number of items in memory, as I don't want to call the 
database for each structure level I want to expand. Also, I'm using a 
PyQt treeview (QAbstractItemModel) for a client-side GUI. For this 
purpose I need to create an additional structure, as in the original 
data items can have more than one parent, which is not permitted in the 
model for the treeview.
The whole idea of replacing the id-references with object-references is 
to enhance performance and make the application code simpler.


Thanks for the feedback so far.


Hope this helps,
Francesco










Re: [Tutor] Conditional attribute access / key access

2010-08-30 Thread Francesco Loffredo

On 30/08/2010 16.44, Knacktus wrote:

Hey everyone,

I have a huge number of data items coming from a database. So far
there're no restrictions about how to model the items. They can be
dicts, objects of a custom class (preferably with __slots__) or namedtuple.

Those items have references to each other using ids. Fresh from the
database the items look like this (using dicts as examples):

item_1 = {"id": 1, "name": "root", "children_ids": [2, 3]}
item_2 = {"id": 2, "name": "child_1", "children_ids": [4]}
item_3 = {"id": 3, "name": "child_2", "children_ids": [6, 7, 8]}

Now I'd like to resolve the references on demand.
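[One way to sketch that on-demand resolution, using a plain dict keyed by
id as a stand-in for the database; the leaf items here are given empty
children_ids so the example is self-contained:]

```python
# Toy "database": id -> raw item dict, shaped like the items in the post.
db = {
    1: {"id": 1, "name": "root", "children_ids": [2, 3]},
    2: {"id": 2, "name": "child_1", "children_ids": []},
    3: {"id": 3, "name": "child_2", "children_ids": []},
}

def children(item):
    # Resolve an item's child ids to the actual child dicts, on demand,
    # rather than embedding whole child records inside the parent.
    return [db[cid] for cid in item["children_ids"]]

root = db[1]
print([child["name"] for child in children(root)])
```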


Two questions and one doubt for you:
1- How many "generations" do you want to keep in a single item (call it 
dictionary or list, or record, whatever)? I mean, what if some children 
have children too, and some of those have more children, etc ?


2- Are you SURE that there are no circular references in your database? 
In your example, what if item_3 was
item_3 = {"id": 3, "name": "child_2", "children_ids": [6, 1, 8]}? Isn't 
it possible that those recursion limit problems you had could come from 
some circular reference in your data?


d- If the number of data items is really huge, are you sure that you 
want to keep the whole family in memory at the same time? It depends on 
the answer you gave to my question #1, of course, but if retrieving an 
item from your database is as quick as it should be, you could use a query 
to resolve the references on demand, and you wouldn't need a special 
structure to hold "the rest of the family". If the retrieval is slow or 
difficult, then the creation of your structure could take a significant 
amount of time.


Hope this helps,
Francesco