Re: [Tutor] Conditional attribute access / key access
> The sizes given are in bytes. So 200,000 instances of this class, plus the list to hold them, would take approximately 34 megabytes. An entry level PC these days has 1000 megabytes of memory. "Huge"? Not even close.

The items hold a lot of metadata, which I didn't provide in my example. Depending on the source, there are up to 30 additional attributes per item, mainly strings. And I will have several sources.

> > So far there're no restrictions about how to model the items. They can be dicts, objects of a custom class (preferably with __slots__) or namedTuple.
> >
> > Those items have references to each other using ids.
>
> That approach sounds slow and ponderous to me. Why don't you just give items direct references to each other, instead of indirect using ids?

Unfortunately I have to be able to use a relational database later on. Currently I'm using a document database for development. That's where the ids are coming from and you're right: They are a pain ... ;-)

> If you *need* indirection, say because you are keeping the data in a database and you want to only lazily load it when needed, rather than all at once, then the right approach is probably a proxy object:
>
>     class PartProxy(object):
>         def __init__(self, database_id):
>             self._info = None
>             self.database_id = database_id
>         @property
>         def info(self):
>             if self._info is None:
>                 self._info = get_from_database(self.database_id)
>             return self._info

That's it! Exactly what I was looking for! That eases the id-pain. Thanks!!

Jan

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Conditional attribute access / key access
On Tue, 31 Aug 2010 12:44:08 am Knacktus wrote:
> Hey everyone,
>
> I have a huge number of data items coming from a database.

Huge? Later in this thread, you mentioned 200,000 items overall. That might be "huge" to you, but it isn't to Python. Here's an example:

    class K(object):
        def __init__(self):
            self.info = {"id": id(self),
                         "name": "root " + str(id(self)),
                         "children_ids": [2*id(self), 3*id(self)+1]}

And the size:

    >>> import sys
    >>> k = K()
    >>> sys.getsizeof(k)
    28
    >>> sys.getsizeof(k.info)
    136
    >>> L = [K() for _ in xrange(200000)]
    >>> sys.getsizeof(L)
    835896

The sizes given are in bytes. So 200,000 instances of this class, plus the list to hold them, would take approximately 34 megabytes. An entry level PC these days has 1000 megabytes of memory. "Huge"? Not even close.

Optimizing with __slots__ is premature. Perhaps if you had 1000 times that many instances, then it might be worthwhile.

> So far there're no restrictions about how to model the items. They can be dicts, objects of a custom class (preferably with __slots__) or namedTuple.
>
> Those items have references to each other using ids.

That approach sounds slow and ponderous to me. Why don't you just give items direct references to each other, instead of indirect using ids?

I presume you're doing something like this:

    ids = {0: None}  # Map IDs to objects.
    a = Part(0)
    ids[1] = a
    b = Part(1)  # b is linked to a via its ID 1.
    ids[2] = b
    c = Part(2)  # c is linked to b via its ID 2.
    ids[3] = c

(only presumably less painfully).

If that's what you're doing, you should dump the ids and just do this:

    a = Part(None)
    b = Part(a)
    c = Part(b)

Storing references to objects in Python is cheap -- it's only a pointer. Using indirection via an ID you manage yourself is a pessimation, not an optimization: it requires more code, runs slower, and uses more memory too (because the integer IDs themselves are pointers to 12 byte objects, not 4 byte ints).
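To make the contrast between direct references and id indirection concrete, here is a minimal sketch written for Python 3. The `Part` class in the thread is never shown, so both classes below, their `name` argument, and the `registry` dict are assumptions made purely for illustration.

```python
class Part:
    """Hypothetical part that stores a direct reference to its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # another Part, or None for the root


class PartById:
    """Same data, but the parent is stored as an id and needs a registry."""

    def __init__(self, part_id, name, parent_id=None):
        self.part_id = part_id
        self.name = name
        self.parent_id = parent_id


# Direct references: following the link is a plain attribute access.
car = Part("car")
engine = Part("engine", parent=car)
piston = Part("piston", parent=engine)
print(piston.parent.parent.name)  # car

# Id indirection: every hop goes through a dict the application must maintain.
registry = {}
for part in (PartById(1, "car"), PartById(2, "engine", 1), PartById(3, "piston", 2)):
    registry[part.part_id] = part
grandparent = registry[registry[registry[3].parent_id].parent_id]
print(grandparent.name)  # car
```

Both variants reach the same object; the id-based one simply needs the extra lookup table and an extra dict access per hop.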
If you *need* indirection, say because you are keeping the data in a database and you want to only lazily load it when needed, rather than all at once, then the right approach is probably a proxy object:

    class PartProxy(object):
        def __init__(self, database_id):
            self._info = None
            self.database_id = database_id
        @property
        def info(self):
            if self._info is None:
                self._info = get_from_database(self.database_id)
            return self._info

-- 
Steven D'Aprano
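The proxy above leaves `get_from_database` undefined. The following is a runnable sketch for Python 3 that assumes a plain dict as a stand-in for the database and adds a hypothetical `children` property; `FAKE_DB`, `QUERY_LOG`, and `children` are not from the thread. The sample records reuse the items from the original question.

```python
# Stand-in for a real database: a dict keyed by id (an assumption for this sketch).
FAKE_DB = {
    1: {"id": 1, "name": "root", "children_ids": [2, 3]},
    2: {"id": 2, "name": "child_1", "children_ids": [4]},
    3: {"id": 3, "name": "child_2", "children_ids": [6, 7, 8]},
}
QUERY_LOG = []  # records every simulated database hit


def get_from_database(database_id):
    """Pretend to query the database and remember that we did."""
    QUERY_LOG.append(database_id)
    return FAKE_DB[database_id]


class PartProxy:
    def __init__(self, database_id):
        self._info = None
        self._children = None
        self.database_id = database_id

    @property
    def info(self):
        # Load the record on first access only.
        if self._info is None:
            self._info = get_from_database(self.database_id)
        return self._info

    @property
    def children(self):
        # Hypothetical addition: wrap each child id in its own lazy proxy.
        if self._children is None:
            self._children = [PartProxy(i) for i in self.info["children_ids"]]
        return self._children


root = PartProxy(1)
queries_before_access = list(QUERY_LOG)  # nothing has been loaded yet
names = [child.info["name"] for child in root.children]
print(names)      # ['child_1', 'child_2']
print(QUERY_LOG)  # [1, 2, 3]
```

Creating a proxy costs no query; each record is fetched once, the first time its `info` is read, and the grandchildren (ids 4, 6, 7, 8) are never loaded because nothing asks for them.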
Re: [Tutor] Conditional attribute access / key access
On 30.08.2010 17:53, Francesco Loffredo wrote:
> Two questions and one doubt for you:
> 1- How many "generations" do you want to keep in a single item (call it dictionary or list, or record, whatever)? I mean, what if some children have children too, and some of those have more children, etc.?

There's always one level (generation) of children in an item. An item can have zero or more direct children, and a lot of grandchildren, great-grandchildren, etc. The item structure represents an assembly hierarchy of the parts of a car. So overall the structure can be up to about 20 levels "deep" and consist of up to 200,000 items overall, where the application needs to handle several structures.

> 2- Are you SURE that there are no circular references in your database? In your example, what if item_3 was
> item_3 = {"id": 3, "name": "child_2", "children_ids": [6, 1, 8]}?
> Isn't it possible that those recursion limit problems you had could come from some circular reference in your data?

That's a good hint. But the recursion limit doesn't come from that (the test data actually had no children; I used a single instance of my dict).

> d- If the number of data items is really huge, are you sure that you want to keep the whole family in memory at the same time? It depends on the answer you gave to my question #1, of course, but if retrieving an item from your database is quick as it should be, you could use a query to resolve the references on demand, and you wouldn't need a special structure to hold "the rest of the family". If the retrieval is slow or difficult, then the creation of your structure could take a significant amount of time.

One thing is that I have to do some additional calculations when resolving the structure. The items will get some kind of labels/conditions and versions. Further, when resolving the structure, a set of rules for those conditions is given. In my first attempt I'll have to do those calculations in the Python code (even if it would be very wicked to do stuff like that with SQL). So I will always have a large number of items in memory, as I don't want to call the database for each structure level I want to expand. Also, I'm using a PyQt tree view (QAbstractItemModel) for a client-side GUI. For this purpose I need to create an additional structure, because in the original data items can have more than one parent, which is not permitted in the model for the tree view.

The whole idea of replacing the id references with object references is to enhance performance and make the application code easier.

Thanks for the feedback so far.

> Hope this helps,
> Francesco
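One possible way to get from id references to object references, and from there to a single-parent structure for a tree view, is sketched below for Python 3. The names `Item`, `TreeNode`, and `link_items`, and the shared "bolt" record, are assumptions for illustration; the sketch also assumes the data is free of cycles and builds the tree eagerly, which a real GUI model might prefer to do lazily.

```python
class Item:
    """Hypothetical in-memory item built from one database record."""

    def __init__(self, record):
        self.id = record["id"]
        self.name = record["name"]
        self.children_ids = record["children_ids"]
        self.children = []  # filled in by link_items()


def link_items(records):
    """Create all Items, then replace id references with object references."""
    items = {record["id"]: Item(record) for record in records}
    for item in items.values():
        item.children = [items[child_id] for child_id in item.children_ids]
    return items


class TreeNode:
    """One row of a tree view: exactly one parent, wrapping a possibly shared Item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        # A shared Item gets a separate node under each of its parents.
        self.children = [TreeNode(child, self) for child in item.children]


records = [
    {"id": 1, "name": "root", "children_ids": [2, 3]},
    {"id": 2, "name": "child_1", "children_ids": [4]},
    {"id": 3, "name": "child_2", "children_ids": [4]},
    {"id": 4, "name": "bolt", "children_ids": []},  # used by two parents
]
items = link_items(records)
tree = TreeNode(items[1])

bolt_under_child_1 = tree.children[0].children[0]
bolt_under_child_2 = tree.children[1].children[0]
print(bolt_under_child_1.item is bolt_under_child_2.item)  # True
print(bolt_under_child_1.parent.item.name)                 # child_1
print(bolt_under_child_2.parent.item.name)                 # child_2
```

The items are stored once and shared, while the tree nodes are cheap wrappers that each have a single parent, which is the shape a tree model expects.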
Re: [Tutor] Conditional attribute access / key access
On 30/08/2010 16.44, Knacktus wrote:
> Hey everyone,
>
> I have a huge number of data items coming from a database. So far there're no restrictions about how to model the items. They can be dicts, objects of a custom class (preferably with __slots__) or namedTuple.
>
> Those items have references to each other using ids. Fresh from the database the items look like this (using dicts as examples):
>
>     item_1 = {"id": 1, "name": "root", "children_ids": [2, 3]}
>     item_2 = {"id": 2, "name": "child_1", "children_ids": [4]}
>     item_3 = {"id": 3, "name": "child_2", "children_ids": [6, 7, 8]}
>
> Now I'd like to resolve the references on demand.

Two questions and one doubt for you:

1- How many "generations" do you want to keep in a single item (call it dictionary or list, or record, whatever)? I mean, what if some children have children too, and some of those have more children, etc.?

2- Are you SURE that there are no circular references in your database? In your example, what if item_3 was

    item_3 = {"id": 3, "name": "child_2", "children_ids": [6, 1, 8]}?

Isn't it possible that those recursion limit problems you had could come from some circular reference in your data?

d- If the number of data items is really huge, are you sure that you want to keep the whole family in memory at the same time? It depends on the answer you gave to my question #1, of course, but if retrieving an item from your database is quick as it should be, you could use a query to resolve the references on demand, and you wouldn't need a special structure to hold "the rest of the family". If the retrieval is slow or difficult, then the creation of your structure could take a significant amount of time.

Hope this helps,
Francesco
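Question 2 can be checked mechanically. The following is a minimal sketch for Python 3 of an iterative depth-first search that reports a circular reference if one is reachable from a given root. The function name `find_cycle` is an assumption, and ids that are referenced but not loaded (4, 6 and 8 here) are simply treated as leaves. It avoids recursion on purpose, so a cycle cannot trigger the recursion limit.

```python
def find_cycle(items_by_id, root_id):
    """Return a list of ids forming a cycle reachable from root_id, or None."""
    path = [root_id]     # ids on the current path, root first
    on_path = {root_id}  # same ids as a set, for fast membership tests
    done = set()         # ids whose descendants are fully explored
    stack = [(root_id, iter(items_by_id[root_id]["children_ids"]))]
    while stack:
        node_id, children = stack[-1]
        for child_id in children:
            if child_id in on_path:
                # The child is one of its own ancestors: report the loop.
                return path[path.index(child_id):] + [child_id]
            if child_id in done or child_id not in items_by_id:
                continue  # already checked, or not loaded: treat as a leaf
            path.append(child_id)
            on_path.add(child_id)
            stack.append((child_id, iter(items_by_id[child_id]["children_ids"])))
            break  # descend into the child; resume this iterator later
        else:
            # All children handled: step back up one level.
            stack.pop()
            path.pop()
            on_path.discard(node_id)
            done.add(node_id)
    return None


clean = {
    1: {"id": 1, "name": "root", "children_ids": [2, 3]},
    2: {"id": 2, "name": "child_1", "children_ids": [4]},
    3: {"id": 3, "name": "child_2", "children_ids": [6, 7, 8]},
}
# Same data, but with the circular item_3 from the question above.
circular = dict(clean)
circular[3] = {"id": 3, "name": "child_2", "children_ids": [6, 1, 8]}

print(find_cycle(clean, 1))     # None
print(find_cycle(circular, 1))  # [1, 3, 1]
```

Running such a check once after loading the data would tell you whether a later recursive traversal can terminate at all.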