Re: [Tutor] is this use or abuse of __getitem__ ?
On Fri, Sep 14, 2012 at 2:33 PM, Albert-Jan Roskam fo...@yahoo.com wrote:
> On 14/09/12 22:16, Albert-Jan Roskam wrote:
>
> Is it recommended to define the getitem() function inside the __getitem__() method? I was thinking I could also define a _getitem() private method.
>
>     def getitem(key):
>         retcode1 = self.iomodule.SeekNextCase(self.fh, ctypes.c_long(int(key)))

I wouldn't do this since it incurs the cost of a repeated function call. A slice could involve thousands of such calls. Maybe use a boolean variable like is_slice. Then use a for loop to build the records list (maybe only 1 item). If is_slice, return records, else return records[0].

>     if isinstance(key, slice):
>         records = [getitem(i) for i in range(*key.indices(self.nCases))]
>         return records
>     elif hasattr(key, "__int__"):  # isinstance(key, (int, float)):
>         if abs(key) > (self.nCases - 1):
>             raise IndexError
>         else:
>             key = self.nCases + key if key < 0 else key
>             record = getitem(key)
>             return record
>     else:
>         raise TypeError

I agree with Steven's reasoning that it doesn't make sense to support floating point indexes. Python 2.6+ has the __index__ special method. int and long have this method. float, Decimal, and Fraction do not have it. It lets you support any user-defined class that can be used as an index. For example:

    >>> class MyInt(object):
    ...     def __index__(self):
    ...         return 5

    >>> slice(MyInt(), MyInt(), MyInt()).indices(10)
    (5, 5, 5)

operator.index() is the corresponding function. It raises TypeError if __index__ isn't supported.

But watch out because you're using ctypes.c_long. It doesn't do any range checking. It just silently wraps around modulo the size of a long on your platform:

    >>> c_long(2**32-1), c_long(2**32), c_long(2**32+1)
    (c_long(-1), c_long(0), c_long(1))

Calling int(key) or index(key) is no help because it will silently return a Python long (big int). You need to do range checking on the upper bound and raise a ValueError.

For example:

    from operator import index  # calls obj.__index__()

    is_slice = isinstance(key, slice)

    if is_slice:
        start, stop, step = key.indices(self.nCases)  # may raise TypeError
    else:
        start = index(self.nCases + key if key < 0 else key)  # may raise TypeError
        stop = start + 1
        step = 1

    if stop > 2 ** (ctypes.sizeof(ctypes.c_long) * 8 - 1):
        raise ValueError('useful message')

    records = []
    for i in range(start, stop, step):
        retcode1 = self.iomodule.SeekNextCase(self.fh, ctypes.c_long(i))
        self.caseBuffer, self.caseBufferPtr = self.getCaseBuffer()
        retcode2 = self.iomodule.WholeCaseIn(self.fh, self.caseBufferPtr)
        record = struct.unpack(self.structFmt, self.caseBuffer.raw)
        if any([retcode1, retcode2]):
            raise RuntimeError("Error retrieving record %d [%s, %s]" %
                               (i, retcodes[retcode1], retcodes[retcode2]))
        records.append(record)

    if not is_slice:
        records = records[0]
    return records

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
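The range check described in this message can be sketched on its own. This is a minimal illustration, not code from the thread: `checked_c_long`, `LONG_MIN` and `LONG_MAX` are hypothetical names, and the check covers both ends of the C long range rather than only the upper bound. It is written so that it also runs on Python 3.

```python
import ctypes
import operator

# Bounds of a C long on the current platform (32 or 64 bits wide).
_LONG_BITS = ctypes.sizeof(ctypes.c_long) * 8
LONG_MAX = 2 ** (_LONG_BITS - 1) - 1
LONG_MIN = -(2 ** (_LONG_BITS - 1))


def checked_c_long(value):
    """Hypothetical helper: convert an index-like value to c_long without wrapping."""
    n = operator.index(value)  # TypeError for float, Decimal, Fraction
    if not LONG_MIN <= n <= LONG_MAX:
        raise ValueError("%d does not fit in a %d-bit C long" % (n, _LONG_BITS))
    return ctypes.c_long(n)


class MyInt(object):
    """Same idea as the MyInt class in the message: usable wherever an index is."""

    def __index__(self):
        return 5
```

With this helper, `checked_c_long(MyInt())` yields a `c_long` holding 5, `checked_c_long(1.0)` raises TypeError, and an out-of-range value raises ValueError instead of wrapping silently.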
Re: [Tutor] is this use or abuse of __getitem__ ?
On Sat, Sep 15, 2012 at 4:43 AM, eryksun eryk...@gmail.com wrote:
>     else:
>         start = index(self.nCases + key if key < 0 else key)  # may raise TypeError
>         stop = start + 1
>         step = 1

Gmail is such a pain sometimes. I should have called index first anyway:

        key = index(key)  # may raise TypeError
        start = key + self.nCases if key < 0 else key
        stop = start + 1
        step = 1

>     records = []
>     for i in range(start, stop, step):
>         ...
>         records.append(record)

You can boost the performance here a bit by caching the append method. This avoids a LOAD_ATTR operation on each iteration:

    records = []
    append = records.append
    for i in range(start, stop, step):
        ...
        append(record)
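To see whether caching the bound method matters for a given workload, the two loop styles can be timed side by side. The sketch below is only an illustration: `build_plain` and `build_cached` are hypothetical names, the loop body is trivial, and on recent CPython versions the difference may be small or absent because method calls are optimized differently.

```python
import timeit


def build_plain(n):
    records = []
    for i in range(n):
        records.append(i)      # looks up the append attribute on every pass
    return records


def build_cached(n):
    records = []
    append = records.append    # look the bound method up once, outside the loop
    for i in range(n):
        append(i)
    return records


if __name__ == "__main__":
    for func in (build_plain, build_cached):
        seconds = timeit.timeit(lambda: func(100000), number=20)
        print("%-12s %.3f s" % (func.__name__, seconds))
```

Both functions build the same list; only the timing differs, and the measured gap depends on the interpreter version and on how much other work the loop body does.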
Re: [Tutor] is this use or abuse of __getitem__ ?
On Sat, Sep 15, 2012 at 4:43 AM, eryksun eryk...@gmail.com wrote:
>>     else:
>>         start = index(self.nCases + key if key < 0 else key)  # may raise TypeError
>>         stop = start + 1
>>         step = 1
>
> Gmail is such a pain sometimes. I should have called index first anyway:
>
>         key = index(key)  # may raise TypeError
>         start = key + self.nCases if key < 0 else key
>         stop = start + 1
>         step = 1

Thanks, I hadn't noticed this yet. I am refactoring some of the rest of my code and I hadn't run anything yet. My code has two methods that return record(s): an iterator (__getitem__) and a generator (readFile, which is also called by __enter__). Shouldn't I also take the possibility of a MemoryError into account when the caller does something like data[:10**8]? It may no longer fit into memory, especially when the dataset is also wide.

>>     records = []
>>     for i in range(start, stop, step):
>>         ...
>>         records.append(record)
>
> You can boost the performance here a bit by caching the append method. This avoids a LOAD_ATTR operation on each iteration:
>
>     records = []
>     append = records.append
>     for i in range(start, stop, step):
>         ...
>         append(record)

I knew that trick from http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Avoiding_dots... but I didn't know about LOAD_ATTR. Is a list comprehension still faster than this?

Does it also mean that e.g. "from ctypes import *" (--> c_long()) is faster than "import ctypes" (--> ctypes.c_long())? I am now putting as much as possible in __init__. I don't like the first way of importing at all.
Re: [Tutor] is this use or abuse of __getitem__ ?
On Sat, Sep 15, 2012 at 10:18 AM, Albert-Jan Roskam fo...@yahoo.com wrote:
> Thanks, I hadn't noticed this yet. I am refactoring some of the rest of my code and I hadn't run anything yet. My code has two methods that return record(s): an iterator (__getitem__) and a generator (readFile, which is also called by __enter__). Shouldn't I also take the possibility of a MemoryError into account when the caller does something like data[:10**8]? It may no longer fit into memory, esp. when the dataset is also wide.

The issue with c_long isn't a problem for a slice since key.indices(self.nCases) limits the upper bound. For the individual index you had it right the first time by raising IndexError before it even gets to the c_long conversion. I'm sorry for wasting your time on a non-problem. However, your test there is a bit off. A negative index can be -nCases since counting from the end starts at -1. If you first do the ternary check to add the offset to a negative index, afterward you can raise an IndexError if not 0 <= value < nCases.

As to MemoryError, dealing with gigabytes of data in main memory is not a problem I've come up against in practice. You might still want a reasonable upper bound for slices. Often when the process runs out of memory it won't even see a MemoryError. The OS simply kills it. On the other hand, while bugs like a c_long wrapping around need to be caught to prevent silent corruption of data, there's nothing at all silent about crashing the process. It's up to you how much you want to micromanage the situation. You might want to check out psutil as a cross-platform way to monitor the process memory usage:

http://code.google.com/p/psutil

If you're also supporting the iterator protocol with the __iter__ method, then I think a helper _items(start, stop, step) generator function would be a good idea.

Here's an updated example (not tested however; it's just a suggestion):

    import operator

    def _items(self, start=0, stop=None, step=1):
        if stop is None:
            stop = self.nCases

        for i in range(start, stop, step):
            retcode1 = self.iomodule.SeekNextCase(self.fh, ctypes.c_long(i))
            self.caseBuffer, self.caseBufferPtr = self.getCaseBuffer()
            retcode2 = self.iomodule.WholeCaseIn(self.fh, self.caseBufferPtr)
            record = struct.unpack(self.structFmt, self.caseBuffer.raw)
            if any([retcode1, retcode2]):
                raise RuntimeError("Error retrieving record %d [%s, %s]" %
                                   (i, retcodes[retcode1], retcodes[retcode2]))
            yield record

    def __iter__(self):
        return self._items()

    def __getitem__(self, key):
        is_slice = isinstance(key, slice)

        if is_slice:
            start, stop, step = key.indices(self.nCases)
        else:
            key = operator.index(key)
            start = key + self.nCases if key < 0 else key
            if not 0 <= start < self.nCases:
                raise IndexError
            stop = start + 1
            step = 1

        records = self._items(start, stop, step)
        if is_slice:
            return list(records)
        return next(records)

> but I didn't know about LOAD_ATTR.

That's the bytecode operation to fetch an attribute. Whether or not bypassing it will provide a significant speedup depends on what else you're doing in the loop. If the single LOAD_ATTR is only a small fraction of the total processing time, or you're not looping thousands of times, then this little change is insignificant.

> Is a list comprehension still faster than this?

I think list comprehensions or generator expressions are best if the evaluated expression isn't too complex and uses built-in types and functions. I won't typically write a function just to use a list comprehension for a single statement. Compared to a regular for loop (especially if append is cached in a fast local), the function call overhead makes it a wash or worse, even given the comprehension's efficiency at building the list.

If the main work of the loop is the most significant factor, then the choice of for loop vs list comprehension doesn't matter much with regard to performance, but I still think it's simpler to just use a regular for loop. You can also write a generator function if you need to reuse an iteration in multiple statements.

> Does it also mean that e.g. "from ctypes import *" (--> c_long()) is faster than "import ctypes" (--> ctypes.c_long()). I am now putting as much as possible in __init__. I don't like the first way of importing at all.

It's not a good idea to pollute your namespace with "import *" statements. In a function, you can cache an attribute locally if doing so will provide a significant speedup. Or you can use a default argument like this:

    def f(x, c_long=ctypes.c_long):
        return c_long(x)
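The _items()/__getitem__ layout suggested in this message can be tried without the C library. The sketch below is a toy stand-in under stated assumptions: `Records` is a hypothetical class that keeps its rows in a plain list, so the seek, read and unpack calls of the real reader are replaced by a list lookup. Only the indexing logic follows the message.

```python
import operator


class Records(object):
    """Toy stand-in for the reader: rows live in a list, not behind a C library."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.nCases = len(self._rows)

    def _items(self, start=0, stop=None, step=1):
        if stop is None:
            stop = self.nCases
        for i in range(start, stop, step):
            yield self._rows[i]     # the real class would seek and unpack here

    def __iter__(self):
        return self._items()

    def __len__(self):
        return self.nCases

    def __getitem__(self, key):
        if isinstance(key, slice):
            # indices() clamps the slice to the number of records
            return list(self._items(*key.indices(self.nCases)))
        key = operator.index(key)   # TypeError for floats and other non-integers
        start = key + self.nCases if key < 0 else key
        if not 0 <= start < self.nCases:
            raise IndexError("record index out of range")
        return next(self._items(start, start + 1))
```

For `data = Records("abcde")`, `data[-5]` returns the first row (the case the earlier `abs(key)` test rejected), `data[5]` and `data[-6]` raise IndexError, and `data[1.0]` raises TypeError. Because a slice still builds a full list, a caller worried about memory can iterate over the object instead.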
[Tutor] is this use or abuse of __getitem__ ?
Hi,

I defined a __getitem__ special method in a class that reads a binary data file using a C library. The docstring should clarify the purpose of the method. This works exactly as I intended it; however, the key argument is actually used as an index (it also raises an IndexError when key is greater than the number of records in the file). Am I abusing the __getitem__ method, or is this just a creative way of using it?

    # Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32

    def __getitem__(self, key):
        """ This function reports the record of case number key.
        For example: firstRecord = FileReader(fileName)[0] """
        if not isinstance(key, (int, float)):
            raise TypeError
        if abs(key) > self.nCases:
            raise IndexError
        retcode1 = self.iomodule.SeekNextCase(self.fh, ctypes.c_long(int(key)))
        self.caseBuffer, self.caseBufferPtr = self.getCaseBuffer()
        retcode2 = self.iomodule.WholeCaseIn(self.fh, self.caseBufferPtr)
        record = struct.unpack(self.structFmt, self.caseBuffer.raw)
        if any([retcode1, retcode2]):
            raise RuntimeError, "Error retrieving record %d [%s, %s]" % \
                  (key, retcodes[retcode1], retcodes[retcode2])
        return record

Regards,
Albert-Jan

~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~
Re: [Tutor] is this use or abuse of __getitem__ ?
On Fri, Sep 14, 2012 at 8:16 AM, Albert-Jan Roskam fo...@yahoo.com wrote:
> Am I abusing the __getitem__ method, or is this just a creative way of using it?

No, you're using it the normal way. The item to get can be an index, a key, or even a slice.

http://docs.python.org/reference/datamodel.html#object.__getitem__

>     if not isinstance(key, (int, float)):
>         raise TypeError

Instead you could raise a TypeError if not hasattr(key, '__int__') since later you call int(key).

>     if abs(key) > self.nCases:
>         raise IndexError

You might also want to support slicing. Here's an example:

http://stackoverflow.com/a/2936876/205580
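Supporting slices mostly comes down to turning the slice object into concrete record numbers, which slice.indices() does for a given sequence length. A minimal sketch follows; `selected_indices` is a hypothetical helper name, not something from the linked answer.

```python
def selected_indices(key, length):
    """Return the record numbers that slice `key` selects out of `length` records."""
    # indices() fills in missing bounds, resolves negative values and
    # clamps everything to the given length.
    start, stop, step = key.indices(length)
    return list(range(start, stop, step))


if __name__ == "__main__":
    for key in (slice(2, 5), slice(-3, None), slice(None, None, -1), slice(0, 10**8)):
        print(key, selected_indices(key, 10))
```

An oversized bound such as `slice(0, 10**8)` is clamped to the number of records, so a __getitem__ built on this never asks the underlying file for a record past the end.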
Re: [Tutor] is this use or abuse of __getitem__ ?
On 14/09/12 22:16, Albert-Jan Roskam wrote:
> Hi,
>
> I defined a __getitem__ special method in a class that reads a binary data file using a C library. The docstring should clarify the purpose of the method. This works exactly as I intended it, however, the key argument is actually used as an index (it also raises an IndexError when key is greater than the number of records in the file). Am I abusing the __getitem__ method, or is this just a creative way of using it?

No, that's exactly what __getitem__ is for. It does double-duty for key-lookup in mappings (dict[key]) and index-lookup in sequences (list[index]).

You can also support ranges of indexes by accepting a slice argument.

Another comment below:

>     # Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
>
>     def __getitem__(self, key):
>         """ This function reports the record of case number key.
>         For example: firstRecord = FileReader(fileName)[0] """
>         if not isinstance(key, (int, float)):
>             raise TypeError

Floats? Do you actually have case number (for example) 0.14285714285714285?

For this case, I think it is reasonable to insist on exactly an int, and nothing else (except possibly a slice object, to support for example obj[2:15]).

--
Steven
Re: [Tutor] is this use or abuse of __getitem__ ?
> On 14/09/12 22:16, Albert-Jan Roskam wrote:
>> Hi,
>>
>> I defined a __getitem__ special method in a class that reads a binary data file using a C library. The docstring should clarify the purpose of the method. This works exactly as I intended it, however, the key argument is actually used as an index (it also raises an IndexError when key is greater than the number of records in the file). Am I abusing the __getitem__ method, or is this just a creative way of using it?
>
> No, that's exactly what __getitem__ is for. It does double-duty for key-lookup in mappings (dict[key]) and index-lookup in sequences (list[index]).
>
> You can also support ranges of indexes by accepting a slice argument.

COOL! I was already wondering how this could be implemented. Dive into Python is pretty exhaustive wrt special methods, but I don't think they mentioned using the slice class. Below is how I did it. Is it recommended to define the getitem() function inside the __getitem__() method? I was thinking I could also define a _getitem() private method. Hmmm, maybe getitem() is redefined over and over again the way I did it now?

    def __getitem__(self, key):
        """ This function reports the record of case number key.
        For example: firstRecord = SavReader(savFileName)[0] """

        def getitem(key):
            retcode1 = self.iomodule.SeekNextCase(self.fh, ctypes.c_long(int(key)))
            self.caseBuffer, self.caseBufferPtr = self.getCaseBuffer()
            retcode2 = self.iomodule.WholeCaseIn(self.fh, self.caseBufferPtr)
            record = struct.unpack(self.structFmt, self.caseBuffer.raw)
            if any([retcode1, retcode2]):
                raise RuntimeError, "Error retrieving record %d [%s, %s]" % \
                      (key, retcodes[retcode1], retcodes[retcode2])
            return record

        if isinstance(key, slice):
            records = [getitem(i) for i in range(*key.indices(self.nCases))]
            return records
        elif hasattr(key, "__int__"):  # isinstance(key, (int, float)):
            if abs(key) > (self.nCases - 1):
                raise IndexError
            else:
                key = self.nCases + key if key < 0 else key
                record = getitem(key)
                return record
        else:
            raise TypeError

> Another comment below:
>
>>     # Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
>>
>>     def __getitem__(self, key):
>>         """ This function reports the record of case number key.
>>         For example: firstRecord = FileReader(fileName)[0] """
>>         if not isinstance(key, (int, float)):
>>             raise TypeError
>
> Floats? Do you actually have case number (for example) 0.14285714285714285?
>
> For this case, I think it is reasonable to insist on exactly an int, and nothing else (except possibly a slice object, to support for example obj[2:15]).

I also accepted floats as a convenience. I had examples in mind like: record = data[1.0]. Kind of annoying when this raises a TypeError. But in your example it makes perfect sense to raise such an exception.

Eryksun, Steven: Thanks!!!

Albert-Jan