hidden built-in module
Hello, is there a way to access a module that is hidden because another module of the same name is found first? More specifically, I have my own logging.py module, and inside it, depending on how initialization goes, I may want to do 'from logging import *' from the built-in logging. I hope my description was clear. Cheers. I am using Python 2.4. -- http://mail.python.org/mailman/listinfo/python-list
Re: hidden built-in module
On Mar 5, 1:39 pm, gigs [EMAIL PROTECTED] wrote:
> koara wrote:
>> Hello, is there a way to access a module that is hidden because another module of the same name is found first? More specifically, I have my own logging.py module, and inside it, depending on how initialization goes, I may want to do 'from logging import *' from the built-in logging. I hope my description was clear. Cheers. I am using Python 2.4.
>
> you can add your own logging module in an extra directory that has an __init__.py and import it like:
>     from extradirectory.logging import *
> and the built-in:
>     from logging import *

Thank you for your reply, gigs. However, the point of this namespace harakiri is that existing code which uses 'import logging' ... 'logging.info()' ... etc. continues working without any change. Renaming my logging.py file is not an option -- if it were, I wouldn't have bothered naming my module the same as a built-in :-) Cheers.
Re: hidden built-in module
> You can only try and search sys.path for the logging module, using sys.prefix, and then look for logging.py. Using __import__(path) you get a reference to that module.
>
> Diez

Thank you Diez, that's the info I'd been looking for :-) So the answer is the sys module + __import__. Cheers!
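A minimal sketch of the idea (hedged: written for modern Python with importlib; a 2.4-era version would combine sys.prefix with imp.find_module/__import__ as Diez suggests, and the name "stdlib_logging" is this sketch's invention). We locate the standard library's own logging package by file path, bypassing any same-named module that shadows it on sys.path:

```python
import importlib.util
import sysconfig

# Find the standard library directory, then load its logging package
# explicitly by path -- a local logging.py earlier on sys.path is never
# consulted, because no sys.path search happens at all.
stdlib_dir = sysconfig.get_paths()["stdlib"]
spec = importlib.util.spec_from_file_location(
    "stdlib_logging", stdlib_dir + "/logging/__init__.py")
stdlib_logging = importlib.util.module_from_spec(spec)
spec.loader.exec_module(stdlib_logging)

print(stdlib_logging.INFO)  # 20 -- this really is the stdlib module
```

Inside a shadowing logging.py one could then re-export the stdlib names with `globals().update(vars(stdlib_logging))`, so existing 'import logging' users keep working.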
mmap disk performance
Hello all, I am using the mmap module (Python 2.4) to access the contents of a file. My question concerns the relative performance of mmap.seek() vs. mmap.tell(). I have a generator that returns stuff from the file, piece by piece. Since other things may happen to the mmap object between consecutive next() calls (such as another iterator's next()), I have to store the file position before each yield and restore it afterwards by means of tell() and seek(). Is this correct? When restoring, is there a penalty for mmap.seek(pos) when the file position is already at pos (i.e., nothing happened to the file position in between, a common scenario)? If there is, is it worth doing

    if mmap.tell() != pos:
        mmap.seek(pos)

or such? Cheers!
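A sketch of the save/restore pattern the post describes (the helper name `pieces` and the demo data are this sketch's inventions; written for modern Python, but the tell()/seek() idea is the same in 2.4). Each generator keeps its own position, restores it with seek() before reading, and records it with tell() before yielding, so interleaved iterators do not trample each other's file position:

```python
import mmap
import tempfile

def pieces(m, size=4):
    pos = 0
    while pos < len(m):
        m.seek(pos)          # restore: another iterator may have moved the pointer
        chunk = m.read(size)
        pos = m.tell()       # remember where we stopped before yielding
        yield chunk

with tempfile.TemporaryFile() as f:
    f.write(b"abcdefgh")
    f.flush()
    m = mmap.mmap(f.fileno(), 0)
    a, b = pieces(m), pieces(m)
    print(next(a), next(b), next(a))  # b'abcd' b'abcd' b'efgh'
    m.close()
```

On the performance question: both tell() and seek() on an mmap only update an in-memory offset (no system call), so the `if m.tell() != pos` guard is unlikely to buy anything measurable.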
urllib.unquote + unicode
Hello all, I am using urllib.unquote_plus to unquote a string. Sometimes I get a strange string such as spolu%u017E%E1ci.cz to unquote. The problem here is that some application decided to quote a non-ASCII character as %u directly, instead of using an encoding and quoting byte by byte. Python (2.4.1) simply returns 'spolu%u017E\xe1ci.cz', which is likely not what the application meant. My question is: is this %u quoting a standard (i.e., urllib is in the wrong), or is it not (i.e., the application is in the wrong, and urllib silently ignores the '%u0' -- why?), and most importantly, is there a simple workaround to get it working as expected? Cheers!
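A possible workaround, not an official API (hedged: the function name and the latin-1 guess for the remaining byte escapes are this sketch's assumptions). The %uXXXX form is a non-standard convention -- JavaScript's escape() produces it -- so urllib leaves it alone; one can translate those escapes to characters first and then unquote the ordinary %XX escapes as usual. Shown with modern urllib.parse; the 2.4-era urllib.unquote_plus has no encoding parameter:

```python
import re
import urllib.parse

def unquote_with_percent_u(s, encoding="latin-1"):
    # Replace the non-standard %uXXXX escapes with their characters first...
    s = re.sub(r"%u([0-9a-fA-F]{4})",
               lambda m: chr(int(m.group(1), 16)), s)
    # ...then unquote the remaining byte-wise %XX escapes normally.
    return urllib.parse.unquote_plus(s, encoding=encoding)

print(unquote_with_percent_u("spolu%u017E%E1ci.cz"))  # spolužáci.cz
```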
Re: enumerate overflow
On Oct 3, 7:22 pm, Raymond Hettinger [EMAIL PROTECTED] wrote:
> In Py2.6, I will most likely put in an automatic promotion to long for both enumerate() and count(). It took a while to figure out how to do this without killing the performance for normal cases (ones used in real programs, not examples contrived to say, omg, see what *could* happen).
>
> Raymond

Thanks everybody for the replies and suggestions; I'm glad to see the issue has already been discovered/discussed/almost resolved. By the way, I do not consider my programs in any way 'unreal'.
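On modern Python the promotion Raymond describes is in place: enumerate() indices are ordinary arbitrary-precision ints, so they cross the machine-word boundary without overflowing. A quick demonstration (the start argument, added in 2.6, makes it easy to see without iterating billions of times):

```python
import sys

# Start the index one below sys.maxsize; the third index crosses the
# old word-size limit with no overflow or wraparound.
indices = [i for i, _ in enumerate("abc", start=sys.maxsize - 1)]
print(indices[-1] - sys.maxsize)  # 1
```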
unicode categories -- regex
Hello all -- my question concerns special metacharacters in the re module. The re module documentation mentions the possibility of matching any alphanumeric Unicode character with '\w'. However, there is no information on constructing patterns for other Unicode categories, such as purely alphabetic characters, or punctuation symbols, etc. I found that this category information actually IS available in Python -- in the standard module unicodedata. For example, unicodedata.category(u'.') gives 'Po' for 'Punctuation, other', etc. So how do I include this information in a regular expression search? Any ideas? Thanks. I'm talking about Python 2.5 here.
Re: unicode categories -- regex
> At the moment, you have to generate a character class for this yourself, e.g. ...

Thank you Martin, this is exactly what I wanted to know.
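A sketch of the "generate a character class yourself" approach (the helper name is this sketch's invention, and the range is capped for speed -- in real use you would scan up to sys.maxunicode once and cache the result): collect every code point whose unicodedata.category() starts with the wanted prefix, and splice them, escaped, into a character class.

```python
import re
import unicodedata

def category_class(prefix, limit=0x250):
    # Every code point below `limit` whose category starts with `prefix`,
    # e.g. prefix "P" covers Pd, Po, Ps, ... -- all punctuation.
    chars = (chr(c) for c in range(limit)
             if unicodedata.category(chr(c)).startswith(prefix))
    return "[" + "".join(re.escape(c) for c in chars) + "]"

punct = re.compile(category_class("P") + "+")
print(punct.findall(u"spolu-zaci.cz!"))  # ['-', '.', '!']
```

For what it's worth, the third-party `regex` module supports Unicode category properties directly as `\p{P}`, which avoids building the class by hand.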
Re: memory efficient set/dictionary
> I would recommend you use a database, since it meets your requirements (off-memory, fast, persistent). The bsddb module (Berkeley DB) even gives you a dictionary-like interface. http://www.python.org/doc/lib/module-bsddb.html

> Standard SQL databases can work for this, but generally your recommendation of bsddb works very well for int -> int mappings. In particular, I would suggest using a btree, if only because I have had trouble in the past with colliding keys in bsddb.hash (and recno is just a flat file, and will attempt to create a file of size i * (record size) to write to record number i). As an alternative, there are many methods known from the search-engine world for mapping int -> [int, int, ...], which can be implemented as int -> int, where the second int is a pointer to an address on disk. Looking into a few of the open-source search implementations may be worthwhile.

Thanks guys! I will look into bsddb; hopefully it doesn't keep all keys in memory -- I couldn't find the answer to that during my (very brief) look at the documentation. (It doesn't: Berkeley DB keeps only a cache in memory.) And how about the extra memory used for set/dict'ing of integers? Is there a simple answer?
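A sketch of a persistent int -> int mapping (hedged: the bsddb module recommended above is Python-2 only; the stdlib dbm module used here offers the same dictionary-like on-disk interface, and the pack/unpack helpers are this sketch's inventions). Packing keys big-endian makes byte order agree with numeric order, which is what you want for btree-style access:

```python
import dbm
import os
import struct
import tempfile

def pack(n):
    return struct.pack(">q", n)     # 64-bit big-endian key/value

def unpack(b):
    return struct.unpack(">q", b)[0]

path = os.path.join(tempfile.mkdtemp(), "intmap")
with dbm.open(path, "c") as db:     # keys live on disk, not in a Python dict
    db[pack(12345)] = pack(67890)

with dbm.open(path, "r") as db:     # reopen: the mapping persisted
    print(unpack(db[pack(12345)]))  # 67890
```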
memory efficient set/dictionary
What is the best way to go about using a large set (or dictionary) that doesn't fit into main memory? What is Python's (2.5, let's say) overhead for storing an int in a set, and how much for storing an int -> int mapping in a dict? Please recommend a module that allows persistent set/dict storage + fast queries, that best fits my problem and is as lightweight as possible. For queries, the hit ratio is about 10%. Fast updates would be nice, but I can rewrite the algorithm so that the data is static, so update speed is not critical. Or am I better off not using Python here? Cheers.
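The overhead question can be answered empirically (hedged: exact numbers vary with Python version and platform, and getsizeof measures only the container's hash table, not the stored int objects themselves -- so real totals are higher):

```python
import sys

n = 1_000_000
per_set_entry = sys.getsizeof(set(range(n))) / n        # table bytes per set entry
per_dict_entry = sys.getsizeof({i: i for i in range(n)}) / n  # per dict entry
print(round(per_set_entry), round(per_dict_entry))
```

On typical 64-bit CPython builds this lands in the tens of bytes per entry, which is why billions of keys cannot stay in a plain in-memory set or dict.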
Re: memory efficient set/dictionary
Hello Steven,

On Jun 10, 5:29 pm, Steven D'Aprano [EMAIL PROTECTED] wrote:
> How do you know it won't fit in main memory if you don't know the overhead? A guess? You've tried it and your computer crashed?

Exactly.

>> Please recommend a module that allows persistent set/dict storage + fast query that best fits my problem,
>
> Usually I love guessing what people's problems are before making a recommendation, but I'm feeling whimsical so I think I'll ask first. What is the problem you are trying to solve? How many keys do you have?

Corpus processing. There are on the order of billions to tens of billions of keys (64-bit integers).

> Can you group them in some way, e.g. alphabetically? Do you need to search on random keys, or can you queue them and do them in the order of your choice?

Keys in sets and dictionaries can be grouped in any way; order doesn't matter. Not sure what you mean. Yes, I need fast random access (at least I do without having to rethink and rewrite everything, which is what I'd like to avoid with the help of this thread :-)

Thanks for the reply!
Re: find_longest_match in SequenceMatcher
Hello again John -- your hack/fix seems to work. Thanks a lot; now let's hope timbot will indeed be along shortly with a proper fix =)
Re: find_longest_match in SequenceMatcher
John Machin wrote:
> --- test results snipped ---
> Looks to me like the problem has nothing at all to do with the length of the searched strings, but a bug that appeared in 2.3. What version(s) were you using? Can you reproduce your results (500 vs. 499 giving different answers) with the same version?

Hello John, thank you for investigating and responding! Yes, I can reproduce the behaviour with different results within the same version -- which is 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)]. The catch is to remove the last character, as I described in my original post, as opposed to passing reduced length parameters to find_longest_match, which is what you did. It is morning now, but I still fail to see the mistake I am making -- if it is indeed a bug, where do I report it? Cheers!
find_longest_match in SequenceMatcher
Hello, it might be too late or too hot, but I cannot work out this behaviour of find_longest_match() in difflib.SequenceMatcher:

string1: releasenotesforwildmagicversion01thiscdromcontainstheinitialreleaseofthesourcecodethataccompaniesthebook3dgameenginedesign:apracticalapproachtorealtimecomputergraphicsthereareanumberofknownissuesaboutthecodeastheseissuesareaddressedtheupdatedcodewillbeavailableatthewebsitehttp://wwwmagicsoftwarecom/[EMAIL PROTECTED]

string2: releasenotesforwildmagicversion02updatefromversion01toversion02ifyourcopyofthebookhasversion01andifyoudownloadedversion02fromthewebsitethenapplythefollowingdirectionsforinstallingtheupdateforalinuxinstallationseethesectionattheendofthisdocumentupdatedirectionsassumingthatthetopleveldirectoryiscalledmagicreplacebyyourtoplevelnameyoushouldhavetheversion01contentsinthislocation1deletethecontentsofmagic\include2deletethesubdirectorymagic\source\mgcapplication3deletetheobsoletefiles:amagic\source\mgc

find_longest_match(0, 500, 0, 500) = (24, 43, 10) = 'version01t'

What? O_o Clearly there is a longer match, right at the beginning! And then, after removal of the last character from each string (I found the limit of 500 by trial and error -- and it looks suspiciously round):

find_longest_match(0, 499, 0, 499) = (0, 0, 32) = 'releasenotesforwildmagicversion0'

Is this the expected behaviour? What's going on? Thank you for any ideas.
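What the post describes matches the heuristic later documented as "autojunk": when the second sequence has 200+ elements, any element occurring in more than 1% of it is treated as junk and never anchors a match -- which is why the behaviour kicks in only past a length threshold, and why frequent letters like 'e' sabotage long matches. Python 2.4 had no switch for it; later versions let you disable it with autojunk=False. A small demonstration (the strings here are this sketch's own, contrived so the popular character is exactly the long match):

```python
from difflib import SequenceMatcher

a = "x" * 200 + "abc"
b = "def" + "x" * 200   # len(b) >= 200, and 'x' fills far more than 1% of b

# With the heuristic on (the default), 'x' is junk and can never start a
# match, so nothing is found; with autojunk=False the obvious 200-character
# run is matched.
m1 = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
m2 = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
print(m1.size, m2.size)  # 0 200
```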