[issue5364] documentation in epub format
Changes by Ilpo Nyyssönen i...@iki.fi: -- nosy: +biny ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5364 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15109] sqlite3.Connection.iterdump() dies with encoding exception
Changes by Ilpo Nyyssönen i...@iki.fi: -- nosy: +biny ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue15109
[issue3424] imghdr test order makes it slow
Ilpo Nyyssönen [EMAIL PROTECTED] added the comment: Suggested order: jpeg, exif, png, gif, tiff, and then the rest. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3424
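The suggested ordering can be illustrated with a standalone sketch of first-match-wins format detection. The check functions are named after imghdr's `test_jpeg`, `test_exif` and so on, but their magic-byte checks are simplified assumptions, not imghdr's actual code (imghdr itself was removed from the standard library in Python 3.13).

```python
from typing import Callable, List, Optional

def check_jpeg(h: bytes) -> Optional[str]:
    """JFIF-style JPEG: the marker 'JFIF' at offset 6."""
    return "jpeg" if h[6:10] == b"JFIF" else None

def check_exif(h: bytes) -> Optional[str]:
    """Camera JPEG with an Exif segment: 'Exif' at offset 6."""
    return "jpeg" if h[6:10] == b"Exif" else None

def check_png(h: bytes) -> Optional[str]:
    return "png" if h.startswith(b"\x89PNG\r\n\x1a\n") else None

def check_gif(h: bytes) -> Optional[str]:
    return "gif" if h[:6] in (b"GIF87a", b"GIF89a") else None

def check_tiff(h: bytes) -> Optional[str]:
    return "tiff" if h[:2] in (b"MM", b"II") else None

# Most common formats first; rarer formats would be appended after these.
CHECKS: List[Callable[[bytes], Optional[str]]] = [
    check_jpeg, check_exif, check_png, check_gif, check_tiff,
]

def what(h: bytes) -> Optional[str]:
    """Return the result of the first check that matches the header, or None."""
    for check in CHECKS:
        result = check(h)
        if result:
            return result
    return None
```

Because the loop stops at the first match, a JPEG header costs one or two checks here instead of running through every rarer format first.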
[issue3424] imghdr test order makes it slow
New submission from Ilpo Nyyssönen [EMAIL PROTECTED]: The order of tests in imghdr makes it slow in common cases. Even without any statistics, it is easy to see that jpeg is the most common format, yet in imghdr only bmp and png are tested after it. And should png really be the last one? Nearly all digital cameras produce JPEGs, and handling such images is one big use case for this module. Changing the test order should be easy and have a big effect in common use cases. -- components: Library (Lib) messages: 70142 nosy: biny severity: normal status: open title: imghdr test order makes it slow type: performance versions: Python 2.5 ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3424
[issue3424] imghdr test order makes it slow
Ilpo Nyyssönen [EMAIL PROTECTED] added the comment: Naturally, showing this requires a large number of files. Getting a large number of JPEGs is easy; getting a large number of PBMs or RGBs is not. I'll attach two profiling runs showing the difference when test_jpeg and test_exif are moved to be the first tests. The beginning of each output shows the counts of the return values. Added file: http://bugs.python.org/file10958/current ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3424
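The effect the profiling runs measure can be approximated by counting how many check calls each ordering needs. A minimal sketch, assuming simplified detectors (not imghdr's code) and a hypothetical sample of 98 camera JPEG headers and 2 PNG headers; it is a call count, not a timing.

```python
from collections import Counter
from typing import Callable, Iterable, Optional, Sequence, Tuple

Check = Callable[[bytes], Optional[str]]

def is_jpeg(h: bytes) -> Optional[str]:
    return "jpeg" if h[6:10] in (b"JFIF", b"Exif") else None

def is_png(h: bytes) -> Optional[str]:
    return "png" if h.startswith(b"\x89PNG\r\n\x1a\n") else None

def is_gif(h: bytes) -> Optional[str]:
    return "gif" if h[:6] in (b"GIF87a", b"GIF89a") else None

def profile_order(checks: Sequence[Check], headers: Iterable[bytes]) -> Tuple[int, Counter]:
    """Return (total check calls, Counter of detected formats) for one ordering."""
    calls = 0
    results: Counter = Counter()
    for h in headers:
        fmt = None
        for check in checks:
            calls += 1
            fmt = check(h)
            if fmt:
                break
        results[fmt] += 1
    return calls, results

# Hypothetical sample: 98 camera JPEGs and 2 PNGs.
JPEG = b"\xff\xd8\xff\xe1\x00\x10Exif\x00\x00"
PNG = b"\x89PNG\r\n\x1a\n" + bytes(8)
sample = [JPEG] * 98 + [PNG] * 2

print(profile_order([is_gif, is_png, is_jpeg], sample)[0])  # 298 calls, jpeg last
print(profile_order([is_jpeg, is_png, is_gif], sample)[0])  # 102 calls, jpeg first
```

The detected-format counts are identical for both orders; only the number of wasted checks changes.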
[issue3424] imghdr test order makes it slow
Changes by Ilpo Nyyssönen [EMAIL PROTECTED]: Added file: http://bugs.python.org/file10959/optimized ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3424
Re: Python less error-prone than Java
Kaz Kylheku [EMAIL PROTECTED] writes:
> Buggy library code is what prompted that article.

Yes, but it is still an error type that happens very rarely. And so it seems that very few programs even notice that bug in that library.

> Except when you feed those programs inputs which are converted to integers which are then fed as domain values into some operation that doesn't fit into the range type.

If the input value range is limited, you want to get an error when an out-of-range value is given. If you want to handle unlimited values, you really need to check that you can do it. Think, for example, of storing such a value in a database.

>> 1. null pointer errors
>> 2. wrong type (class cast in Java, some weird missing attribute in python)
>> 3. array/list index out of bounds
>>
>> First and third ones are the same in about every language.
>
> ... other than C and C++, where their equivalents just crash or stomp over memory, but never mind; who uses those? ;)

It is not different. Your crash can tell you that it was a null pointer. Your crash can tell you that you stomped over memory. You just get the information about the error in a different way.

> Instead of this stupid idea of pointers or references having a null value, you can make a null value which has its own type, and banish null pointers.

Yes, and I actually think of that as a bad thing. It is nice to be able to tell the difference between a null pointer and a wrong type. Of course, if the error message tells you that you had null there, it is not a problem, but what if you somehow lose the error message and get only the exception class name? (Yes, you should always keep the message too, but it does happen.)

--
Ilpo Nyyssönen # biny # /* :-) */
--
http://mail.python.org/mailman/listinfo/python-list
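In Python, None is exactly such a null value with its own type (`NoneType`), and the point about losing the message can be seen directly. A minimal sketch: a missing attribute on None and a missing attribute on a wrong type raise the same exception class, so only the message distinguishes them. The function name is hypothetical.

```python
from typing import Any, Optional, Tuple

def describe_failure(value: Any) -> Optional[Tuple[str, str]]:
    """Call value.upper() and return (exception class name, message) if it fails."""
    try:
        value.upper()
    except AttributeError as exc:
        return type(exc).__name__, str(exc)
    return None  # value really has an upper() method

none_case = describe_failure(None)  # the "null" case
wrong_type = describe_failure(42)   # the "wrong type" case

# Same exception class in both cases; only the message mentions NoneType vs int.
print(none_case[0] == wrong_type[0])  # True
print(none_case[1])
print(wrong_type[1])
```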
Re: Python less error-prone than Java
Christoph Zwerschke [EMAIL PROTECTED] writes:
> What's better about the Python version? First, it will operate on *any* sorted array, no matter which type the values have.
>
> But second, there is a hidden error in the Java version that the Python version does not have.

While I can see your point, I'd say you are on totally the wrong level here. With Java generics you can sort a list and still keep the type of its contents defined. This makes the code less error-prone.

But why would you implement binary search when the standard library already has it for both arrays and lists? This is one big thing that makes code less error-prone: using existing, well-made libraries. You can find binary search in the Python standard library too (but the Java API is actually a bit better; see the return values).

Well, you can say that binary search is a good example, and that in real code you would use the stuff from the libraries anyway. I'd say it is not a good example: how often will you write such algorithms? Very rarely. Integer overflows are generally not the errors you run into in programs. The errors that happen most often are, from my point of view:

1. null pointer errors
2. wrong type (class cast in Java, some weird missing attribute in python)
3. array/list index out of bounds

The first and third ones are the same in about every language. The second one is where typing can make a difference. If you know the type all the way at the code level, there is much less chance of it being wrong. (The sad thing about Java generics is that they are a compile-time-only thing, and that causes some really weird stuff, but that is too off-topic here.)

In Python, passing sequences to a function and returning them from a function is very easy. You can very easily pass a sequence as an argument list. You can also very easily return a sequence from a function and even unpack it into variables directly. This is a very powerful tool, but it has a problem too: how can you change what you return without breaking the callers? There are many cases where passing an object instead of a sequence makes the code much easier to develop further.

What's the point? The point is that with neither Java nor Python do you want to be doing things at the low level. You really want to be working with objects and using existing libraries as much as possible. And at that level Java might be less error-prone, as it restricts more of the ways you can shoot yourself in the foot.

--
Ilpo Nyyssönen # biny # /* :-) */
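The difference in return values mentioned above: Python's `bisect.bisect_left` returns only an insertion point, while Java's `Arrays.binarySearch` returns the index of the key if found and `-(insertion point) - 1` otherwise. A minimal sketch of the Java convention on top of the standard library; the function name is hypothetical.

```python
from bisect import bisect_left
from typing import Any, Sequence

def binary_search(a: Sequence[Any], key: Any) -> int:
    """Search sorted sequence a for key, Java-style: the index of key if present,
    otherwise -(insertion point) - 1, which is always negative."""
    i = bisect_left(a, key)  # leftmost position where key could be inserted
    if i < len(a) and a[i] == key:
        return i
    return -i - 1

print(binary_search([1, 3, 5, 7], 5))         # 2
print(binary_search([1, 3, 5, 7], 4))         # -3 (would be inserted at index 2)
print(binary_search(["ant", "bee"], "cat"))   # -3 (insertion point 2)
```

A single int thus carries both "found or not" and "where", whereas a caller of bare `bisect_left` has to check `a[i] == key` itself.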
Re: Object oriented storage with validation
Fredrik Lundh [EMAIL PROTECTED] writes:
> Ilpo Nyyssönen wrote:
>> What is the point in doing validation if it isn't done every time? Why wouldn't I do it every time? It isn't that slow a thing to do.
>
> DTD validation is useful in two cases: [...]

I didn't mention DTD validation. Yes, I know the limitations of DTD validation. DTD validation gives a clear error message with a line number when the document doesn't match.

Show me this:
- An object oriented storage library
- A flat structure is not enough; it needs some hierarchy, like XML
- Validation that also converts the data to Pythonic types, like numbers to ints or the data of my objects to my objects
- Includes a way to define version migration steps
- Parsing and validation must be fast
- Storage format preferably a readable text file
- Easy to use in an application

These things would be nice to have in it too:
- Multiple backends, at least text file, XML and SQL database
- Some kind of synchronization or replication utilities

>> Pickle doesn't have validation. I am not comfortable using it as a storage format that should be reliable over years as the program evolves. It also doesn't tell me if my program has put something into the data other than what I meant to.
>
> But DTD validation doesn't tell you that either -- it's only concerned with the structure, not the content. [...]

Pickle doesn't even have that. Also, I can't read a pickle file without writing some program to dump it in a readable format. So I can't use validation to make sure the data in the pickle is what I want, and I can't use less to see what is in the file. I really have NO IDEA what is in a pickle file.

Or, yes, I clearly would need to build the validation myself on top of it! Not going to happen.

> If you want the simplest thing, get rid of the DTD, and make your loader ignore things that it doesn't recognize, use default values for fields that are not required (or weren't in the format from the start), and give a nice readable error message if something required is missing. That'll give you a nice, portable, reliable, and extremely future-proof design.

So my program will just silently do the wrong thing if I make a typo in a non-required field when writing the file? No thanks.

--
Ilpo Nyyssönen # biny # /* :-) */
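The disagreement is about whether a loader should ignore unknown elements. A minimal sketch of the stricter alternative, built on the standard library's `xml.etree.ElementTree` (which grew out of the elementtree package discussed in this thread): each known element has a converter to a Python type, unknown elements are rejected, and required ones are checked. The `<record>` layout and field names are hypothetical; this is structure and type checking, not DTD or schema validation, and it does not report line numbers.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict

# Hypothetical format: element name -> converter into a Python type.
FIELDS: Dict[str, Callable[[str], Any]] = {
    "title": str,
    "priority": int,
    "done": lambda text: text == "true",
}
REQUIRED = {"title"}

def load_record(xml_text: str) -> Dict[str, Any]:
    """Parse one <record>, converting known fields and rejecting unknown ones."""
    root = ET.fromstring(xml_text)
    record: Dict[str, Any] = {}
    for child in root:
        convert = FIELDS.get(child.tag)
        if convert is None:
            # A typo such as <priorty> is reported instead of silently ignored.
            raise ValueError(f"unknown element <{child.tag}> in <{root.tag}>")
        record[child.tag] = convert(child.text or "")
    missing = REQUIRED - set(record)
    if missing:
        raise ValueError(f"missing required element(s): {sorted(missing)}")
    return record

print(load_record("<record><title>Dentist</title><priority>2</priority></record>"))
```

A misspelled optional field now fails loudly, which is the behavior asked for above; the cost is that the converter table has to be updated whenever the format evolves.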
Re: Object oriented storage with validation
Ville Vainio [EMAIL PROTECTED] writes:
> Ilpo == Ilpo Nyyssönen iny writes:
> Ilpo> Pickle doesn't have validation. I am not comfortable for
> Ilpo> using it as storage format that should be reliable over
> Ilpo> years when the program evolves. It also doesn't tell me if
>
> That's why you should implement xml import/export mechanism and use the xml file as the canonical data, while the pickle is only a cache for the data.

That would make the program too complex, unless it is done by a library. I actually prefer saving only once and doing it in a fast, reliable way.

> Ilpo> How can it work automatically in separate module? Replacing
> Ilpo> the re.compile with something sounds possible way of getting
> Ilpo> the regexps, but how and where to store the compiled data?
> Ilpo> Is there a way to put it to the byte code file?
>
> Do what you already did - dump the regexp cache to a separate file.

That didn't get all of the regexps. It only got the regexps that had been compiled by the time I dumped the cache.

--
Ilpo Nyyssönen # biny # /* :-) */
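Ville's suggestion (XML as the canonical data, pickle only as a cache) can be sketched as follows; Ilpo's objection is the extra complexity. A minimal sketch, assuming a hypothetical `parse_xml` stand-in for the real loader; whether the cache is stale is decided only by comparing file modification times.

```python
import os
import pickle
import xml.etree.ElementTree as ET
from typing import Dict, Optional

def parse_xml(xml_path: str) -> Dict[str, Optional[str]]:
    """Hypothetical stand-in for the real (validating) XML loader."""
    root = ET.parse(xml_path).getroot()
    return {child.tag: child.text for child in root}

def load(xml_path: str, cache_path: str) -> Dict[str, Optional[str]]:
    """Treat the XML file as canonical; use the pickle only while it is up to date."""
    try:
        if os.path.getmtime(cache_path) >= os.path.getmtime(xml_path):
            with open(cache_path, "rb") as f:
                return pickle.load(f)
    except OSError:
        pass  # no cache yet: fall through and parse the XML
    data = parse_xml(xml_path)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)  # refresh the cache for the next run
    return data
```

Every save still has to go through the XML file, so validation happens whenever the canonical data changes, while unchanged data loads from the pickle.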
Object oriented storage with validation (was: Re: Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl))
[reorganized a bit]

Ville Vainio [EMAIL PROTECTED] writes:
> Why don't you use external validation on the created xml? Validating it every time sounds like way too much like Javaic B&D to be fun anymore. Pickle should serve you well, and would probably remove about half of your code. Do the simplest thing that could possibly work and all that.

What is the point in doing validation if it isn't done every time? Why wouldn't I do it every time? It isn't that slow a thing to do.

Pickle doesn't have validation. I am not comfortable using it as a storage format that should be reliable over years as the program evolves. It also doesn't tell me if my program has put something into the data other than what I meant to. The program will just throw some weird exception.

I want to do the simplest thing, but I also want something that helps keep the program usable in the future. I prefer investing some resources up front to get validation rather than spending more resources later dealing with an undetermined lump of data.

> python has shipped with a fast XML parser since 2.1, or so.
>
> Ilpo> With what features? validation? I really want a validating
> Ilpo> parser with a DOM interface. (Or something better than DOM,
> Ilpo> must be object oriented.)
>
> Check out (coincidentally) Fredrik's elementtree:
> http://effbot.org/zone/element-index.htm

At least the interface looks quite simple and usable. With some validation wrapped around it, it might be OK...

> Ilpo> And my point is that the regular expression compilation can
> Ilpo> be a problem in python. The current regular expression
> Ilpo> engine is just unusable slow in short lived programs with a
> Ilpo> bit bigger amount of regexps. And fixing it should not be
> Ilpo> that hard: an easy improvement would be to add some kind of
> Ilpo> storing mechanism for the compiled regexps. Are there any
> Ilpo> reasons not to do this?
>
> It should start life as a third-party module (perhaps written by you, who knows :-). If it is deemed useful and clean enough, it could be integrated w/ python proper. This is clearly something that should not be in the python core, because the regexps themselves aren't there either.

How can it work automatically as a separate module? Replacing re.compile with something sounds like a possible way of getting the regexps, but how and where would the compiled data be stored? Is there a way to put it into the byte-code file?

Maybe I need to take a look at it when I find the time...

--
Ilpo Nyyssönen # biny # /* :-) */
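The "replace re.compile" idea can at least collect the regexps automatically. A minimal sketch with hypothetical names: a wrapper records every (pattern, flags) pair it sees so they can be saved at exit. It stores only the pattern sources, not compiled code; CPython's compiled regexp internals are private and change between versions, so persisting them is not attempted here.

```python
import json
import re
from typing import Set, Tuple

_original_compile = re.compile       # keep a reference to the real function
_seen: Set[Tuple[str, int]] = set()  # (pattern source, flags) pairs used so far

def recording_compile(pattern: str, flags: int = 0) -> re.Pattern:
    """Stand-in for re.compile that remembers every pattern it is given."""
    _seen.add((pattern, int(flags)))
    return _original_compile(pattern, flags)

def dump_patterns() -> str:
    """Serialize the recorded (str) patterns as sorted JSON."""
    return json.dumps(sorted(_seen))
```

To catch module-level compiles, one could assign `re.compile = recording_compile` before importing other modules and write `dump_patterns()` to a file via `atexit.register`. Calls such as `re.match(pattern, s)` go through re's internal cache function rather than `re.compile`, so they would not be recorded, and a later run would still have to compile the saved sources.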
Re: Regular Expressions - Python vs Perl
Fredrik Lundh [EMAIL PROTECTED] writes:
> so you picked the wrong file format for the task, and the slowest tool you could find for that file format, and instead of fixing that, you decided that the regular expression engine was to blame for the bad performance. hmm.

What would you recommend instead? I have searched for alternatives, but somehow I still find XML the best there is. It is a standard format with a standard programming API. I don't want to lose my calendar data, and XML as a standard format makes it easier to convert to some other format later. As a textual format it is also readable raw, which eases debugging.

And my point is that regular expression compilation can be a problem in Python. The current regular expression engine is unusably slow in short-lived programs with a larger number of regexps. And fixing it should not be that hard: an easy improvement would be to add some kind of storage mechanism for the compiled regexps. Are there any reasons not to do this?

Nowadays I use libxml2-python as the XML parser, so the problem is not so acute anymore. (It is just harder to get running for a Python compiled from source outside the RPM system, and it is not so easy to use via the DOM interface.)

> python has shipped with a fast XML parser since 2.1, or so.

With what features? Validation? I really want a validating parser with a DOM interface. (Or something better than DOM; it must be object oriented.)

I don't want to make my programs ugly (read: use some lower-level interface) and error-prone (read: no validation) just to make them fast.

--
Ilpo Nyyssönen # biny # /* :-) */
Re: Regular Expressions - Python vs Perl
Ville Vainio [EMAIL PROTECTED] writes:
> Ilpo == Ilpo Nyyssönen iny writes:
> Ilpo> The problem in python here is that it needs to always
> Ilpo> recompile the regexp. I would like to have a way to write a
> Ilpo> regexp as a constant and then python should compile that
> Ilpo> regexp to the byte-code file.
>
> Ilpo> This is a problem when one has a big amount of regexps. One
> Ilpo> example is the xmlproc parser in PyXML,
>
> Read the source for sre.py, esp. _compile. The compiled regexps are cached, so when you invoke e.g. re.match(), it doesn't recompile the regexp.

If you had read what I said, you would have noticed this:

,----
| I would like to have a way to write a regexp as a constant and then
| python should compile that regexp to the byte-code file.
`----

and this:

,----
| This is not a problem in a program that continues to run long times,
| but I want short lived programs like command line apps.
`----

Of course it caches them while running. The point is that it needs to recompile every time you restart the program. With short-lived command-line programs this really can be a problem.

And yes, I have read the source of sre.py, and I have made an ugly module that digs out the compiled data and pickles it to a file; on the next startup it reads that file and puts the data back into the cache.

--
Ilpo Nyyssönen # biny # /* :-) */
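Why digging out the compiled data is needed: in CPython 3, simply pickling a compiled pattern does not save any compiled work. The `re` module registers a reducer that stores only the pattern source and flags, together with a reference to its internal compile function, so unpickling compiles the pattern again. A minimal sketch showing this; the date pattern is an arbitrary example.

```python
import pickle
import re

pattern = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)
data = pickle.dumps(pattern)

# The pickle holds the source text and a reference to re's internal
# _compile function, not a compiled program.
print(pattern.pattern.encode() in data)  # True
print(b"_compile" in data)               # True

restored = pickle.loads(data)  # compiles again (or hits this process's cache)
print(restored.pattern == pattern.pattern and restored.flags == pattern.flags)  # True
```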
Re: Regular Expressions - Python vs Perl
James Stroud [EMAIL PROTECTED] writes:
> Is it relevant that Python can produce compiled expressions? I don't think that there is such a thing with Perl.

The problem in Python here is that it always needs to recompile the regexp. I would like to have a way to write a regexp as a constant, and then Python should compile that regexp into the byte-code file.

This is a problem when one has a large number of regexps. One example is the xmlproc parser in PyXML. This is not a problem in a program that keeps running for a long time, but I want short-lived programs like command-line apps.

Of course we do have ways to work around that limitation, but they are just ugly.

--
Ilpo Nyyssönen # biny # /* :-) */
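The per-startup cost described here can be measured by emptying re's cache before compiling, which mimics what a freshly started interpreter has to do. A minimal sketch; the 200 patterns are a hypothetical workload, and no particular timing is implied.

```python
import re
import time
from typing import List, Sequence, Tuple

def cold_compile(patterns: Sequence[str]) -> Tuple[float, List[re.Pattern]]:
    """Compile every pattern with re's cache emptied first, as at program start."""
    re.purge()  # clear the module-level cache of previously compiled patterns
    start = time.perf_counter()
    compiled = [re.compile(p) for p in patterns]
    return time.perf_counter() - start, compiled

# Hypothetical workload: 200 distinct patterns, one per config key.
patterns = [rf"^(?P<key>field{i})\s*=\s*(?P<value>[^;#]*)" for i in range(200)]
elapsed, compiled = cold_compile(patterns)
print(f"compiled {len(compiled)} patterns in {elapsed * 1000:.1f} ms")
```

Repeating the compile loop without `re.purge()` in the same process would be served largely from re's cache; a new interpreter process always starts with that cache empty, which is the cost paid on every run of a short-lived program.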