[issue5364] documentation in epub format

2012-07-18 Thread Ilpo Nyyssönen

Changes by Ilpo Nyyssönen i...@iki.fi:


--
nosy: +biny

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5364
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15109] sqlite3.Connection.iterdump() dies with encoding exception

2012-07-17 Thread Ilpo Nyyssönen

Changes by Ilpo Nyyssönen i...@iki.fi:


--
nosy: +biny

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15109
___



[issue3424] imghdr test order makes it slow

2008-08-02 Thread Ilpo Nyyssönen

Ilpo Nyyssönen [EMAIL PROTECTED] added the comment:

jpeg
exif
png
gif
tiff

and then the rest

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3424
___



[issue3424] imghdr test order makes it slow

2008-07-22 Thread Ilpo Nyyssönen

New submission from Ilpo Nyyssönen [EMAIL PROTECTED]:

The order of tests in imghdr makes it slow in common cases. Even without
any statistics it is quite easy to see that JPEG is the most common
format. In imghdr only the bmp and png tests come after it. Also, should
png really be the last one?

Nearly all digital cameras produce JPEGs, and handling such images is one
big use case for this module.

Changing the test order should be easy and should have a big effect in
common use cases.
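
A minimal sketch of the idea, not the imghdr source: the checks below are
simplified magic-byte tests, and the names JPEG_LATE, COMMON_FIRST and
sniff are hypothetical. It only illustrates that putting the most common
format first reduces the number of tests run per file.

```python
# Simplified magic-byte checks; illustrative only, not the imghdr implementation.
def is_png(h):
    return h.startswith(b"\x89PNG\r\n\x1a\n")

def is_gif(h):
    return h[:6] in (b"GIF87a", b"GIF89a")

def is_tiff(h):
    return h[:2] in (b"MM", b"II")

def is_bmp(h):
    return h.startswith(b"BM")

def is_jpeg(h):
    # JFIF and Exif files both start with the SOI marker FF D8.
    return h.startswith(b"\xff\xd8")

# Hypothetical orderings: jpeg near the end versus jpeg first.
JPEG_LATE = [("gif", is_gif), ("tiff", is_tiff), ("bmp", is_bmp),
             ("jpeg", is_jpeg), ("png", is_png)]
COMMON_FIRST = [("jpeg", is_jpeg), ("png", is_png), ("gif", is_gif),
                ("tiff", is_tiff), ("bmp", is_bmp)]

def sniff(header, tests):
    """Return (format name, number of tests run) for the first match."""
    for count, (name, test) in enumerate(tests, start=1):
        if test(header):
            return name, count
    return None, len(tests)

jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
print(sniff(jpeg_header, JPEG_LATE))     # jpeg found on the 4th test
print(sniff(jpeg_header, COMMON_FIRST))  # jpeg found on the 1st test
```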

--
components: Library (Lib)
messages: 70142
nosy: biny
severity: normal
status: open
title: imghdr test order makes it slow
type: performance
versions: Python 2.5

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3424
___



[issue3424] imghdr test order makes it slow

2008-07-22 Thread Ilpo Nyyssönen

Ilpo Nyyssönen [EMAIL PROTECTED] added the comment:

Naturally, measuring this requires a large number of files. Getting a
large number of JPEGs is easy. Getting a large number of PBMs or RGBs is
not so easy.

I'll attach two profiling runs showing some difference when test_jpeg
and test_exif are moved to be the first tests. The beginnings of those
outputs show the return value counts.
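
One way such a comparison could be reproduced without a real image
collection is sketched below. It is not the attached profiling run: the
test orders, the simplified prefixes and the synthetic headers are all
assumptions, and the timings will vary by machine.

```python
import timeit

def first_match(header, prefixes):
    """Return the name of the first prefix that matches the header."""
    for name, prefix in prefixes:
        if header.startswith(prefix):
            return name
    return None

# Hypothetical test orders; the prefixes are simplified magic bytes.
CURRENT = [("gif", b"GIF8"), ("tiff", b"II"), ("bmp", b"BM"),
           ("jpeg", b"\xff\xd8"), ("png", b"\x89PNG")]
OPTIMIZED = [("jpeg", b"\xff\xd8"), ("png", b"\x89PNG"), ("gif", b"GIF8"),
             ("tiff", b"II"), ("bmp", b"BM")]

# Stand-in for a large collection of camera JPEGs.
headers = [b"\xff\xd8\xff\xe1\x00\x10Exif\x00\x00"] * 10000

for label, order in (("current", CURRENT), ("optimized", OPTIMIZED)):
    seconds = timeit.timeit(lambda: [first_match(h, order) for h in headers],
                            number=10)
    print(label, round(seconds, 4))
```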

Added file: http://bugs.python.org/file10958/current

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3424
___



[issue3424] imghdr test order makes it slow

2008-07-22 Thread Ilpo Nyyssönen

Changes by Ilpo Nyyssönen [EMAIL PROTECTED]:


Added file: http://bugs.python.org/file10959/optimized

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3424
___



Re: Python less error-prone than Java

2006-06-04 Thread Ilpo Nyyssönen
Kaz Kylheku [EMAIL PROTECTED] writes:

 Buggy library code is what prompted that article.

Yes, but it is still a type of error that happens very rarely. And so it
seems that very few programs even notice that bug in that library.

 Except when you feed those programs inputs which are converted to
 integers which are then fed as domain values into some operation that
 doesn't fit into the range type.

If the input value range is limited, you want to get an error if an
out-of-range value is given. If you want to handle unlimited values, you
really need to check that you can do it. Think, for example, of
storing such a value in a database.
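
A minimal sketch of that kind of range check, assuming the value is
headed for a signed 64-bit database column. The function name
checked_int64 and the 64-bit limit are illustrative assumptions, not
anything from the thread.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1

def checked_int64(text):
    """Parse text as an integer and reject values outside signed 64 bits."""
    value = int(text)  # raises ValueError for non-numeric input
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"{value} does not fit in a signed 64-bit column")
    return value

print(checked_int64("9223372036854775807"))
try:
    checked_int64("9223372036854775808")
except ValueError as exc:
    print("rejected:", exc)
```

Python integers do not overflow on their own, so the limit has to be
stated explicitly wherever the storage layer has one.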

 1. null pointer errors
 2. wrong type (class cast in Java, some weird missing attribute in python)
 3. array/list index out of bounds

 First and third ones are the same in about every language.

 ... other than C and C++, where their equivalents just crash or stomp
 over memory, but never mind; who uses those? ;)

It is not different. Your crash can tell you that it was a null
pointer. Your crash can tell you that you stomped over memory. You
just get the information about the error in a different way.

 Instead of this stupid idea of pointers or references having a null
 value, you can make a null value which has its own type, and banish
 null pointers.

Yes, and I actually think of that as a bad thing. It is nice to be able to
tell the difference between a null pointer and a wrong type. Of course if
the error message tells you that you had null there, it is not a
problem, but what if you somehow lose the error message and get only
the exception class name? (Yes, you should always keep the message
too, but it does happen.)
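
Python itself behaves this way, since None has its own type. A small
sketch (the helper name exception_name is hypothetical) shows that a
None value and a wrong type raise the same exception class and differ
only in the message:

```python
def exception_name(obj):
    """Call a string method on obj; return the exception class name and message."""
    try:
        obj.upper()
    except AttributeError as exc:
        return type(exc).__name__, str(exc)
    return None, None

# None (Python's null) and a wrong type raise the same exception class;
# only the message tells them apart.
print(exception_name(None))
print(exception_name(42))
```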

-- 
Ilpo Nyyssönen # biny # /* :-) */
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python less error-prone than Java

2006-06-03 Thread Ilpo Nyyssönen
Christoph Zwerschke [EMAIL PROTECTED] writes:

 What's better about the Python version? First, it will operate on
 *any* sorted array, no matter which type the values have.

 But second, there is a hidden error in the Java version that the
 Python version does not have.

While I can see your point, I'd say you are totally at the wrong level
here.

With Java generics you can sort a list and still keep the type of
the contents defined. This makes the code less error-prone. But why
would you implement binary search when the standard library already has
it for both arrays and lists? This is one big thing that makes code
less error-prone: using existing well-made libraries. You can find
binary search in the Python standard library too (but actually the API
in Java is a bit better, see the return values).
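
For comparison, a sketch of the difference in return values: Python's
bisect_left returns only an insertion point, while Java's
Arrays.binarySearch returns the index when found and
-(insertion point) - 1 otherwise. The wrapper name binary_search is
hypothetical and simply layers the Java convention on top of bisect.

```python
from bisect import bisect_left

def binary_search(sorted_items, key):
    """Return the index of key, or -(insertion point) - 1 if it is absent.

    This mirrors the return convention of Java's Arrays.binarySearch;
    bisect_left itself only returns the insertion point.
    """
    i = bisect_left(sorted_items, key)
    if i < len(sorted_items) and sorted_items[i] == key:
        return i
    return -i - 1

data = [2, 3, 5, 7, 11]
print(binary_search(data, 7))   # found at index 3
print(binary_search(data, 6))   # absent, would be inserted at index 3
```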

Well, you can say that the binary search is a good example and in real
code you would use the stuff from the libraries. I'd say it is not a
good example: How often will you write such algorithms? Very rarely.

Integer overflows are generally not the errors you run into in
programs. The errors happening most often are, from my point of view:

1. null pointer errors
2. wrong type (class cast in Java, some weird missing attribute in python)
3. array/list index out of bounds

First and third ones are the same in about every language. The second
one is one where the typing can make a difference. If at the code
level you know the type all the way, there is much less chance of it
being wrong. (The sad thing about Java generics is that they are a
compile-time-only thing and that causes some really weird stuff, but
that is too off topic here.)

In Python, passing sequences to a function and also from a function is
very easy. You can very easily pass a sequence as an argument list. You
can also very easily return a sequence from the function and even
split it into variables directly. This is a very powerful tool, but it has
a problem too: How can you change what you return without breaking the
callers? There are many cases where passing an object instead of a
sequence makes the code much easier to develop further.
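
A small sketch of that trade-off. All names here (parse_tuple, Entry,
parse_entry) are hypothetical, and the dataclass is just one of several
ways to return an object instead of a bare sequence.

```python
from dataclasses import dataclass

def parse_tuple(line):
    """Return (name, size); adding a third item would break every unpacking caller."""
    name, size = line.split(":")
    return name, int(size)

@dataclass
class Entry:
    name: str
    size: int
    # A field added later with a default; existing callers are unaffected.
    kind: str = "file"

def parse_entry(line):
    """Return an Entry; callers read attributes instead of unpacking."""
    name, size = line.split(":")
    return Entry(name, int(size))

name, size = parse_tuple("notes.txt:120")   # tied to exactly two items
entry = parse_entry("notes.txt:120")
print(entry.name, entry.size, entry.kind)
```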

What's the point? The point is that with neither Java nor Python do you
want to be doing things at the low level. You really want to be doing
stuff with objects and using existing libraries as much as possible.
And at that level Java might be less error-prone as it restricts more
of the ways you can shoot yourself in the foot.

-- 
Ilpo Nyyssönen # biny # /* :-) */


Re: Object oriented storage with validation

2005-04-25 Thread Ilpo Nyyssönen
Fredrik Lundh [EMAIL PROTECTED] writes:

 Ilpo Nyyssönen wrote:

 What is the point in doing validation if it isn't done every time? Why
 wouldn't I do it every time? It isn't that slow a thing to do.

 DTD validation is useful in two cases:
[...]

I didn't mention DTD validation. Yes, I know the limitations of DTD
validation. DTD validation gives a clear error message with a line
number when the document doesn't match.

Show me this:

- An object oriented storage library
- A flat thing is not enough, needs some hierarchy, like in XML
- Validation that also converts the data to pythonic types, like
numbers to ints or data of my objects to my objects
- Includes a way to define version migration steps
- Parsing and validation must be fast
- Storage format preferably a readable text file
- Easy to use in application

These things would be nice to have in it too:

- Multiple backends, at least text file, XML and SQL database
- Some kind of synchronization or replication utilities
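
To make a few of these requirements concrete, here is a minimal sketch
using xml.etree.ElementTree from the standard library. It is not an
existing storage library: the calendar/event format, the Event class,
the version attribute and the migration step are all invented for
illustration. It covers validation, conversion to Python types and one
version migration step, nothing more.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Event:
    title: str
    year: int

def migrate_v1_to_v2(root):
    """Hypothetical migration step: version 1 stored the year as 'yr'."""
    for node in root.iter("event"):
        if "yr" in node.attrib:
            node.set("year", node.attrib.pop("yr"))
    root.set("version", "2")

def load_events(text):
    """Parse, migrate if needed, validate, and convert to Event objects."""
    root = ET.fromstring(text)
    if root.tag != "calendar":
        raise ValueError(f"unexpected root element {root.tag!r}")
    if root.get("version") == "1":
        migrate_v1_to_v2(root)
    if root.get("version") != "2":
        raise ValueError(f"unsupported version {root.get('version')!r}")
    events = []
    for node in root:
        if node.tag != "event":
            raise ValueError(f"unexpected element {node.tag!r}")
        unknown = set(node.attrib) - {"title", "year"}
        if unknown:
            raise ValueError(f"unknown attributes {sorted(unknown)}")
        if "title" not in node.attrib or "year" not in node.attrib:
            raise ValueError("event needs title and year")
        # int() raises ValueError if the year is not numeric.
        events.append(Event(node.attrib["title"], int(node.attrib["year"])))
    return events

old = '<calendar version="1"><event title="PyCon" yr="2005"/></calendar>'
print(load_events(old))
```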

 Pickle doesn't have validation. I am not comfortable using it as a
 storage format that should be reliable over years when the program
 evolves. It also doesn't tell me if my program has put something into
 the data other than what I meant to.

 But DTD validation doesn't tell you that either -- it's only concerned
 with the structure, not the content.
[...]

Pickle doesn't even have that. Also, I can't read a pickle file without
writing some program to dump it in a readable format. So, I can't use
validation to make sure the data in the pickle is what I want and I
can't use less to see what is in the file. I really have NO IDEA what
is in a pickle file.
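
For what it is worth, the standard library's pickletools module can
produce such a dump. The sketch below only prints an opcode listing; it
adds no validation, and the sample data is made up.

```python
import io
import pickle
import pickletools

data = {"title": "meeting", "year": 2005}
blob = pickle.dumps(data, protocol=0)   # protocol 0 is the ASCII-based format

# pickletools.dis writes a readable listing of the pickle opcodes.
listing = io.StringIO()
pickletools.dis(blob, out=listing)
print(listing.getvalue())
```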

Or, yes, I clearly would need to build the validation myself on top of
it! Not going to happen.

 If you want the simplest thing, get rid of the DTD, and make your
 loader ignore things that it doesn't recognize, use default values for
 fields that are not required (or weren't in the format from the start),
 and give a nice readable error message if something required is
 missing.  That'll give you a nice, portable, reliable, and extremely
 future-proof design.

So my program will just work in the wrong way if I make a typo in a
non-required field when writing the file? No thanks.
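
The difference between the two approaches can be sketched as follows.
The field names and the functions load_lenient and load_strict are
hypothetical; the point is only that a lenient loader hides a misspelled
optional field while a strict one reports it.

```python
DEFAULTS = {"reminder": "none", "color": "blue"}

def load_lenient(fields):
    """Ignore unknown keys and fill defaults, as the quoted advice suggests."""
    return {key: fields.get(key, default) for key, default in DEFAULTS.items()}

def load_strict(fields):
    """Reject unknown keys so that a misspelled optional field is reported."""
    unknown = set(fields) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {**DEFAULTS, **fields}

typo = {"remainder": "15min"}     # misspelled optional field
print(load_lenient(typo))         # silently falls back to the default
try:
    load_strict(typo)
except ValueError as exc:
    print(exc)
```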

-- 
Ilpo Nyyssönen # biny # /* :-) */


Re: Object oriented storage with validation

2005-04-25 Thread Ilpo Nyyssönen
Ville Vainio [EMAIL PROTECTED] writes:

 Ilpo == Ilpo Nyyssönen iny writes:

 Ilpo Pickle doesn't have validation. I am not comfortable
 Ilpo using it as a storage format that should be reliable over
 Ilpo years when the program evolves. It also doesn't tell me if

 That's why you should implement xml import/export mechanism and use
 the xml file as the canonical data, while the pickle is only a cache
 for the data.

That would make the program too complex, unless it is done by a library. I
actually prefer saving only once and doing that in a fast, reliable way.


 Ilpo How can it work automatically in a separate module? Replacing
 Ilpo re.compile with something sounds like a possible way of getting
 Ilpo the regexps, but how and where to store the compiled data?
 Ilpo Is there a way to put it in the byte code file?

 Do what you already did - dump the regexp cache to a separate file. 

That didn't get all of the regexps. It only got the regexps that were
loaded at the time I dumped the cache.

-- 
Ilpo Nyyssönen # biny # /* :-) */


Object oriented storage with validation (was: Re: Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl))

2005-04-24 Thread Ilpo Nyyssönen

[reorganized a bit]

Ville Vainio [EMAIL PROTECTED] writes:

 Why don't you use external validation on the created xml? Validating
 it every time sounds like way too much like Javaic B&D to be fun
 anymore. Pickle should serve you well, and would probably remove about
 half of your code. Do the simplest thing that could possibly work
 and all that.

What is the point in doing validation if it isn't done every time? Why
wouldn't I do it every time? It isn't that slow a thing to do.

Pickle doesn't have validation. I am not comfortable using it as a
storage format that should be reliable over years when the program
evolves. It also doesn't tell me if my program has put something into
the data other than what I meant to. The program will just throw some
weird exception.

I want to do the simplest thing, but I also want something that helps
me keep the program usable in the future too. I prefer putting some
resources into getting some validation in initially rather than using
more resources later to do something with an undetermined lump of data.

  python has shipped with a fast XML parser since 2.1, or so.

 Ilpo With what features? validation? I really want a validating
 Ilpo parser with a DOM interface. (Or something better than DOM,
 Ilpo must be object oriented.)

 Check out (coincidentally) Fredrik's elementtree:

 http://effbot.org/zone/element-index.htm

At least the interface looks quite simple and usable. With some
validation wrapping over it, it might be ok...

 Ilpo And my point is that the regular expression compilation can
 Ilpo be a problem in python. The current regular expression
 Ilpo engine is just unusably slow in short lived programs with a
 Ilpo somewhat larger number of regexps. And fixing it should not be
 Ilpo that hard: an easy improvement would be to add some kind of
 Ilpo storing mechanism for the compiled regexps. Are there any
 Ilpo reasons not to do this?

 It should start life as a third-party module (perhaps written by you,
 who knows :-). If it is deemed useful and clean enough, it could be
 integrated w/ python proper. This is clearly something that should not
 be in the python core, because the regexps themselves aren't there
 either.

How can it work automatically in a separate module? Replacing
re.compile with something sounds like a possible way of getting the
regexps, but how and where to store the compiled data? Is there a way to
put it in the byte code file?
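
A sketch of the "replace re.compile" half of that idea, under stated
assumptions: recording_compile, save_patterns and preload_patterns are
hypothetical names, the JSON file location is arbitrary, and only str
patterns are handled. It records which patterns a run asked for and
compiles them again at the next startup. It does not store compiled
data, so it does not remove the compilation cost itself.

```python
import json
import os
import re
import tempfile

_seen = set()   # (pattern, flags) pairs requested during this run

def recording_compile(pattern, flags=0):
    """Stand-in for re.compile that remembers what was compiled."""
    _seen.add((pattern, int(flags)))
    return re.compile(pattern, flags)

def save_patterns(path):
    """Write the recorded patterns to a JSON file (str patterns only)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sorted(_seen), fh)

def preload_patterns(path):
    """Compile every recorded pattern so later calls hit re's own cache."""
    with open(path, encoding="utf-8") as fh:
        for pattern, flags in json.load(fh):
            re.compile(pattern, flags)

word = recording_compile(r"\w+", re.IGNORECASE)
path = os.path.join(tempfile.mkdtemp(), "patterns.json")
save_patterns(path)      # e.g. at program exit
preload_patterns(path)   # e.g. at the next startup
```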

Maybe I need to take a look at it when I find the time...

-- 
Ilpo Nyyssönen # biny # /* :-) */


Re: Regular Expressions - Python vs Perl

2005-04-23 Thread Ilpo Nyyssönen
Fredrik Lundh [EMAIL PROTECTED] writes:

 so you picked the wrong file format for the task, and the slowest
 tool you could find for that file format, and instead of fixing
 that, you decided that the regular expression engine was to blame
 for the bad performance. hmm.

What would you recommend instead?

I have searched for alternatives, but somehow I still find XML the best
there is. It is a standard format with a standard programming API.

I don't want to lose my calendar data. XML as a standard format makes
it easier to convert later to some other format. As a textual format
it is also readable raw, and this eases debugging.

And my point is that the regular expression compilation can be a
problem in python. The current regular expression engine is just
unusably slow in short lived programs with a somewhat larger number of
regexps. And fixing it should not be that hard: an easy improvement
would be to add some kind of storing mechanism for the compiled
regexps. Are there any reasons not to do this?

 Nowadays I use libxml2-python as the XML parser and so the problem is
 not so acute anymore. (That is just harder to get running for a
 python compiled from source outside the rpm system and it is not so
 easy to use via a DOM interface.)

 python has shipped with a fast XML parser since 2.1, or so.

With what features? validation? I really want a validating parser with
a DOM interface. (Or something better than DOM, must be object
oriented.)

I don't want to make my programs ugly (read: use some more low level
interface) and error prone (read: no validation) to make them fast. 

-- 
Ilpo Nyyssönen # biny # /* :-) */


Re: Regular Expressions - Python vs Perl

2005-04-22 Thread Ilpo Nyyssönen
Ville Vainio [EMAIL PROTECTED] writes:

 Ilpo == Ilpo Nyyssönen iny writes:

 Ilpo The problem in python here is that it needs to always
 Ilpo recompile the regexp. I would like to have a way to write a
 Ilpo regexp as a constant and then python should compile that
 Ilpo regexp to the byte-code file.

 Ilpo This is a problem when one has a large number of regexps. One
 Ilpo example is the xmlproc parser in PyXML,

 Read the source for sre.py, esp. _compile. The compiled regexps are
 cached, so when you invoke e.g. re.match(), it doesn't recompile the
 regexp.

If you had read what I said, you would have noticed this:

,----
| I would like to have a way to write a regexp as a constant and then
| python should compile that regexp to the byte-code file.
`----

and this:

,----
| This is not a problem in a program that keeps running for a long time,
| but I want short lived programs like command line apps.
`----

Of course it caches those when running. The point is that it needs to
recompile every time you restart the program. With short lived
command line programs this really can be a problem.

And yes, I have read the source of sre.py and I have made an ugly
module that digs out the compiled data and pickles it to a file, and then
at the next startup it reads that file and puts the stuff back into the
cache.
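
As a side note, plainly pickling a compiled pattern object does not
achieve this in current CPython: the pickle records the pattern text and
flags, and loading it compiles the pattern again. The sketch below only
demonstrates that round trip; it is not the module described above, which
worked on the cache internals.

```python
import pickle
import re

pattern = re.compile(r"(\w+)@(\w+)", re.IGNORECASE)
blob = pickle.dumps(pattern)

# In current CPython the pickle holds the pattern text and flags, so loading
# it goes through the normal compile path rather than restoring compiled code.
restored = pickle.loads(blob)
print(restored.pattern, restored.flags == pattern.flags)
print(rb"(\w+)@(\w+)" in blob)   # the pattern source is stored in the pickle
```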

-- 
Ilpo Nyyssönen # biny # /* :-) */


Re: Regular Expressions - Python vs Perl

2005-04-21 Thread Ilpo Nyyssönen
James Stroud [EMAIL PROTECTED] writes:

 Is it relevant that Python can produce compiled expressions? I don't think 
 that there is such a thing with Perl.

The problem in python here is that it needs to always recompile the
regexp. I would like to have a way to write a regexp as a constant and
then python should compile that regexp to the byte-code file.

This is a problem when one has a large number of regexps. One example is
the xmlproc parser in PyXML,

This is not a problem in a program that keeps running for a long time,
but I want short lived programs like command line apps.

Of course we do have ways to work around that limitation, but that is
just ugly.
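
One such workaround, sketched here as an assumption rather than anything
proposed in the thread, is to defer compilation until a pattern is first
used, so a short-lived program only pays for the regexps it actually
needs. The class name LazyPattern is hypothetical.

```python
import re

class LazyPattern:
    """Compile the regexp on first use instead of at import time."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None

    def _get(self):
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def __getattr__(self, name):
        # Called only for attributes not found normally, e.g. match or search.
        return getattr(self._get(), name)

DATE = LazyPattern(r"(\d{4})-(\d{2})-(\d{2})")   # no compile cost yet
print(DATE.match("2005-04-21").groups())          # compiled here, on first use
```

This does not make compilation any faster; it only avoids compiling
patterns that a given run never touches.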

-- 
Ilpo Nyyssönen # biny # /* :-) */