Re: comparing two lists, ndiff performance

2008-01-30 Thread Zbigniew Braniecki
[EMAIL PROTECTED] wrote:
> Zbigniew Braniecki:
>> Is there a way to speed it up? Any easier way? Faster method?
> 
> This problem is a bit messy. Maybe it's better to sidestep the
> problem, and not use a list, and create an object that wraps the list,
> so it always keeps an updated record of what changes are done... but
> you have to notify it if you change the objects it contains.
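
A minimal sketch of the wrapper idea quoted above. The class name TrackedList and its methods are hypothetical, not taken from the poster's code; it only records structural changes made through its own methods, so in-place edits to contained objects would still have to be reported separately, as the quote warns.

```python
class TrackedList(object):
    """Hypothetical wrapper: a list that logs every structural change."""

    def __init__(self, items=None):
        self._items = list(items or [])
        self.changes = []  # (operation, index, payload) tuples, in order

    def append(self, item):
        self._items.append(item)
        self.changes.append(("added", len(self._items) - 1, item))

    def remove(self, item):
        index = self._items.index(item)
        del self._items[index]
        self.changes.append(("removed", index, item))

    def replace(self, index, item):
        old = self._items[index]
        self._items[index] = item
        self.changes.append(("changed", index, (old, item)))

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


tracked = TrackedList(["string"])
tracked.append("another string")
tracked.replace(0, "new string")
tracked.remove("another string")
for change in tracked.changes:
    print(change)
```

With such a record there is nothing to diff afterwards, but it only helps when both versions come from the same in-memory object rather than from two files.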

That would be sweet... but I'll rarely have that option.

In most cases I will load the two l10nObjects from the files and then 
I'll have to compare them in the way described above.

So it's something like a compare-locales or compare-l10n-directories 
script in its simplest form.

In the worst case it will be launched on around 40 locales, each of 
them made up of ~800 l10nObjects.

I'll probably keep two methods: a faster one for automated scripts which 
just have to catch changes and report that the file needs an update, and 
a detailed one for presenting the changes to the user.
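
A rough sketch of what the fast method could look like, assuming the elements compare by value. The function name needs_update and the tuple stand-ins for Comment/Entity objects are hypothetical; real classes would need to define __eq__ for this to work.

```python
def needs_update(old_items, new_items):
    """Fast path: say only whether the two lists differ at all."""
    # In CPython, list comparison checks the lengths first and then walks
    # the elements in order, stopping at the first pair that differs.
    return old_items != new_items


# Hypothetical stand-ins for Comment/Entity objects: plain tuples, which
# already compare by value.
old = [("Comment", "foo"), "string", ("Entity", "name", "value")]
new = [("Comment", "foo"), "string", ("Entity", "name", "new value")]

print(needs_update(old, old))  # False
print(needs_update(old, new))  # True
```

The detailed diff would then only need to run for the files where this check returns True.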

I was just thinking that there may be a smart, fast, powerful 
method that would eliminate the use of ndiff to compare two lists.

Thanks for the help!

Greetings
Zbigniew Braniecki
-- 
http://mail.python.org/mailman/listinfo/python-list


comparing two lists, ndiff performance

2008-01-29 Thread Zbigniew Braniecki
Hi all.
I'm working on a tool for localizers.

I have two lists of Entities/Strings/Comments (each L10n file is built 
of those three elements).

So I have something like:

l10nObject = []
l10nObject.append(Comment('foo'))
l10nObject.append("string")
l10nObject.append(Entity('name', 'value'))

etc. I can have many strings, many entities and many comments inside.

At some point I want to compare two l10nObjects and see which entities 
were added, which strings were changed, and which comments were removed.

So, based on the structure of each l10nObject, I created a list of the 
type names of its elements, like:

structure1 = ['Comment', 'Entity', 'Entity', 'str', 'str', 'Entity', 'Entity',
              'Comment']

and same for structure2.

As a result I want to have info about added/removed elements, and then I 
can try to work out which ones were actually changed (by comparing the 
values of two entities or two comments or two strings) etc.

Long story short: I'm comparing two "structures" using ndiff.
It takes a LOONG time.

In my script ndiff takes 80% of the total run time, and as a result the 
new method with ndiff takes over 10 sec for 1000 iterations, while the 
old one takes around 4 sec for 1000 iterations.

You can take a look at the code at: 
http://svn.braniecki.net/wsvn/Mozpyl10n/lib/mozilla/l10n/diff.py

The def diffToObject at the end is the new method, while def 
compareL10nObjects is the old one.

The new one is of course much better and cleaner (the old one is 
bloated), but I'm wondering if there is a faster way to compare two 
lists and find out what was added, what was removed, what was changed.
I can't simply iterate through the two lists because I need to keep the 
order (so it matters that the removed line comes after the 3rd line, 
which was not changed, etc.)

ndiff works well here, but it seems to be extremely slow (1000 
iterations of diffToObject take 10 sec, 7 sec of which is spent in ndiff).

Do you have any idea how to compare those lists? I have a similar 
problem with comparing two directory lists where I also need to keep the 
order, and I'm using the same ndiff method now.

Is there a way to speed it up? Any easier way? Faster method?
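
One possible direction, sketched under assumptions rather than measured against the script above: ndiff is built on difflib.Differ, which on top of the sequence matching also compares near-matching lines character by character, and that extra work is not needed for lists of type names. difflib.SequenceMatcher can be used directly and keeps the order of the changes. The function name diff_structures and the contents of structure2 are hypothetical; the autojunk argument only exists in Python versions newer than the 2.5 mentioned in this thread.

```python
from difflib import SequenceMatcher

structure1 = ['Comment', 'Entity', 'Entity', 'str', 'str', 'Entity', 'Entity',
              'Comment']
# Hypothetical second structure, made up for this example.
structure2 = ['Comment', 'Entity', 'str', 'str', 'Entity', 'Entity', 'Entity',
              'Comment']


def diff_structures(old, new):
    """Return (tag, old_items, new_items) for every non-equal region, in order."""
    # autojunk=False switches off the "popular element" heuristic, which
    # would otherwise kick in on long lists with only a few distinct values.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != 'equal']


for tag, removed, added in diff_structures(structure1, structure2):
    # tag is one of 'replace', 'delete' or 'insert'
    print("%s: %r -> %r" % (tag, removed, added))
```

The 'replace' regions are the natural candidates for the second step described above, where the values of two entities, comments or strings are compared to decide what really changed.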

Greetings
Zbigniew Braniecki


Re: Bug in __init__?

2008-01-18 Thread Zbigniew Braniecki
Christian Heimes wrote:
> Zbigniew Braniecki wrote:
>> Any clue on what's going on here, and/if where I should report it?
> 
> Congratulations! You've stumbled over a well known gotcha. Most newbies
> fall for the trap.
> 
> class A:
>     def __init__ (self, val=[]):
>         print val
>         self.lst = val
> 
> val is created only *once* and shared across all instances of A.
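
The usual way around this gotcha is a None default that is replaced inside the method. This is a sketch of that common idiom applied to the class from the thread, not something posted in the original messages:

```python
class A(object):
    def __init__(self, val=None):
        # Build a fresh list per instance instead of sharing one default object.
        if val is None:
            val = []
        self.lst = val

    def add(self, el):
        self.lst.extend(el)


x = A()
x.add(["foo1", "foo2"])
b = A()
print(b.lst)  # [] : the second instance starts empty
```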

Thanks for the help, guys!

It's really a nice pitfall. I can hardly imagine anyone expecting this, 
and I don't see how I could easily have found this info (e.g. what query 
should I give to Google to get it without bothering people on this group?)

Anyway, thanks :)

Greetings
Zbigniew Braniecki


Bug in __init__?

2008-01-18 Thread Zbigniew Braniecki
I found a bug in my code today, and spent an hour trying to locate it 
and then minimize the testcase.

Even after doing that, I'm still confused about the behavior, and I 
could not find any reference to it in the docs.

testcase:

class A():

    def add (self, el):
        self.lst.extend(el)

    def __init__ (self, val=[]):
        print val
        self.lst = val


def test ():
    x = A()
    x.add(["foo1","foo2"])
    b = A()


test()


So, what I would expect here is that I create two instances of 
class A, each with an empty self.lst property. Right?

In fact (at least with my Python 2.5)

[EMAIL PROTECTED]:~/projects/pyl10n$ ./scripts/test.py
[]
['foo1', 'foo2']

This bug does not happen when I switch to __init__ (self, *args) and 
assign self.lst = args[0].
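
A short sketch of what is going on in the testcase, written for Python 3 (where the stored defaults are exposed as __defaults__; this attribute name is not available in the Python 2.5 used above). The default list is created once, when the def statement runs, and every call that omits val receives that same object. With *args there is no default object to share, which would explain why that variant behaves differently.

```python
class A(object):
    def __init__(self, val=[]):  # deliberately reproduces the pitfall
        self.lst = val

    def add(self, el):
        self.lst.extend(el)


x = A()
x.add(["foo1", "foo2"])
b = A()

# The one default list lives on the function object itself.
shared = A.__init__.__defaults__[0]
print(x.lst is b.lst)   # True: both instances hold the same list
print(shared is x.lst)  # True: and it is the stored default
```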

Any clue on what's going on here, and if/where I should report it?

Greetings
Zbigniew Braniecki