Re: UDD gatherer for DDTP translations (Was: Extended descriptions size)

2009-04-12 Thread Martijn van Oosterhout
On Tue, Apr 7, 2009 at 9:49 AM, Andreas Tille til...@rki.de wrote:
> Well, I did not say that it is actually hard, and in UDD you can get this
> easily by
>
>    SELECT md5(description || E'\n' || long_description || E'\n' ) AS md5
>           FROM packages WHERE ...

Ok, I see why you're having trouble now; you're splitting up the
description in your DB and thus need to stick it back together. That
does indeed make the process a bit less reliable. The DDTP/DDTSS
treats the description as a single string, the exact string in the
Packages file (the Description field is a single entry in the file) so
we had no issues. By doing extra processing like splitting/stripping
parts of the string it's quite possible you're doing a non-invertible
conversion, which would make later matching harder.
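
To illustrate the invertibility point, here is a minimal Python sketch. It
assumes the Description value consists of the short description, a newline,
the continuation lines, and one final newline, which is the layout implied by
the SQL quoted above. The helper names split_description and join_description
and the sample text are hypothetical; they are not part of UDD or the DDTP.

```python
import hashlib


def description_md5(text):
    """MD5 hex digest of a description string, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def split_description(full):
    """Split at the first newline and drop the single trailing newline.

    Nothing else is stripped, so the split can be undone exactly
    (provided the input ends with exactly one newline).
    """
    short, _, rest = full.partition("\n")
    long_part = rest[:-1] if rest.endswith("\n") else rest
    return short, long_part


def join_description(short, long_part):
    """Mirror of: description || E'\\n' || long_description || E'\\n'."""
    return short + "\n" + long_part + "\n"


# Hypothetical Description value as it might appear in a Packages file.
original = "example tool\n This is the long description.\n .\n Second paragraph.\n"

short, long_part = split_description(original)
lossless = join_description(short, long_part)

# A lossy variant: stripping the leading space of each continuation line.
stripped = "\n".join(line.lstrip() for line in long_part.split("\n"))
lossy = join_description(short, stripped)

print(description_md5(original) == description_md5(lossless))  # True
print(description_md5(original) == description_md5(lossy))     # False
```

The split itself is harmless as long as nothing is stripped; the hash only
stops matching once some whitespace is discarded along the way.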

It'd be nice if someone went over the version number stuff in
DDTP/DDTSS since by and large it was never used (user display only and
even then it wasn't accurate) and so probably there's plenty of work
there.

It might actually be easier to write a script which simply collected
Packages files from say snapshot.debian.org, calculated all the MD5
sums (you can extract the description field using a regex so it's easy
enough in Perl) and built a database of description MD5s and version
numbers. That would give a reliable mapping, far more reliable than
the DDTP/DDTSS is ever likely to be.
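
A rough sketch of such a script, written in Python rather than Perl. It
assumes a Packages file is a series of stanzas separated by blank lines, and
that the MD5 is taken over the Description field value plus a final newline;
both are assumptions made here, not something confirmed in this thread. The
function names and the sample stanzas are hypothetical, and fetching files
from snapshot.debian.org is left out.

```python
import hashlib
import re
from collections import defaultdict

# Matches "Description: <short>" plus any continuation lines (lines that
# start with a space or tab) as one field value.
DESCRIPTION_RE = re.compile(r"^Description: (.*(?:\n[ \t].*)*)", re.MULTILINE)


def field(stanza, name):
    """Return the first-line value of a simple field, or None if absent."""
    match = re.search(r"^{}: (.*)$".format(re.escape(name)), stanza, re.MULTILINE)
    return match.group(1) if match else None


def collect(packages_text, mapping=None):
    """Add (package, version) pairs to mapping, keyed by description MD5."""
    if mapping is None:
        mapping = defaultdict(set)
    for stanza in re.split(r"\n{2,}", packages_text.strip()):
        match = DESCRIPTION_RE.search(stanza)
        package, version = field(stanza, "Package"), field(stanza, "Version")
        if not (match and package and version):
            continue
        # Assumption: the hash covers the field value plus a final newline.
        text = match.group(1) + "\n"
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        mapping[digest].add((package, version))
    return mapping


# Hypothetical sample: two versions share a description, a third differs.
sample = (
    "Package: foo\nVersion: 1.0-1\nDescription: example tool\n"
    " Long text.\n\n"
    "Package: foo\nVersion: 1.0-2\nDescription: example tool\n"
    " Long text.\n\n"
    "Package: foo\nVersion: 2.0-1\nDescription: example tool\n"
    " Rewritten long text.\n"
)

for digest, versions in collect(sample).items():
    print(digest, sorted(versions))
```

Passing the same mapping to collect() for each Packages file would accumulate
the versions seen for every description MD5 across files.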

Keep in mind that all dpkg frontends that deal with descriptions work only
on the basis of the complete description string; I'm not sure anyone is
likely to switch to using versions.

Have a nice day,
-- 
Martijn van Oosterhout klep...@gmail.com http://svana.org/kleptog/


--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: UDD gatherer for DDTP translations (Was: Extended descriptions size)

2009-04-12 Thread Andreas Tille

On Sun, 12 Apr 2009, Martijn van Oosterhout wrote:


>>   SELECT md5(description || E'\n' || long_description || E'\n' ) AS md5
>>          FROM packages WHERE ...


> Ok, I see why you're having trouble now; you're splitting up the
> description in your DB and thus need to stick it back together.


That's the format other tables in UDD are using.  But that is not
really the worst part of the problem: as you can see, it can be
joined again perfectly.  It is just the md5 sum calculation that
slows things down, and the calculation of the version number is
not reliable in all cases, which I regard as a problem.


> That does indeed make the process a bit less reliable.


I don't think that it is the split which causes the problem.  I was
able to reproduce the correct description the way I described above.


> The DDTP/DDTSS
> treats the description as a single string, the exact string in the
> Packages file (the Description field is a single entry in the file) so
> we had no issues. By doing extra processing like splitting/stripping
> parts of the string it's quite possible you're doing a non-invertible
> conversion, which would make later matching harder.


In what way?  This is done in UDD with all descriptions and has never
caused a problem.


> It might actually be easier to write a script which simply collected
> Packages files from say snapshot.debian.org, calculated all the MD5
> sums (you can extract the description field using a regex so it's easy
> enough in Perl) and built a database of description MD5s and version
> numbers. That would give a reliable mapping, far more reliable than
> the DDTP/DDTSS is ever likely to be.


Can you elaborate a bit more on why you regard adding a version number
to DDTP Translation files as unreliable?

Kind regards

  Andreas.

--
http://fam-tille.de


Re: UDD gatherer for DDTP translations (Was: Extended descriptions size)

2009-04-07 Thread Andreas Tille

On Sun, 5 Apr 2009, Martijn van Oosterhout wrote:


> While I'm not against the idea of version numbers (though it would
> have to be a list since a single translation may apply to dozens of
> versions)


This might be discussed.


> it's not that hard to identify the description you want.
> What I often did was simply open up the description file to find the
> description I wanted to test, cut and paste it into another console
> running md5sum and that would be the md5 I needed to look for.


Well, I did not say that it is actually hard, and in UDD you can get this
easily by

   SELECT md5(description || E'\n' || long_description || E'\n' ) AS md5
          FROM packages WHERE ...

but the method you are proposing might not be very reliable because
whitespace matters (the exact newlines, etc.).  So comparing version
numbers is faster in any case and *easily* doable for humans.  Even if
you have the right md5 sum, as you mentioned above, comparing it is
still harder than comparing a short version string.  Human readability
is not my main concern, though; I care more about being able to compare
the Translations and Packages tables directly with the available
information rather than taking the detour via MD5 sums.
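
The sensitivity to exact whitespace can be shown with a short Python sketch.
The description text is hypothetical, and the variants are only examples of
what a cut-and-paste step might alter; which of them happen in practice
depends on the terminal and editor involved.

```python
import hashlib


def md5_hex(text):
    """MD5 hex digest of a string, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical description text; the reference form ends with one newline.
reference = "example tool\n Long text.\n"

variants = {
    "exact copy": "example tool\n Long text.\n",
    "missing final newline": "example tool\n Long text.",
    "extra trailing space": "example tool\n Long text. \n",
    "CRLF line endings": "example tool\r\n Long text.\r\n",
}

# Only a byte-for-byte identical string yields the same MD5.
matching = [
    label for label, text in variants.items()
    if md5_hex(text) == md5_hex(reference)
]
print(matching)  # ['exact copy']
```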

Kind regards

Andreas.

--
http://fam-tille.de





Re: UDD gatherer for DDTP translations (Was: Extended descriptions size)

2009-04-05 Thread Martijn van Oosterhout
On Wed, Apr 1, 2009 at 9:47 PM, Andreas Tille til...@rki.de wrote:
> On Wed, 1 Apr 2009, Goswin von Brederlow wrote:
>
>> Then the version number will not be needed when an arch lags
>> behind. The translation for the old md5sum can just be kept.
>
> Well, this thread was misused to discuss several issues. Would you mind
> reading my original posting on why version numbers in Translation files
> make sense, and would you please base your arguments on this posting.
> Perhaps I'm just wrong, but version
> numbers are really handy in this case and I see an extra benefit
> in making these files somehow human readable (in the sense that
> I doubt you are able to calculate md5sums manually to find out
> the matching description).

While I'm not against the idea of version numbers (though it would
have to be a list since a single translation may apply to dozens of
versions) it's not that hard to identify the description you want.
What I often did was simply open up the description file to find the
description I wanted to test, cut and paste it into another console
running md5sum and that would be the md5 I needed to look for.

Have a nice day,
-- 
Martijn van Oosterhout klep...@gmail.com http://svana.org/kleptog/





UDD gatherer for DDTP translations (Was: Extended descriptions size)

2009-04-01 Thread Andreas Tille

On Wed, 1 Apr 2009, Goswin von Brederlow wrote:


> Then the version number will not be needed when an arch lags
> behind. The translation for the old md5sum can just be kept.


Well, this thread was misused to discuss several issues.
Would you mind reading my original posting on why version numbers
in Translation files make sense, and would you please base your
arguments on this posting.  Perhaps I'm just wrong, but version
numbers are really handy in this case and I see an extra benefit
in making these files somehow human readable (in the sense that
I doubt you are able to calculate md5sums manually to find out
the matching description).

Kind regards

  Andreas.

--
http://fam-tille.de

