Re: UDD gatherer for DDTP translations (Was: Extended descriptions size)
On Tue, Apr 7, 2009 at 9:49 AM, Andreas Tille til...@rki.de wrote:
> Well, I did not say that it is actually hard, and in UDD you can get this easily by
>   SELECT md5(description || E'\n' || long_description || E'\n' ) AS md5 FROM packages WHERE ...

Ok, I see why you're having trouble now: you're splitting up the description in your DB and thus need to stick it back together. That does indeed make the process a bit less reliable.

The DDTP/DDTSS treats the description as a single string, the exact string in the Packages file (the Description field is a single entry in the file), so we had no issues. By doing extra processing like splitting or stripping parts of the string, it's quite possible you're doing a non-invertible conversion, which would make matching harder later.

It'd be nice if someone went over the version number stuff in DDTP/DDTSS, since by and large it was never used (user display only, and even then it wasn't accurate), so there's probably plenty of work there.

It might actually be easier to write a script which simply collected Packages files from, say, snapshot.debian.org, calculated all the MD5 sums (you can extract the description field using a regex, so it's easy enough in Perl) and built a database of description MD5s and version numbers. That would give a reliable mapping, far more reliable than the DDTP/DDTSS is ever likely to do.

Keep in mind that all dpkg frontends that handle descriptions work only on the basis of the complete description string; I'm not sure anyone is likely to switch to using versions.

Have a nice day,
--
Martijn van Oosterhout klep...@gmail.com http://svana.org/kleptog/
--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Re: UDD gatherer for DDTP translations (Was: Extended descriptions size)
On Sun, 12 Apr 2009, Martijn van Oosterhout wrote:
>> SELECT md5(description || E'\n' || long_description || E'\n' ) AS md5 FROM packages WHERE ...
> Ok, I see why you're having trouble now: you're splitting up the description in your DB and thus need to stick it back together.

That's the format other tables in UDD are using. But that is not really the worst part of the problem: as you see, it can be joined back together perfectly well. It is just the md5 sum calculation which slows things down, and the calculation of the version number is not reliable in all cases, which I regard as a problem.

> That does indeed make the process a bit less reliable.

I don't think that it is the split which causes the problem. I was able to reproduce the correct description the way I described above.

> The DDTP/DDTSS treats the description as a single string, the exact string in the Packages file (the Description field is a single entry in the file), so we had no issues. By doing extra processing like splitting or stripping parts of the string, it's quite possible you're doing a non-invertible conversion, which would make matching harder later.

In what way? This is done in UDD with all descriptions and has never caused a problem.

> It might actually be easier to write a script which simply collected Packages files from, say, snapshot.debian.org, calculated all the MD5 sums (you can extract the description field using a regex, so it's easy enough in Perl) and built a database of description MD5s and version numbers. That would give a reliable mapping, far more reliable than the DDTP/DDTSS is ever likely to do.

Can you elaborate a bit more on why you regard adding a version number to DDTP Translation files as not reliable?

Kind regards

Andreas.

--
http://fam-tille.de
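Both positions in this exchange can be checked mechanically: split a description into short and long parts, rejoin them the way the quoted SQL does, and compare MD5s. The sketch below assumes that the split keeps each long line's leading space and drops only the final newline; how UDD actually stores `long_description` is not stated in the thread, and `split_description`, `rejoin` and `strip_lines` are illustrative names only.

```python
from __future__ import annotations

import hashlib


def split_description(value: str) -> tuple[str, str]:
    """Split a Description value into (short, long), as a split schema might.

    Assumption: the long part keeps each line's leading space and only
    the final newline is dropped.
    """
    short, _, long_part = value.rstrip("\n").partition("\n")
    return short, long_part


def rejoin(short: str, long_part: str) -> str:
    r"""Python analogue of description || E'\n' || long_description || E'\n'.

    An empty long part is special-cased so a one-line description
    comes back as "short\n" rather than "short\n\n".
    """
    if not long_part:
        return short + "\n"
    return short + "\n" + long_part + "\n"


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def strip_lines(long_part: str) -> str:
    """A lossy normalisation: strip surrounding whitespace from each line."""
    return "\n".join(line.strip() for line in long_part.split("\n"))
```

Rejoining the untouched parts reproduces the original hash, supporting Andreas's point that the split itself is harmless; rejoining after `strip_lines` does not, even though the text looks the same to a reader, which is the kind of non-invertible step Martijn warns about.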
Re: UDD gatherer for DDTP translations (Was: Extended descriptions size)
On Sun, 5 Apr 2009, Martijn van Oosterhout wrote:
> While I'm not against the idea of version numbers (though it would have to be a list, since a single translation may apply to dozens of versions)

This might be discussed.

> it's not that hard to identify the description you want. What I often did was simply open up the description file to find the description I wanted to test, cut and paste it into another console running md5sum, and that would be the md5 I needed to look for.

Well, I did not say that it is actually hard, and in UDD you can get this easily by

  SELECT md5(description || E'\n' || long_description || E'\n' ) AS md5 FROM packages WHERE ...

but the method you are proposing might not be very reliable, because spacing matters (the exact newlines etc.). So comparing version numbers is faster in any case and *easily* doable for humans: even if you have the right md5 sum, as you mentioned above, comparing it is harder than comparing a short version string. Human readability is not my main concern, though; I care more about being able to compare the Translations and Packages tables directly with the available information, rather than taking the detour over MD5 sums.

Kind regards

Andreas.

--
http://fam-tille.de
Re: UDD gatherer for DDTP translations (Was: Extended descriptions size)
On Wed, Apr 1, 2009 at 9:47 PM, Andreas Tille til...@rki.de wrote:
> On Wed, 1 Apr 2009, Goswin von Brederlow wrote:
>> Then the version number will not be needed when an arch lags behind. The translation for the old md5sum can just be kept.
>
> Well, this thread was misused to discuss several issues. Would you mind reading my original posting on why version numbers in Translation files make sense, and would you please base your arguments on that posting? Perhaps I'm just wrong, but version numbers are really handy in this case, and I see an extra benefit in making these files somewhat human readable (in the sense that I doubt you are able to calculate md5sums manually to find the matching description).

While I'm not against the idea of version numbers (though it would have to be a list, since a single translation may apply to dozens of versions), it's not that hard to identify the description you want. What I often did was simply open up the description file to find the description I wanted to test, cut and paste it into another console running md5sum, and that would be the md5 I needed to look for.

Have a nice day,
--
Martijn van Oosterhout klep...@gmail.com http://svana.org/kleptog/
UDD gatherer for DDTP translations (Was: Extended descriptions size)
On Wed, 1 Apr 2009, Goswin von Brederlow wrote:
> Then the version number will not be needed when an arch lags behind. The translation for the old md5sum can just be kept.

Well, this thread was misused to discuss several issues. Would you mind reading my original posting on why version numbers in Translation files make sense, and would you please base your arguments on that posting? Perhaps I'm just wrong, but version numbers are really handy in this case, and I see an extra benefit in making these files somewhat human readable (in the sense that I doubt you are able to calculate md5sums manually to find the matching description).

Kind regards

Andreas.

--
http://fam-tille.de