Re: [Okular-devel] md5 hash for annotation file name
On Thursday 18 September 2008, Albert Astals Cid wrote:
> > Reading the whole document in chunks or in a single shot is basically the same.
>
> Not really. Reading the document as a whole gives you a QByteArray of the full size of the file; if you read 1MB at a time, you have much lower peak memory usage, so you don't run out of physical memory and are potentially much faster.

Yes, I was referring more to the "read the whole document at once" part.

-- 
Pino Toscano

_______________________________________________
Okular-devel mailing list
Okular-devel@kde.org
https://mail.kde.org/mailman/listinfo/okular-devel
Re: [Okular-devel] md5 hash for annotation file name
On Thursday 11 September 2008, Markus Grabner wrote:
> Loading the file from a local hard disk takes considerably longer,

You are assuming that whatever loads a document will load it all at once, while there are file formats that can be read in chunks (e.g. TIFF, PostScript).

> so I'm not very much concerned about the hash computation time. However, the readAll() definitely has to be replaced by reading smaller chunks and processing them sequentially, that was just for the proof of concept.

Reading the whole document in chunks or in a single shot is basically the same.

> > so reading at most 1MB would be much better imho.
>
> If an annotation refers to a typo on the last page of a huge document, and this gets fixed, the same annotation would still be displayed for the corrected file if the correction appears after the portion of the file for which the hash value is computed (at least for uncompressed formats such as PostScript).

Or, more than that, two documents could be exactly the same up to some size.

-- 
Pino Toscano
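The objection to hashing only the first part of a file can be illustrated with a small sketch. It is not Okular code: prefixDigest is a hypothetical helper, and FNV-1a (a simple non-cryptographic hash) stands in for MD5 or SHA-1 only to keep the example free of dependencies. Two buffers that differ after the hashed prefix get the same digest, so they would share one annotation file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: digest of at most `limit` leading bytes of `data`.
// FNV-1a (64 bit) is used as a stand-in for a real digest such as MD5.
std::uint64_t prefixDigest(const std::string &data, std::size_t limit)
{
    std::uint64_t h = 0xcbf29ce484222325ULL; // FNV-1a offset basis
    const std::size_t n = std::min(limit, data.size());
    for (std::size_t i = 0; i < n; ++i) {
        h ^= static_cast<unsigned char>(data[i]);
        h *= 0x100000001b3ULL; // FNV-1a prime
    }
    return h;
}
```

With a 1024-byte limit, a change at byte 1500 goes unnoticed, whereas passing data.size() as the limit detects it.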
Re: [Okular-devel] md5 hash for annotation file name
On Thursday 11 September 2008 08:16:05 am Albert Astals Cid wrote:
> On Thursday 11 September 2008, Markus Grabner wrote:
> > I like Ivo's proposal to use QCryptographicHash, which supports MD4, MD5, and Sha1, so these are natural candidates.
>
> It's not an attacker, it's you having two files that collide, which gets you annotations from one applied to the other.

The probability of a collision (without actually trying to cause one) is 1 in 2^64 for MD4 or MD5, and 1 in 2^80 for SHA1. I don't think that is too much of a problem.

The more general issue is annotations for files that change. If you annotate your document, and then edit it, the annotations will be lost.

Brad
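For reference, the 2^64 and 2^80 figures match the birthday bound for 128-bit (MD4, MD5) and 160-bit (SHA1) digests, that is, roughly the number of distinct files at which an accidental collision becomes likely. The sketch below estimates the chance for a given number of files. It assumes the hash behaves like an ideal random function, and collisionProbability is a hypothetical helper name, not part of Qt or Okular.

```cpp
#include <cassert>
#include <cmath>

// Birthday approximation: probability that at least two of `files` distinct
// documents share a digest of `bits` bits, assuming an ideal hash:
//   p ~ 1 - exp(-n(n-1) / 2^(bits+1))
double collisionProbability(double files, int bits)
{
    assert(files >= 0 && bits > 0);
    const double pairs = files * (files - 1.0) / 2.0;
    // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x
    return -std::expm1(-pairs / std::ldexp(1.0, bits));
}
```

Under that assumption, a million annotated files hashed with a 128-bit digest give a collision chance on the order of 10^-27, and the chance only approaches 40% at about 2^64 files.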
Re: [Okular-devel] md5 hash for annotation file name
On Thursday, 11 September 2008, Brad Hards wrote:
> On Thursday 11 September 2008 08:16:05 am Albert Astals Cid wrote:
> > On Thursday 11 September 2008, Markus Grabner wrote:
> > > I like Ivo's proposal to use QCryptographicHash, which supports MD4, MD5, and Sha1, so these are natural candidates.
> >
> > It's not an attacker, it's you having two files that collide, which gets you annotations from one applied to the other.
>
> The probability of a collision (without actually trying to cause one) is 1 in 2^64 for MD4 or MD5, and 1 in 2^80 for SHA1. I don't think that is too much of a problem.
>
> The more general issue is annotations for files that change. If you annotate your document, and then edit it, the annotations will be lost.

Whether this is a problem depends on the application. If annotations are made to indicate proposed modifications to the author of a document, there is little reason to display the original annotations after the modifications have been implemented. And unless there are only minor changes, it is unlikely that the annotations will still be related to the updated content. If such a behaviour is desired, it is probably better to enter the annotations directly in the document processor (e.g., a LaTeX \marginpar{}).

The current implementation has the same problem (unless the size of the file is the same after a modification), but anyway I understood the annotation concept in okular to be designed for static files. Keeping track of modifications seems to be much harder, in particular if it should work transparently for different file formats. Are there any plans (or requests) to implement such a feature in okular?

Kind regards,
Markus

-- 
Markus Grabner - Computer Graphics and Vision
Graz University of Technology, Inffeldgasse 16a/II, 8010 Graz, Austria
Phone: +43/316/873-5041, Fax: +43/316/873-5050
WWW: http://www.icg.tugraz.at/Members/grabner
[Okular-devel] md5 hash for annotation file name
Hi!

It has been discussed (http://bugs.kde.org/show_bug.cgi?id=151614) to use a hash function to determine the name of the annotation file created by okular. The attached patch implements this behaviour (thanks Ivo for pointing me to QCryptographicHash - I looked for such a thing but somehow missed it). It works nicely in several ways:

*) Annotations stay associated with the file after renaming it.
*) It also works for non-local URLs (http://...) since we don't need to care about mapping the URL to some valid file name.
*) Annotations stay associated with the file after downloading it from the web and opening a local copy (possibly under a different name).

Kind regards,
Markus

Index: okular/core/document.cpp
===================================================================
--- okular/core/document.cpp	(Revision 859478)
+++ okular/core/document.cpp	(Arbeitskopie)
@@ -18,6 +18,7 @@
 
 // qt/kde/system includes
 #include <QtCore/QtAlgorithms>
+#include <QtCore/QCryptographicHash>
 #include <QtCore/QDir>
 #include <QtCore/QFile>
 #include <QtCore/QFileInfo>
@@ -1367,11 +1368,7 @@
         // determine the related xml document-info filename
         d->m_url = url;
         d->m_docFileName = docFile;
-        if ( url.isLocalFile() )
-        {
-        QString fn = url.fileName();
-        document_size = fileReadTest.size();
-        fn = QString::number( document_size ) + '.' + fn + ".xml";
+        QString fn = QString(QCryptographicHash::hash(fileReadTest.readAll(), QCryptographicHash::Md5).toHex().constData()) + ".xml";
         fileReadTest.close();
         QString newokular = "okular/docdata/" + fn;
         QString newokularfile = KStandardDirs::locateLocal( "data", newokular );
@@ -1387,7 +1384,6 @@
             }
         }
         d->m_xmlFileName = newokularfile;
-        }
     }
     else
     {
Re: [Okular-devel] md5 hash for annotation file name
On Wednesday, 10 September 2008, Albert Astals Cid wrote:
> On Wednesday 10 September 2008, Markus Grabner wrote:
> > Hi!
> >
> > It has been discussed (http://bugs.kde.org/show_bug.cgi?id=151614) to use a hash function to determine the name of the annotation file created by okular. The attached patch implements this behaviour (thanks Ivo for pointing me to QCryptographicHash - I looked for such a thing but somehow missed it). It works nicely in several ways:
> >
> > *) Annotations stay associated with the file after renaming it.
> > *) It also works for non-local URLs (http://...) since we don't need to care about mapping the URL to some valid file name.
> > *) Annotations stay associated with the file after downloading it from the web and opening a local copy (possibly under a different name).
>
> It works not nicely in several ways:
> *) Md5 sucks, use Sha1

I don't see any serious security threat in using a weak hash function at this point. All an attacker could do would be to create a modified file for which the same annotations would be displayed as for the file the annotations were initially created for. I like Ivo's proposal to use QCryptographicHash, which supports MD4, MD5, and Sha1, so these are natural candidates.

> *) Reading the whole file sucks, I don't want the 100MB of my pdf file to be piped through a hash, it'd probably take *some* time

Just tried it on my ancient AMD64 2GHz machine and found the following computing times for a 500MB file:

MD4:  1.3 seconds
MD5:  2 seconds
SHA1: 4 seconds

Loading the file from a local hard disk takes considerably longer, so I'm not very much concerned about the hash computation time. However, the readAll() definitely has to be replaced by reading smaller chunks and processing them sequentially, that was just for the proof of concept.

> so reading at most 1MB would be much better imho.

If an annotation refers to a typo on the last page of a huge document, and this gets fixed, the same annotation would still be displayed for the corrected file if the correction appears after the portion of the file for which the hash value is computed (at least for uncompressed formats such as PostScript). BTW, the current implementation in okular has the same problem since changing a single character in a PostScript file usually doesn't change its size.

Kind regards,
Markus
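The chunked reading described above (replace readAll() by reading smaller pieces and processing them sequentially) might look roughly like the sketch below. It is a dependency-free illustration rather than the actual patch: StreamHash and hashInChunks are hypothetical names, and FNV-1a stands in for MD5/SHA1. In Okular the same loop would presumably pass the result of QFile::read() to QCryptographicHash::addData() and finish with result().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Incremental FNV-1a (64 bit), standing in for a real digest such as MD5.
class StreamHash
{
public:
    void addData(const char *data, std::size_t length)
    {
        for (std::size_t i = 0; i < length; ++i) {
            m_state ^= static_cast<unsigned char>(data[i]);
            m_state *= 0x100000001b3ULL; // FNV-1a prime
        }
    }
    std::uint64_t result() const { return m_state; }

private:
    std::uint64_t m_state = 0xcbf29ce484222325ULL; // FNV-1a offset basis
};

// Hash a whole stream while holding at most chunkSize bytes in memory.
std::uint64_t hashInChunks(std::istream &in, std::size_t chunkSize = 1 << 20)
{
    assert(chunkSize > 0);
    std::vector<char> buffer(chunkSize);
    StreamHash hash;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        // gcount() is the number of bytes actually read, also on a short final read
        hash.addData(buffer.data(), static_cast<std::size_t>(in.gcount()));
    }
    return hash.result();
}
```

The digest covers the whole file and does not depend on the chunk size, so only peak memory changes: the buffer stays at chunkSize bytes (1MB by default here) instead of growing to the size of the document.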
Re: [Okular-devel] md5 hash for annotation file name
On Thursday 11 September 2008, Markus Grabner wrote:
> On Wednesday, 10 September 2008, Albert Astals Cid wrote:
> > On Wednesday 10 September 2008, Markus Grabner wrote:
> > > Hi!
> > >
> > > It has been discussed (http://bugs.kde.org/show_bug.cgi?id=151614) to use a hash function to determine the name of the annotation file created by okular. The attached patch implements this behaviour (thanks Ivo for pointing me to QCryptographicHash - I looked for such a thing but somehow missed it). It works nicely in several ways:
> > >
> > > *) Annotations stay associated with the file after renaming it.
> > > *) It also works for non-local URLs (http://...) since we don't need to care about mapping the URL to some valid file name.
> > > *) Annotations stay associated with the file after downloading it from the web and opening a local copy (possibly under a different name).
> >
> > It works not nicely in several ways:
> > *) Md5 sucks, use Sha1
>
> I don't see any serious security threat in using a weak hash function at this point. All an attacker could do would be to create a modified file for which the same annotations would be displayed as for the file the annotations were initially created for. I like Ivo's proposal to use QCryptographicHash, which supports MD4, MD5, and Sha1, so these are natural candidates.

It's not an attacker, it's you having two files that collide, which gets you annotations from one applied to the other.

> > *) Reading the whole file sucks, I don't want the 100MB of my pdf file to be piped through a hash, it'd probably take *some* time
>
> Just tried it on my ancient AMD64 2GHz machine and found the following computing times for a 500MB file:

Calling an AMD64 2GHz ancient makes me wonder what an EeePC is, prehistoric?

> MD4:  1.3 seconds
> MD5:  2 seconds
> SHA1: 4 seconds
>
> Loading the file from a local hard disk takes considerably longer

How much is that?

> , so I'm not very much concerned about the hash computation time. However, the readAll() definitely has to be replaced by reading smaller chunks and processing them sequentially, that was just for the proof of concept.

So can you see if splitting the read gives us an improvement? 4 seconds on an AMD64 2GHz seems a lot to me.

> > so reading at most 1MB would be much better imho.
>
> If an annotation refers to a typo on the last page of a huge document, and this gets fixed, the same annotation would still be displayed for the corrected file if the correction appears after the portion of the file for which the hash value is computed (at least for uncompressed formats such as PostScript). BTW, the current implementation in okular has the same problem since changing a single character in a PostScript file usually doesn't change its size.

You have a point here.

Albert

> Kind regards,
> Markus
Re: [Okular-devel] md5 hash for annotation file name
On Thursday, 11 September 2008, Albert Astals Cid wrote:
> On Thursday 11 September 2008, Markus Grabner wrote:
> > I don't see any serious security threat in using a weak hash function at this point. All an attacker could do would be to create a modified file for which the same annotations would be displayed as for the file the annotations were initially created for. I like Ivo's proposal to use QCryptographicHash, which supports MD4, MD5, and Sha1, so these are natural candidates.
>
> It's not an attacker, it's you having two files that collide, which gets you annotations from one applied to the other.

Ok, it's a tradeoff between collision probability and speed, and I don't see a clear winner now. Have MD5 collisions been observed under normal conditions (i.e., without injecting some binary code into one of the files)?

> > > *) Reading the whole file sucks, I don't want the 100MB of my pdf file to be piped through a hash, it'd probably take *some* time
> >
> > Just tried it on my ancient AMD64 2GHz machine and found the following computing times for a 500MB file:
>
> Calling an AMD64 2GHz ancient makes me wonder what an EeePC is, prehistoric?

Do you really want to work with a 500MB file on an EeePC :-?

> > MD4:  1.3 seconds
> > MD5:  2 seconds
> > SHA1: 4 seconds
> >
> > Loading the file from a local hard disk takes considerably longer
>
> How much is that?

24 seconds the first time, then 11 seconds when the file is cached. It's a Seagate ST3300622A SATA drive. So the hash computation overhead is moderate on this system.

Kind regards,
Markus