Re: [Okular-devel] md5 hash for annotation file name

2008-09-19 Thread Pino Toscano
On Thursday 18 September 2008, Albert Astals Cid wrote:
> > Reading the whole document in chunks or all at once in a single shot is
> > basically the same.
>
> Not really, reading the document as a whole gives you a QByteArray of the
> full size of the file; if you read 1MB at a time, you have much lower peak
> memory usage, so you don't run out of physical memory and are potentially
> much faster.

Yes, I was referring more to the read all the document at once thing.
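
To make the memory argument concrete, here is a minimal sketch of hashing a stream in fixed-size chunks. It is not the patch's code: the IncrementalHash class and both function names are made up for illustration, and FNV-1a stands in for MD5 so that the example needs only the standard library. With Qt, the same loop would presumably call QCryptographicHash::addData() per chunk and result() at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for an incremental hash object.
// It implements 64-bit FNV-1a, NOT MD5; only the usage pattern matters here.
class IncrementalHash {
public:
    void addData(const char* data, std::size_t length) {
        for (std::size_t i = 0; i < length; ++i) {
            state_ ^= static_cast<unsigned char>(data[i]);
            state_ *= 1099511628211ULL;  // FNV-1a 64-bit prime
        }
    }
    std::uint64_t result() const { return state_; }

private:
    std::uint64_t state_ = 14695981039346656037ULL;  // FNV-1a offset basis
};

// Hash a stream while holding at most chunkSize bytes in memory.
std::uint64_t hashInChunks(std::istream& in, std::size_t chunkSize) {
    assert(chunkSize > 0);
    std::string buffer(chunkSize, '\0');
    IncrementalHash hash;
    while (in) {
        in.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();  // may be short on the last chunk
        if (got <= 0) break;
        hash.addData(buffer.data(), static_cast<std::size_t>(got));
    }
    return hash.result();
}

// Hash a buffer that already holds the whole document (the readAll() approach).
std::uint64_t hashAtOnce(const std::string& whole) {
    IncrementalHash hash;
    hash.addData(whole.data(), whole.size());
    return hash.result();
}
```

Peak memory is bounded by chunkSize rather than by the file size, and the digest is the same either way, because the hash sees the same byte sequence in both cases.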

-- 
Pino Toscano


signature.asc
Description: This is a digitally signed message part.
___
Okular-devel mailing list
Okular-devel@kde.org
https://mail.kde.org/mailman/listinfo/okular-devel


Re: [Okular-devel] md5 hash for annotation file name

2008-09-18 Thread Pino Toscano
On Thursday 11 September 2008, Markus Grabner wrote:
> Loading the file from a local hard disk takes considerably longer,

You are assuming that whatever loads a document will load it all at once, while
there are file formats that can be read in chunks (e.g. TIFF, PostScript).

> so I'm
> not very much concerned about the hash computation time. However, the
> readAll() definitely has to be replaced by reading smaller chunks and
> processing them sequentially, that was just for the proof of concept.

Reading the whole document in chunks or all at once in a single shot is basically
the same.

> > so reading up to 1MB at most would be much better imho.
>
> If an annotation refers to a typo on the last page of a huge document, and
> this gets fixed, the same annotation would still be displayed for the
> corrected file if the correction appears after the portion of the file for
> which the hash value is computed (at least for uncompressed formats such as
> PostScript).

Or, more than that, two documents could be identical up to some size.

-- 
Pino Toscano




Re: [Okular-devel] md5 hash for annotation file name

2008-09-11 Thread Brad Hards
On Thursday 11 September 2008 08:16:05 am Albert Astals Cid wrote:
> On Thursday 11 September 2008, Markus Grabner wrote:
> > I like Ivo's proposal to use QCryptographicHash, which supports MD4, MD5,
> > and Sha1, so these are natural candidates.
>
> It's not an attacker, it's you having two files that collide, which gets you
> annotations from one on the other.

The probability of a collision (without actually trying to cause one) is 1 in 
2^64 for MD4 or MD5, and 1 in 2^80 for SHA1. I don't think that is too much 
of a problem.
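
As a rough check on these figures, the sketch below uses the standard birthday approximation p ~= 1 - exp(-n(n-1) / 2^(b+1)) for n files and a b-bit hash, assuming the hash behaves like a random function on non-adversarial input. Read this way, 2^64 (MD5, 128 bits) and 2^80 (SHA1, 160 bits) are the numbers of files at which an accidental collision becomes likely, not the odds for a single pair of files, which are 2^-128 and 2^-160. The function name is an assumption for illustration.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability that at least two out of `files` random inputs
// share the same `hashBits`-bit digest (birthday approximation).
double collisionProbability(double files, int hashBits) {
    assert(files >= 0.0 && hashBits > 0);
    const double pairs = files * (files - 1.0) / 2.0;  // number of file pairs
    const double perPair = std::ldexp(1.0, -hashBits); // 2^-hashBits
    // 1 - exp(-x), computed via expm1 so tiny probabilities are not rounded to 0
    return -std::expm1(-pairs * perPair);
}
```

For example, collisionProbability(1e6, 128) is on the order of 1e-27 for a million documents, while collisionProbability(std::ldexp(1.0, 64), 128) is about 0.39.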

The more general issue is annotations for files that change. If you annotate 
your document, and then edit it, the annotations will be lost.

Brad



Re: [Okular-devel] md5 hash for annotation file name

2008-09-11 Thread Markus Grabner
On Thursday, 11 September 2008, Brad Hards wrote:
> On Thursday 11 September 2008 08:16:05 am Albert Astals Cid wrote:
> > On Thursday 11 September 2008, Markus Grabner wrote:
> > > I like Ivo's proposal to use QCryptographicHash, which supports MD4,
> > > MD5, and Sha1, so these are natural candidates.
> >
> > It's not an attacker, it's you having two files that collide, which gets you
> > annotations from one on the other.
>
> The probability of a collision (without actually trying to cause one) is 1
> in 2^64 for MD4 or MD5, and 1 in 2^80 for SHA1. I don't think that is too
> much of a problem.
>
> The more general issue is annotations for files that change. If you
> annotate your document, and then edit it, the annotations will be lost.

It depends on the application if this is a problem or not. If annotations are 
made to indicate proposed modifications to the author of a document, there is 
little reason to display the original annotations after the modifications 
have been implemented. And unless there are only minor changes, it is 
unlikely that the annotations will still be related to the updated content. 
If such a behaviour is desired, it is probably better to enter the 
annotations directly in the document processor (e.g., a LaTeX \marginpar{}).

The current implementation has the same problem (unless the size of the file 
is the same after a modification), but anyway I understood the annotation 
concept in okular to be designed for static files. Keeping track of 
modifications seems to be much harder, in particular if it should work 
transparently for different file formats. Are there any plans (or requests) 
to implement such a feature in okular?

Kind regards,
Markus


-- 
Markus Grabner - Computer Graphics and Vision
Graz University of Technology, Inffeldgasse 16a/II, 8010 Graz, Austria
Phone: +43/316/873-5041, Fax: +43/316/873-5050
WWW: http://www.icg.tugraz.at/Members/grabner


[Okular-devel] md5 hash for annotation file name

2008-09-10 Thread Markus Grabner

Hi!

It has been discussed (http://bugs.kde.org/show_bug.cgi?id=151614) to use 
a hash function to determine the name of the annotation file created by 
okular. The attached patch implements this behaviour (thanks Ivo for pointing 
me to QCryptographicHash - I looked for such a thing but somehow missed it).

It works nicely in several ways:
*) Annotations stay associated with the file after renaming it.
*) It also works for non-local URLs (http://...) since we don't need to care
about mapping the URL to some valid file name.
*) Annotations stay associated with the file after downloading it from the web
and opening a local copy (possibly under a different name).
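
The difference between the two naming schemes can be sketched as follows. The first function mirrors the expression the patch removes (size, dot, file name, .xml); the second mirrors the replacement (hex digest plus .xml). Both function names are hypothetical, and the digest is passed in as a ready-made string so the sketch stays in standard C++.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Old scheme: "<size>.<file name>.xml". The name changes when the file is renamed.
std::string sizeBasedName(std::uint64_t documentSize, const std::string& fileName) {
    return std::to_string(documentSize) + '.' + fileName + ".xml";
}

// Proposed scheme: "<hex digest of the content>.xml". The name depends only on
// the bytes of the document, not on its name or location.
std::string hashBasedName(const std::string& hexDigest) {
    assert(!hexDigest.empty());
    return hexDigest + ".xml";
}
```

Renaming thesis.pdf to final.pdf changes the size-based name from 1234.thesis.pdf.xml to 1234.final.pdf.xml, whereas the hash-based name stays the same as long as the content does.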

Kind regards,
Markus


-- 
Markus Grabner - Computer Graphics and Vision
Graz University of Technology, Inffeldgasse 16a/II, 8010 Graz, Austria
Phone: +43/316/873-5041, Fax: +43/316/873-5050
WWW: http://www.icg.tugraz.at/Members/grabner
Index: okular/core/document.cpp
===================================================================
--- okular/core/document.cpp	(revision 859478)
+++ okular/core/document.cpp	(working copy)
@@ -18,6 +18,7 @@
 
 // qt/kde/system includes
 #include <QtCore/QtAlgorithms>
+#include <QtCore/QCryptographicHash>
 #include <QtCore/QDir>
 #include <QtCore/QFile>
 #include <QtCore/QFileInfo>
@@ -1367,11 +1368,7 @@
 // determine the related xml document-info filename
 d->m_url = url;
 d->m_docFileName = docFile;
-if ( url.isLocalFile() )
-{
-QString fn = url.fileName();
-document_size = fileReadTest.size();
-fn = QString::number( document_size ) + '.' + fn + ".xml";
+	QString fn = QString(QCryptographicHash::hash(fileReadTest.readAll(), QCryptographicHash::Md5).toHex().constData()) + ".xml";
 fileReadTest.close();
 QString newokular = "okular/docdata/" + fn;
 QString newokularfile = KStandardDirs::locateLocal( "data", newokular );
@@ -1387,7 +1384,6 @@
 }
 }
 d->m_xmlFileName = newokularfile;
-}
 }
 else
 {


Re: [Okular-devel] md5 hash for annotation file name

2008-09-10 Thread Markus Grabner
On Wednesday, 10 September 2008, Albert Astals Cid wrote:
> On Wednesday 10 September 2008, Markus Grabner wrote:
> > Hi!
> >
> > It has been discussed (http://bugs.kde.org/show_bug.cgi?id=151614) to
> > use a hash function to determine the name of the annotation file created
> > by okular. The attached patch implements this behaviour (thanks Ivo for
> > pointing me to QCryptographicHash - I looked for such a thing but somehow
> > missed it).
> >
> > It works nicely in several ways:
> > *) Annotations stay associated with the file after renaming it.
> > *) It also works for non-local URLs (http://...) since we don't need to
> > care about mapping the URL to some valid file name.
> > *) Annotations stay associated with the file after downloading it from
> > the web and opening a local copy (possibly under a different name).
>
> It works not nicely in several ways:
> *) Md5 sucks, use Sha1
I don't see any serious security threat by using a weak hash function at this 
point. All an attacker could do would be to create a modified file for which 
the same annotations would be displayed as for the file the annotations were 
initially created for.
I like Ivo's proposal to use QCryptographicHash, which supports MD4, MD5, and 
Sha1, so these are natural candidates.


> *) Reading the whole file sucks, i don't want the 100MB of my pdf file to
> be piped through a hash, it'd probably take *some* time
Just tried it on my ancient AMD64 2GHz machine and found the following 
computing times for a 500MB file:
MD4: 1.3 seconds
MD5: 2 seconds
SHA1: 4 seconds
Loading the file from a local hard disk takes considerably longer, so I'm not 
very much concerned about the hash computation time. However, the readAll() 
definitely has to be replaced by reading smaller chunks and processing them 
sequentially, that was just for the proof of concept.

> so reading up to 1MB at most would be much better imho.
If an annotation refers to a typo on the last page of a huge document, and 
this gets fixed, the same annotation would still be displayed for the 
corrected file if the correction appears after the portion of the file for 
which the hash value is computed (at least for uncompressed formats such as 
PostScript). BTW, the current implementation in okular has the same problem 
since changing a single character in a PostScript file usually doesn't change 
its size.
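
A small sketch of the problem described here: two files of equal size that differ only near the end. Hashing a fixed-size prefix cannot tell them apart, and neither can the file size; hashing the whole content can. FNV-1a stands in for MD5 (an assumption, chosen to keep the example in standard C++), and hashPrefix is a hypothetical helper, not part of okular.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 64-bit FNV-1a over at most the first `limit` bytes of `content`.
// Stand-in for "MD5 of the first N bytes"; it is NOT MD5.
std::uint64_t hashPrefix(const std::string& content, std::size_t limit) {
    assert(limit > 0);
    const std::size_t n = std::min(limit, content.size());
    std::uint64_t state = 14695981039346656037ULL;  // FNV-1a offset basis
    for (std::size_t i = 0; i < n; ++i) {
        state ^= static_cast<unsigned char>(content[i]);
        state *= 1099511628211ULL;                  // FNV-1a 64-bit prime
    }
    return state;
}
```

With an original of 2000 identical bytes followed by "teh end" and a corrected copy ending in "the end", hashPrefix(..., 1024) gives the same value for both, so the old annotations would still be attached to the corrected file; passing the full size as the limit yields different values.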

Kind regards,
Markus


-- 
Markus Grabner - Computer Graphics and Vision
Graz University of Technology, Inffeldgasse 16a/II, 8010 Graz, Austria
Phone: +43/316/873-5041, Fax: +43/316/873-5050
WWW: http://www.icg.tugraz.at/Members/grabner


Re: [Okular-devel] md5 hash for annotation file name

2008-09-10 Thread Albert Astals Cid
On Thursday 11 September 2008, Markus Grabner wrote:
> On Wednesday, 10 September 2008, Albert Astals Cid wrote:
> > On Wednesday 10 September 2008, Markus Grabner wrote:
> > > Hi!
> > >
> > > It has been discussed (http://bugs.kde.org/show_bug.cgi?id=151614)
> > > to use a hash function to determine the name of the annotation file
> > > created by okular. The attached patch implements this behaviour (thanks
> > > Ivo for pointing me to QCryptographicHash - I looked for such a thing
> > > but somehow missed it).
> > >
> > > It works nicely in several ways:
> > > *) Annotations stay associated with the file after renaming it.
> > > *) It also works for non-local URLs (http://...) since we don't need to
> > > care about mapping the URL to some valid file name.
> > > *) Annotations stay associated with the file after downloading it from
> > > the web and opening a local copy (possibly under a different name).
> >
> > It works not nicely in several ways:
> > *) Md5 sucks, use Sha1
>
> I don't see any serious security threat by using a weak hash function at
> this point. All an attacker could do would be to create a modified file for
> which the same annotations would be displayed as for the file the
> annotations were initially created for.
> I like Ivo's proposal to use QCryptographicHash, which supports MD4, MD5,
> and Sha1, so these are natural candidates.

It's not an attacker, it's you having two files that collide, which gets you
annotations from one on the other.


> > *) Reading the whole file sucks, i don't want the 100MB of my pdf file
> > to be piped through a hash, it'd probably take *some* time
>
> Just tried it on my ancient AMD64 2GHz machine and found the following
> computing times for a 500MB file:

Calling an AMD64 2GHz ancient makes me wonder what an EeePC is, prehistoric?

> MD4: 1.3 seconds
> MD5: 2 seconds
> SHA1: 4 seconds
> Loading the file from a local hard disk takes considerably longer

How much is that?

> , so I'm
> not very much concerned about the hash computation time.
> However, the
> readAll() definitely has to be replaced by reading smaller chunks and
> processing them sequentially, that was just for the proof of concept.

So can you check whether splitting the read gives us an improvement? 4 seconds
on an AMD64 2GHz seems a lot to me.


> > so reading up to 1MB at most would be much better imho.
>
> If an annotation refers to a typo on the last page of a huge document, and
> this gets fixed, the same annotation would still be displayed for the
> corrected file if the correction appears after the portion of the file for
> which the hash value is computed (at least for uncompressed formats such as
> PostScript). BTW, the current implementation in okular has the same problem
> since changing a single character in a PostScript file usually doesn't
> change its size.

You have a point here

Albert


> Kind regards,
> Markus




Re: [Okular-devel] md5 hash for annotation file name

2008-09-10 Thread Markus Grabner
On Thursday, 11 September 2008, Albert Astals Cid wrote:
> On Thursday 11 September 2008, Markus Grabner wrote:
> > I don't see any serious security threat by using a weak hash function at
> > this point. All an attacker could do would be to create a modified file
> > for which the same annotations would be displayed as for the file the
> > annotations were initially created for.
> > I like Ivo's proposal to use QCryptographicHash, which supports MD4, MD5,
> > and Sha1, so these are natural candidates.
>
> It's not an attacker, it's you having two files that collide, which gets you
> annotations from one on the other.

Ok, it's a tradeoff between collision probability and speed, I don't see a 
clear winner now. Have MD5 collisions been observed under normal conditions 
(i.e., without injecting some binary code into one of the files)?

> > > *) Reading the whole file sucks, i don't want the 100MB of my pdf file
> > > to be piped through a hash, it'd probably take *some* time
> >
> > Just tried it on my ancient AMD64 2GHz machine and found the following
> > computing times for a 500MB file:
>
> Calling an AMD64 2GHz ancient makes me wonder what an EeePC is, prehistoric?
Do you really want to work with a 500MB file on an EeePC :-?

> > MD4: 1.3 seconds
> > MD5: 2 seconds
> > SHA1: 4 seconds
> > Loading the file from a local hard disk takes considerably longer
>
> How much is that?
24 seconds for the first time, then 11 seconds when the file is cached. It's a 
Seagate ST3300622A SATA drive. So the hash computation overhead is moderate 
on this system.

Kind regards,
Markus


-- 
Markus Grabner - Computer Graphics and Vision
Graz University of Technology, Inffeldgasse 16a/II, 8010 Graz, Austria
Phone: +43/316/873-5041, Fax: +43/316/873-5050
WWW: http://www.icg.tugraz.at/Members/grabner