Re: Review Request: Don't hang when determining MIME type of corrupted files
On Aug. 21, 2011, 10:07 a.m., David Faure wrote: Thanks Peter and Miroslav. The analysis looks correct, the pre-read part of the patch looks good. I'm just wondering about using Unbuffered. If someone installs a mimetype definition with multiple rules trying to match some bytes after the 2K limit, then all this seeking-and-reading back and forth will be very slow, in unbuffered mode (since neither cache will be used). Miroslav Ľos wrote: I find Unbuffered causing slowness improbable. From what I've seen in uses of KMimeType::findBy\w*Content in kdelibs, they all (e.g. ftp* and http kioslaves) rather provide their own buffer in a QByteArray rather than a QIODevice; most just provide the path (which is only opened using QFile if it is_local_file). All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. The unnecessary 16K read did hit several more EIO's on the broken CD I have, but it is just cosmetic I guess. Nonetheless, I find most uses of these functions ever will be through QFile or QBuffer and any client passing another QIODevice may open it itself. Maybe a note in the documentation could help them do it if necessary. Finally, I wonder if buffering makes a difference as we are only using a few small block reads, not many getChar()'s. Then again, the documentation to QTcpSocket says they cannot be opened (i.e. ignore I guess) Unbuffered, which is what any other QIODevices may do as well. * ftp passes a 1K-capped buffer (kioslave/ftp/ftp.cpp:2471). That may be insufficient for the single rule on my system that needed 1029 bytes. David Faure wrote: (Pasting here, after realizing you probably didn't see my reply on kde-core-devel) I find Unbuffered causing slowness improbable. From what I've seen in uses of KMimeType::findBy\w*Content in kdelibs, they all (e.g. ftp* and http kioslaves) rather provide their own buffer in a QByteArray rather than a QIODevice; most just provide the path (which is only opened using QFile if it is_local_file). The most common case for mimetype determination is from the file manager listing local files, in which case findByUrl will use a QFile for content- determination. Ah, and BTW I have just found a magic rule that needs 4K of data: vmware-player.xml says: match type=string value='config.version = ' offset=0:4096/ All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. I'm not sure what you mean there. The unnecessary 16K read did hit several more EIO's on the broken CD I have, but it is just cosmetic I guess. Yes I'm not sure it's worth optimizing for this special case. Nonetheless, I find most uses of these functions ever will be through QFile or QBuffer Sure. It's QFile that I have in mind here, when I say that seeking back and forth will be slow, in unbuffered mode. Finally, I wonder if buffering makes a difference as we are only using a few small block reads, not many getChar()'s. It's about seeking and reading, vs just having the data in memory. * ftp passes a 1K-capped buffer (kioslave/ftp/ftp.cpp:2471). That may be insufficient for the single rule on my system that needed 1029 bytes. Right. Small bug, but a corner case (only matters if extension unknown). Christoph Feck wrote: Miroslav, can you clarify why Unbuffered is beneficial? I did not understand it from the ongoing discussion, and I have the feeling David did not either :) Miroslav Ľos wrote: Well, apparently I failed to actually publish my previous reply. My apologies. So here it is, with some additions: All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. I'm not sure what you mean there. I mean this excerpt from QFile::open in qt-everywhere-opensource-src-4.7.3/src/corelib/io/qfile.cpp: 995 // QIODevice provides the buffering, so there's no need to request it from the file engine. 996 if (fileEngine()-open(mode | QIODevice::Unbuffered)) { 997 QIODevice::open(mode); 998 if (mode Append) 999 seek(size()); 1000 return true; 1001 } That means the only thing Unbuffered controls in a QFile (i.e. opened with open(OpenMode)) is the use of a buffer provided in its superclass. Every time it needs to
Re: Review Request: Don't hang when determining MIME type of corrupted files
On Aug. 21, 2011, 10:07 a.m., David Faure wrote: Thanks Peter and Miroslav. The analysis looks correct, the pre-read part of the patch looks good. I'm just wondering about using Unbuffered. If someone installs a mimetype definition with multiple rules trying to match some bytes after the 2K limit, then all this seeking-and-reading back and forth will be very slow, in unbuffered mode (since neither cache will be used). Miroslav Ľos wrote: I find Unbuffered causing slowness improbable. From what I've seen in uses of KMimeType::findBy\w*Content in kdelibs, they all (e.g. ftp* and http kioslaves) rather provide their own buffer in a QByteArray rather than a QIODevice; most just provide the path (which is only opened using QFile if it is_local_file). All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. The unnecessary 16K read did hit several more EIO's on the broken CD I have, but it is just cosmetic I guess. Nonetheless, I find most uses of these functions ever will be through QFile or QBuffer and any client passing another QIODevice may open it itself. Maybe a note in the documentation could help them do it if necessary. Finally, I wonder if buffering makes a difference as we are only using a few small block reads, not many getChar()'s. Then again, the documentation to QTcpSocket says they cannot be opened (i.e. ignore I guess) Unbuffered, which is what any other QIODevices may do as well. * ftp passes a 1K-capped buffer (kioslave/ftp/ftp.cpp:2471). That may be insufficient for the single rule on my system that needed 1029 bytes. David Faure wrote: (Pasting here, after realizing you probably didn't see my reply on kde-core-devel) I find Unbuffered causing slowness improbable. From what I've seen in uses of KMimeType::findBy\w*Content in kdelibs, they all (e.g. ftp* and http kioslaves) rather provide their own buffer in a QByteArray rather than a QIODevice; most just provide the path (which is only opened using QFile if it is_local_file). The most common case for mimetype determination is from the file manager listing local files, in which case findByUrl will use a QFile for content- determination. Ah, and BTW I have just found a magic rule that needs 4K of data: vmware-player.xml says: match type=string value='config.version = ' offset=0:4096/ All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. I'm not sure what you mean there. The unnecessary 16K read did hit several more EIO's on the broken CD I have, but it is just cosmetic I guess. Yes I'm not sure it's worth optimizing for this special case. Nonetheless, I find most uses of these functions ever will be through QFile or QBuffer Sure. It's QFile that I have in mind here, when I say that seeking back and forth will be slow, in unbuffered mode. Finally, I wonder if buffering makes a difference as we are only using a few small block reads, not many getChar()'s. It's about seeking and reading, vs just having the data in memory. * ftp passes a 1K-capped buffer (kioslave/ftp/ftp.cpp:2471). That may be insufficient for the single rule on my system that needed 1029 bytes. Right. Small bug, but a corner case (only matters if extension unknown). Christoph Feck wrote: Miroslav, can you clarify why Unbuffered is beneficial? I did not understand it from the ongoing discussion, and I have the feeling David did not either :) Miroslav Ľos wrote: Well, apparently I failed to actually publish my previous reply. My apologies. So here it is, with some additions: All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. I'm not sure what you mean there. I mean this excerpt from QFile::open in qt-everywhere-opensource-src-4.7.3/src/corelib/io/qfile.cpp: 995 // QIODevice provides the buffering, so there's no need to request it from the file engine. 996 if (fileEngine()-open(mode | QIODevice::Unbuffered)) { 997 QIODevice::open(mode); 998 if (mode Append) 999 seek(size()); 1000 return true; 1001 } That means the only thing Unbuffered controls in a QFile (i.e. opened with open(OpenMode)) is the use of a buffer provided in its superclass. Every time it needs to
Re: Review Request: Don't hang when determining MIME type of corrupted files
On Aug. 21, 2011, 10:07 a.m., David Faure wrote: Thanks Peter and Miroslav. The analysis looks correct, the pre-read part of the patch looks good. I'm just wondering about using Unbuffered. If someone installs a mimetype definition with multiple rules trying to match some bytes after the 2K limit, then all this seeking-and-reading back and forth will be very slow, in unbuffered mode (since neither cache will be used). Miroslav Ľos wrote: I find Unbuffered causing slowness improbable. From what I've seen in uses of KMimeType::findBy\w*Content in kdelibs, they all (e.g. ftp* and http kioslaves) rather provide their own buffer in a QByteArray rather than a QIODevice; most just provide the path (which is only opened using QFile if it is_local_file). All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. The unnecessary 16K read did hit several more EIO's on the broken CD I have, but it is just cosmetic I guess. Nonetheless, I find most uses of these functions ever will be through QFile or QBuffer and any client passing another QIODevice may open it itself. Maybe a note in the documentation could help them do it if necessary. Finally, I wonder if buffering makes a difference as we are only using a few small block reads, not many getChar()'s. Then again, the documentation to QTcpSocket says they cannot be opened (i.e. ignore I guess) Unbuffered, which is what any other QIODevices may do as well. * ftp passes a 1K-capped buffer (kioslave/ftp/ftp.cpp:2471). That may be insufficient for the single rule on my system that needed 1029 bytes. David Faure wrote: (Pasting here, after realizing you probably didn't see my reply on kde-core-devel) I find Unbuffered causing slowness improbable. From what I've seen in uses of KMimeType::findBy\w*Content in kdelibs, they all (e.g. ftp* and http kioslaves) rather provide their own buffer in a QByteArray rather than a QIODevice; most just provide the path (which is only opened using QFile if it is_local_file). The most common case for mimetype determination is from the file manager listing local files, in which case findByUrl will use a QFile for content- determination. Ah, and BTW I have just found a magic rule that needs 4K of data: vmware-player.xml says: match type=string value='config.version = ' offset=0:4096/ All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. I'm not sure what you mean there. The unnecessary 16K read did hit several more EIO's on the broken CD I have, but it is just cosmetic I guess. Yes I'm not sure it's worth optimizing for this special case. Nonetheless, I find most uses of these functions ever will be through QFile or QBuffer Sure. It's QFile that I have in mind here, when I say that seeking back and forth will be slow, in unbuffered mode. Finally, I wonder if buffering makes a difference as we are only using a few small block reads, not many getChar()'s. It's about seeking and reading, vs just having the data in memory. * ftp passes a 1K-capped buffer (kioslave/ftp/ftp.cpp:2471). That may be insufficient for the single rule on my system that needed 1029 bytes. Right. Small bug, but a corner case (only matters if extension unknown). Christoph Feck wrote: Miroslav, can you clarify why Unbuffered is beneficial? I did not understand it from the ongoing discussion, and I have the feeling David did not either :) Well, apparently I failed to actually publish my previous reply. My apologies. So here it is, with some additions: All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. I'm not sure what you mean there. I mean this excerpt from QFile::open in qt-everywhere-opensource-src-4.7.3/src/corelib/io/qfile.cpp: 995 // QIODevice provides the buffering, so there's no need to request it from the file engine. 996 if (fileEngine()-open(mode | QIODevice::Unbuffered)) { 997 QIODevice::open(mode); 998 if (mode Append) 999 seek(size()); 1000 return true; 1001 } That means the only thing Unbuffered controls in a QFile (i.e. opened with open(OpenMode)) is the use of a buffer provided in its superclass. Every time it needs to read some data from the device, it first tries to fill the buffer (16K by default). This is only of advantage if the file
Re: Review Request: Don't hang when determining MIME type of corrupted files
On Aug. 21, 2011, 10:07 a.m., David Faure wrote: Thanks Peter and Miroslav. The analysis looks correct, the pre-read part of the patch looks good. I'm just wondering about using Unbuffered. If someone installs a mimetype definition with multiple rules trying to match some bytes after the 2K limit, then all this seeking-and-reading back and forth will be very slow, in unbuffered mode (since neither cache will be used). Miroslav Ľos wrote: I find Unbuffered causing slowness improbable. From what I've seen in uses of KMimeType::findBy\w*Content in kdelibs, they all (e.g. ftp* and http kioslaves) rather provide their own buffer in a QByteArray rather than a QIODevice; most just provide the path (which is only opened using QFile if it is_local_file). All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. The unnecessary 16K read did hit several more EIO's on the broken CD I have, but it is just cosmetic I guess. Nonetheless, I find most uses of these functions ever will be through QFile or QBuffer and any client passing another QIODevice may open it itself. Maybe a note in the documentation could help them do it if necessary. Finally, I wonder if buffering makes a difference as we are only using a few small block reads, not many getChar()'s. Then again, the documentation to QTcpSocket says they cannot be opened (i.e. ignore I guess) Unbuffered, which is what any other QIODevices may do as well. * ftp passes a 1K-capped buffer (kioslave/ftp/ftp.cpp:2471). That may be insufficient for the single rule on my system that needed 1029 bytes. David Faure wrote: (Pasting here, after realizing you probably didn't see my reply on kde-core-devel) I find Unbuffered causing slowness improbable. From what I've seen in uses of KMimeType::findBy\w*Content in kdelibs, they all (e.g. ftp* and http kioslaves) rather provide their own buffer in a QByteArray rather than a QIODevice; most just provide the path (which is only opened using QFile if it is_local_file). The most common case for mimetype determination is from the file manager listing local files, in which case findByUrl will use a QFile for content- determination. Ah, and BTW I have just found a magic rule that needs 4K of data: vmware-player.xml says: match type=string value='config.version = ' offset=0:4096/ All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. I'm not sure what you mean there. The unnecessary 16K read did hit several more EIO's on the broken CD I have, but it is just cosmetic I guess. Yes I'm not sure it's worth optimizing for this special case. Nonetheless, I find most uses of these functions ever will be through QFile or QBuffer Sure. It's QFile that I have in mind here, when I say that seeking back and forth will be slow, in unbuffered mode. Finally, I wonder if buffering makes a difference as we are only using a few small block reads, not many getChar()'s. It's about seeking and reading, vs just having the data in memory. * ftp passes a 1K-capped buffer (kioslave/ftp/ftp.cpp:2471). That may be insufficient for the single rule on my system that needed 1029 bytes. Right. Small bug, but a corner case (only matters if extension unknown). Miroslav, can you clarify why Unbuffered is beneficial? I did not understand it from the ongoing discussion, and I have the feeling David did not either :) - Christoph --- This is an automatically generated e-mail. To reply, visit: http://git.reviewboard.kde.org/r/102391/#review5872 --- On Aug. 20, 2011, 5:21 p.m., Peter Penz wrote: --- This is an automatically generated e-mail. To reply, visit: http://git.reviewboard.kde.org/r/102391/ --- (Updated Aug. 20, 2011, 5:21 p.m.) Review request for kdelibs and David Faure. Summary --- If KMimeTypeRepository::findFromContent() tries to determine MIME from a file that cannot be read, such as on a corrupted optical disc, a read attempt is made in KMimeMagicMatch::match() for every available rule, resulting in UI hangs (e.g. file dialogs, dolphin) for tens of minutes (see https://bugs.kde.org/show_bug.cgi?id=280446 for more details). I've submitted this patch here on behalf of Miroslav ?os, who has submitted the
Re: Review Request: Don't hang when determining MIME type of corrupted files
On Monday 22 August 2011 20:12:13 Miroslav Ľos wrote: On Aug. 21, 2011, 10:07 a.m., David Faure wrote: Thanks Peter and Miroslav. The analysis looks correct, the pre-read part of the patch looks good. I'm just wondering about using Unbuffered. If someone installs a mimetype definition with multiple rules trying to match some bytes after the 2K limit, then all this seeking-and-reading back and forth will be very slow, in unbuffered mode (since neither cache will be used). I find Unbuffered causing slowness improbable. From what I've seen in uses of KMimeType::findBy\w*Content in kdelibs, they all (e.g. ftp* and http kioslaves) rather provide their own buffer in a QByteArray rather than a QIODevice; most just provide the path (which is only opened using QFile if it is_local_file). The most common case for mimetype determination is from the file manager listing local files, in which case findByUrl will use a QFile for content- determination. Ah, and BTW I have just found a magic rule that needs 4K of data: vmware-player.xml says: match type=string value='config.version = ' offset=0:4096/ All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. I'm not sure what you mean there. The unnecessary 16K read did hit several more EIO's on the broken CD I have, but it is just cosmetic I guess. Yes I'm not sure it's worth optimizing for this special case. Nonetheless, I find most uses of these functions ever will be through QFile or QBuffer Sure. It's QFile that I have in mind here, when I say that seeking back and forth will be slow, in unbuffered mode. Finally, I wonder if buffering makes a difference as we are only using a few small block reads, not many getChar()'s. It's about seeking and reading, vs just having the data in memory. * ftp passes a 1K-capped buffer (kioslave/ftp/ftp.cpp:2471). That may be insufficient for the single rule on my system that needed 1029 bytes. Right. Small bug, but a corner case (only matters if extension unknown). -- David Faure, fa...@kde.org, http://www.davidfaure.fr Sponsored by Nokia to work on KDE, incl. Konqueror (http://www.konqueror.org).
Re: Review Request: Don't hang when determining MIME type of corrupted files
On Aug. 21, 2011, 10:07 a.m., David Faure wrote: Thanks Peter and Miroslav. The analysis looks correct, the pre-read part of the patch looks good. I'm just wondering about using Unbuffered. If someone installs a mimetype definition with multiple rules trying to match some bytes after the 2K limit, then all this seeking-and-reading back and forth will be very slow, in unbuffered mode (since neither cache will be used). I find Unbuffered causing slowness improbable. From what I've seen in uses of KMimeType::findBy\w*Content in kdelibs, they all (e.g. ftp* and http kioslaves) rather provide their own buffer in a QByteArray rather than a QIODevice; most just provide the path (which is only opened using QFile if it is_local_file). All QFile's buffering is implemented in the QIODevice superclass, it adds Unbuffered in open() to its openMode for its fileEngine() backend. Thus, no buffering is propagated down. The unnecessary 16K read did hit several more EIO's on the broken CD I have, but it is just cosmetic I guess. Nonetheless, I find most uses of these functions ever will be through QFile or QBuffer and any client passing another QIODevice may open it itself. Maybe a note in the documentation could help them do it if necessary. Finally, I wonder if buffering makes a difference as we are only using a few small block reads, not many getChar()'s. Then again, the documentation to QTcpSocket says they cannot be opened (i.e. ignore I guess) Unbuffered, which is what any other QIODevices may do as well. * ftp passes a 1K-capped buffer (kioslave/ftp/ftp.cpp:2471). That may be insufficient for the single rule on my system that needed 1029 bytes. - Miroslav --- This is an automatically generated e-mail. To reply, visit: http://git.reviewboard.kde.org/r/102391/#review5872 --- On Aug. 20, 2011, 5:21 p.m., Peter Penz wrote: --- This is an automatically generated e-mail. To reply, visit: http://git.reviewboard.kde.org/r/102391/ --- (Updated Aug. 20, 2011, 5:21 p.m.) Review request for kdelibs and David Faure. Summary --- If KMimeTypeRepository::findFromContent() tries to determine MIME from a file that cannot be read, such as on a corrupted optical disc, a read attempt is made in KMimeMagicMatch::match() for every available rule, resulting in UI hangs (e.g. file dialogs, dolphin) for tens of minutes (see https://bugs.kde.org/show_bug.cgi?id=280446 for more details). I've submitted this patch here on behalf of Miroslav ?os, who has submitted the bug-report and also has written the patch. This addresses bug 280446. http://bugs.kde.org/show_bug.cgi?id=280446 Diffs - kdecore/services/kmimetype.cpp 955bf62 kdecore/services/kmimetyperepository.cpp 6ff3d16 Diff: http://git.reviewboard.kde.org/r/102391/diff Testing --- Thanks, Peter
Re: Review Request: Don't hang when determining MIME type of corrupted files
--- This is an automatically generated e-mail. To reply, visit: http://git.reviewboard.kde.org/r/102391/#review5872 --- Thanks Peter and Miroslav. The analysis looks correct, the pre-read part of the patch looks good. I'm just wondering about using Unbuffered. If someone installs a mimetype definition with multiple rules trying to match some bytes after the 2K limit, then all this seeking-and-reading back and forth will be very slow, in unbuffered mode (since neither cache will be used). - David On Aug. 20, 2011, 5:21 p.m., Peter Penz wrote: --- This is an automatically generated e-mail. To reply, visit: http://git.reviewboard.kde.org/r/102391/ --- (Updated Aug. 20, 2011, 5:21 p.m.) Review request for kdelibs and David Faure. Summary --- If KMimeTypeRepository::findFromContent() tries to determine MIME from a file that cannot be read, such as on a corrupted optical disc, a read attempt is made in KMimeMagicMatch::match() for every available rule, resulting in UI hangs (e.g. file dialogs, dolphin) for tens of minutes (see https://bugs.kde.org/show_bug.cgi?id=280446 for more details). I've submitted this patch here on behalf of Miroslav ?os, who has submitted the bug-report and also has written the patch. This addresses bug 280446. http://bugs.kde.org/show_bug.cgi?id=280446 Diffs - kdecore/services/kmimetype.cpp 955bf62 kdecore/services/kmimetyperepository.cpp 6ff3d16 Diff: http://git.reviewboard.kde.org/r/102391/diff Testing --- Thanks, Peter
Review Request: Don't hang when determining MIME type of corrupted files
--- This is an automatically generated e-mail. To reply, visit: http://git.reviewboard.kde.org/r/102391/ --- Review request for kdelibs and David Faure. Summary --- If KMimeTypeRepository::findFromContent() tries to determine MIME from a file that cannot be read, such as on a corrupted optical disc, a read attempt is made in KMimeMagicMatch::match() for every available rule, resulting in UI hangs (e.g. file dialogs, dolphin) for tens of minutes (see https://bugs.kde.org/show_bug.cgi?id=280446 for more details). I've submitted this patch here on behalf of Miroslav ?os, who has submitted the bug-report and also has written the patch. This addresses bug 280446. http://bugs.kde.org/show_bug.cgi?id=280446 Diffs - kdecore/services/kmimetype.cpp 955bf62 kdecore/services/kmimetyperepository.cpp 6ff3d16 Diff: http://git.reviewboard.kde.org/r/102391/diff Testing --- Thanks, Peter