Florian Weimer <[EMAIL PROTECTED]> writes:

> * Goswin von Brederlow:
>
>>> However, patching rred to apply patches in a single run would be a
>>> good start because all further optimizations will need it.
>>
>> Why should the number of chunks matter?
>
> If you use the naïve algorithm, it does. But rred implements
> something more involved, leading to a O(patch-files * lines + changes)
> complexity (or something very similar).
>
>> What matters is reading, parsing and writing the file O(lines) and
>> then the number of changes (lines of changes) O(changes). Combined
>> this gives O(lines + changes) if the file is read once at the start
>> and then all patches are applied.
>
> I guess you'll find it very hard to get down to O(lines + changes). 8-)
>
> Either you need to make multiple passes over the original file (like
> the current code does), or you need to combine patch files before
> applying them. The latter involves some kind of sorting, and unless
> you postulate an upper bound on the number of lines in a file, you'll
> end up with super-linear complexity.
Each change in the patches is an insert line, remove line or replace
line. Thinking about it, I see that insert/remove is a problem if you
have an array of line pointers. But with a tree you can still
insert/remove/replace each line in O(log lines), or O(changes * log
lines) in total, which is better than O(lines * chunks).

Or you keep an array of line pointers and copy it for every chunk you
process. Copying 500000 pointers per chunk doesn't sound too slow. The
theoretical complexity would still be O(lines * chunks), but the
constant factor should be far lower.

>> You can do that by combining the individual patch files or by teaching
>> rred to do a single run with multiple patch files. Same result. Both
>> solve the problem of O(lines * chunks + changes) complexity.
>
> The real problem is that APT's architecture doesn't allow rred to look
> at all patches at once. At least that's what I've been told.

True. The apt methods aren't designed to process multiple files at
once. You can't use something designed for gunzipping a single file to
combine multiple patches into a new Packages file. You have to design
something new.

>> As for using pointers to lines and shuffling them, that seems to be
>> the only sane thing to do. All patch operations are line based, so it
>> is essential that a line can be found and replaced in O(1). A simple
>> array of pointers to lines solves that.
>
> Inserting in the middle of an array is not a constant-time operation.
> That's why the naïve algorithm is slow.

Yeah, I noticed that too now. As said above, either use a tree [O(log n)
per operation] or copy the array [O(n) total per pass]. Copying a
pointer to a line is better than copying each line, since the hidden
constant is much lower.

MfG
        Goswin
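For illustration, here is a minimal sketch of the "array of lines, one
pass per patch" approach discussed above. It assumes the patches are
ed-style scripts (as apt's pdiff files are), with hunks ordered
bottom-to-top so earlier edits don't invalidate later line numbers. The
function names and the use of a plain Python list in place of an array
of line pointers are my own, not anything from rred itself:

```python
import re

# An ed command line: "5c", "3d", "2a" or "5,7c" etc.
CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")

def apply_ed_patch(lines, patch):
    """Apply one ed-script patch (given as a list of text lines) in place."""
    i = 0
    while i < len(patch):
        m = CMD.match(patch[i])
        if not m:
            raise ValueError("bad ed command: %r" % patch[i])
        first = int(m.group(1))
        last = int(m.group(2) or first)
        op = m.group(3)
        i += 1
        new = []
        if op in "ac":              # 'a' and 'c' carry text, ended by a lone "."
            while patch[i] != ".":
                new.append(patch[i])
                i += 1
            i += 1                  # skip the "."
        if op == "a":               # append after line `first`
            lines[first:first] = new
        elif op == "c":             # replace lines first..last
            lines[first - 1:last] = new
        elif op == "d":             # delete lines first..last
            del lines[first - 1:last]
    return lines

def apply_all(lines, patches):
    """Read the file once, apply every patch in memory, write once at the end."""
    for p in patches:
        apply_ed_patch(lines, p)
    return lines
```

Each slice assignment shifts the tail of the list, so this is the
O(lines * chunks) variant with a small constant (only pointers move,
never line contents); swapping the list for a balanced tree would give
the O(changes * log lines) variant instead.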