Re: need help optimizing a function
G'day all.

On Wed, Oct 09, 2002 at 02:29:26PM -0400, David Roundy wrote:
> I get a speedup of about a factor of 6 for the test files I was using (of
> course, this would depend on file size), and find that now only 2% of my
> time is spent in that function.  I'm still something like 100 times slower
> than GNU diff, so I think (hope, certainly!) there's still room for more
> improvement (another day).  I'm sure I'll have more questions in a few
> days, once I've tracked down what the new bottlenecks are.

If you understand logic languages, you might want to look at the diff
implementation which I wrote for the Mercury distribution:

    http://www.cs.mu.oz.au/research/mercury/

It's pretty close to GNU diff in speed.  In fact, it was indistinguishable
on every test case I could think of.

There are two main differences from what I could gather of your
implementation.

The first thing I noticed was that a lot of time was being spent doing
string comparisons, so I inserted a pre-pass which mapped strings to
(unboxed) integers, with the constraint that two integers are equal if and
only if the corresponding strings are equal.  This also turned out to be
critical for implementing flags such as --ignore-space-change.

The other difference is that I used a different algorithm than you did:

    Eugene W. Myers.  "An O(ND) difference algorithm and its variations."
    Algorithmica, 1:251-266, 1986.

I found it to be much faster than the O(n log n) algorithm, even on cases
where you would expect it to perform poorly (i.e. where D is large), partly
because the constant factors are really, really low, and partly because in
the pre-pass you can optimise the case where you have a run of consecutive
lines none of which appears anywhere else, which is typical for most uses
of diff.

Cheers,
Andrew Bromage

___
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe
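Andrew's string-interning pre-pass can be sketched in a few lines of
Haskell.  This is an illustrative reconstruction, not his Mercury code:
`internLines` and its Map-based table are hypothetical names, and a real
--ignore-space-change would normalise each line before the lookup.

```haskell
import qualified Data.Map as M

-- Assign each distinct line a unique Int; equal lines get equal Ints,
-- so all later comparisons are O(1) machine-word equality tests
-- instead of full string comparisons.
internLines :: [String] -> [Int]
internLines = go M.empty 0
  where
    go _    _    []     = []
    go seen next (l:ls) =
      case M.lookup l seen of
        Just i  -> i    : go seen next ls
        Nothing -> next : go (M.insert l next seen) (next + 1) ls
```

For example, `internLines ["a","b","a","c"]` yields `[0,1,0,2]`: the two
occurrences of "a" intern to the same integer.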
Re: need help optimizing a function
On Tue, Oct 08, 2002 at 07:52:52AM -0700, Hal Daume III wrote:
> I already saw one reply mentioning using state threaded arrays.  This is
> probably your best (in terms of performance) option.
>
> Keep in mind that Haskell arrays are *not* your standard imperative
> arrays.  Like imperative arrays, they give you O(1) lookup, but only
> O(n) insert.  If you want to keep with a functional style, I'd suggest
> using a different data structure for the data.  A FiniteMap should work
> fine and should give you pretty good (at least much better) performance.
> And you won't have to give up the FP style (for whatever that's worth).

Thanks everyone for the advice!  I think I now understand pretty well why
my old code didn't work.

I ended up switching to a binary tree structure, since I didn't really need
O(1) lookup anyway (I had to do a binary search in the first place to
decide which element, if any, to modify).  This changes the algorithm as a
whole (I believe) from O(n^2) back to O(n log n), as it should be.

I get a speedup of about a factor of 6 for the test files I was using (of
course, this would depend on file size), and find that now only 2% of my
time is spent in that function.  I'm still something like 100 times slower
than GNU diff, so I think (hope, certainly!) there's still room for more
improvement (another day).  I'm sure I'll have more questions in a few
days, once I've tracked down what the new bottlenecks are.
-- 
David Roundy
http://civet.berkeley.edu/droundy/
Re: need help optimizing a function
Hello,

> > p.s., I sense your next question is going to be something like "why
> > can't the compiler detect that the array can be updated in place
> > instead of copied" and the answer, from what i can tell, is simply
> > that it doesn't try.
>
> And one might argue that it should not try [snip]

And the other answer is that the result would usually not be worth the
effort.  The idea seems nice, but laziness is where I see the problem.
Consider the following code:

    do_something :: Array Int Int -> (Array Int Int, Result)
    do_something a = (a'', res)
      where
        a'  = a // [(summon_index, summon_int)]
        w   = a' ! 4
        res = do_some_work w
        a'' = a' // [(summon_other_index, summon_other_int)]

Seems like a nice candidate for update-in-place?  But the problem is that
in general you cannot say when w will be evaluated -- so you simply cannot
destroy a' before then.

This is also the reason why DiffArray (from hslibs) may not be very useful
unless you have good control over when things are evaluated -- it may have
(and for me it did have) quadratic behaviour in cases like this.

To solve this, '!' would have to produce a new array too; but then you must
pass it everywhere, and effectively you are doing the work monads would do
for you.

So in general you must choose: either your programs will be nice and
functional, but you will have to use structures with (some kind of)
logarithmic slowdown (FiniteMap, Trie), or some of their parts will be
embedded in monads.

Zdenek Dvorak
Re: need help optimizing a function
Hal Daume <[EMAIL PROTECTED]> writes:
> p.s., I sense your next question is going to be something like "why
> can't the compiler detect that the array can be updated in place
> instead of copied" and the answer, from what i can tell, is simply
> that it doesn't try.

And one might argue that it should not try, since the result would be
incredibly fragile: small, semantics-preserving changes to the program
could confuse the analysis and cause changes in the big-O complexity of
your program.  And, since the analysis would undoubtedly be pretty darn
cunning, it'd also be virtually impossible to understand why it did or did
not manage to detect the single-threadedness of your code.

I think one of the reasons that approaches based on types (monads, MADTs,
linear types, etc.) are so popular is that they make the programmer's
assertion (X is used in a single-threaded manner) sufficiently explicit
that the compiler can complain when it disagrees and, since parts of the
analysis show through in the type system, they make the analysis less of a
black box.

-- 
Alastair Reid  [EMAIL PROTECTED]
Reid Consulting (UK) Limited  http://www.reid-consulting-uk.ltd.uk/alastair/

ps The opening paragraphs of Paul Hudak's MADT paper
(http://www.haskell.org/papers/madt.ps) have a more coherent explanation
of why smarter compilers are not the answer.
Re: need help optimizing a function
I already saw one reply mentioning using state threaded arrays.  This is
probably your best (in terms of performance) option.

Keep in mind that Haskell arrays are *not* your standard imperative arrays.
Like imperative arrays, they give you O(1) lookup, but only O(n) insert.
If you want to keep with a functional style, I'd suggest using a different
data structure for the data.  A FiniteMap should work fine and should give
you pretty good (at least much better) performance.  And you won't have to
give up the FP style (for whatever that's worth).

 - Hal

p.s., I sense your next question is going to be something like "why can't
the compiler detect that the array can be updated in place instead of
copied" and the answer, from what i can tell, is simply that it doesn't
try.

-- 
Hal Daume III
"Computer science is no more about computers  | [EMAIL PROTECTED]
 than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume

On Tue, 8 Oct 2002, David Roundy wrote:
> Hello.  If you followed my previous post (which was about a bug in my lcs
> function), this is still the same code (only now bug-free, it seems).
>
> It turns out that my lcs diff function is MUCH slower than GNU diff on
> some test files (much worse than 1000 times slower!).  The test case
> where its speed was close was IO bound on the output to an xterm (silly
> me).
>
> I've profiled my lcs code, and it is spending about 95% of its time in
> the "hunt_one_char" function.  It also claims to spend 0.0% of its time
> in my_bs, which does a binary search on the array, and is the only other
> function that gets called by hunt_one_char (not counting itself).
>
> I've told the compiler (ghc) to inline hunt_one_char and my_bs to no
> avail.  I guess maybe it can't or won't inline a recursive function, or
> maybe that just doesn't gain me anything.
>
> My only guess is that for some reason it is making an entirely new copy
> of the array each time it recurses, but that doesn't make any sense,
> since the array is getting threaded through, so it should be able to
> just modify the array in place.  But then, I'm not entirely clear as to
> what the compiler can and cannot do as it optimizes.
>
> >>data Threshold a = Thresh !Int [a]
> >>                   deriving (Show)
> >>
> >>hunt_one_char :: String -> [Int] -> Array Int (Threshold String) ->
> >>                 Array Int (Threshold String)
> >>hunt_one_char c [] th = th
> >>hunt_one_char c (j:js) th =
> >>    case my_bs (Thresh j [c]) th of
> >>      Nothing -> hunt_one_char c js th
> >>      Just k ->
> >>          case th!(k-1) of
> >>            Thresh _ rest ->
> >>                hunt_one_char c js th//[(k,Thresh j (c:rest))]
>
> For what it's worth, the calling function is:
>
> >>hunt_internal :: [String] -> [[Int]] ->
> >>                 Array Int (Threshold String) ->
> >>                 Array Int (Threshold String)
> >>hunt_internal [] _ th = th
> >>hunt_internal _ [] th = th
> >>hunt_internal (c:cs) (m:ms) th =
> >>    hunt_internal cs ms $ hunt_one_char c m th
>
> Each string in this list of strings is a line from one of the files (and
> the [[Int]] is a list of line numbers in the other file that that line
> matches).  The array 'th' has size (1,n) where n is the number of lines
> in the file.
>
> My simplest test case consists of a 20k line file, each line of which is
> unique, and a second file which is simply a permutation of the first
> (moving the first line to the last), so hunt_one_char should be called
> exactly once for each line, and it is always called with its second
> argument having exactly one element.  In this test case, according to
> the ghc profiler, 89.7% of the time is spent in hunt_one_char:
>
>                                           individual    inherited
> COST CENTRE      MODULE  entries   %time %alloc   %time %alloc
> hunt_internal    Main      20001     0.2    0.0    91.4   95.3
>  hunt_one_char   Main          4    89.7   95.0    91.1   95.3
>   my_bs          Main          2     0.2    0.0     1.4    0.3
>    my_helper_bs  Main     307231     1.2    0.3     1.2    0.3
>
> This has me at a loss.  It seems like such a simple function... of
> course, it is called 20k times, which could certainly add up, but while
> each call of my_bs should take O(log 20k) time, each call of
> hunt_one_char (apart from its one call to my_bs) should only take O(1),
> so most of the time should be spent in my_bs unless something is
> terribly wrong (which must be the case here).
>
> If it would help, I'd be happy to send a listing of the complete code,
> but it's 6.5k so I figured it'd be better not to send it, since it seems
> unlikely that anyone would want to run it anyways.
> -- 
> David Roundy
> http://civet.berkeley.edu/droundy/
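Hal's FiniteMap suggestion can be sketched with Data.Map, its modern
successor.  This is a hypothetical rewrite of the threshold-update step,
not David's actual code: a persistent map gives O(log n) lookup *and*
O(log n) update, instead of the O(n) copy that (//) performs on an Array.

```haskell
import qualified Data.Map as M

-- Hypothetical threshold store: slot index -> (line number, match so far).
type Thresholds = M.Map Int (Int, [String])

-- Replace slot k with line number j, extending the match found at k-1.
-- M.insert rebuilds only the O(log n) spine of the tree and shares the
-- rest, so no full copy of the structure is ever made.
updateThresh :: Int -> Int -> String -> Thresholds -> Thresholds
updateThresh k j c th =
  case M.lookup (k - 1) th of
    Just (_, rest) -> M.insert k (j, c : rest) th
    Nothing        -> th
```

Replacing the array update in hunt_one_char with something of this shape
is what turns the O(n^2) overall behaviour back into O(n log n).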
Re: need help optimizing a function
On Tue, 8 Oct 2002 09:14:26 -0400, David Roundy <[EMAIL PROTECTED]> wrote:
> My only guess is that for some reason it is making an entirely new copy
> of the array each time it recurses, but that doesn't make any sense, ...
>
> >>hunt_one_char :: String -> [Int] -> Array Int (Threshold String) ->
> >>                 Array Int (Threshold String)
> >>hunt_one_char c [] th = th
> >>hunt_one_char c (j:js) th =
> >>    case my_bs (Thresh j [c]) th of
> >>      Nothing -> hunt_one_char c js th
> >>      Just k ->
> >>          case th!(k-1) of
> >>            Thresh _ rest ->
> >>                hunt_one_char c js th//[(k,Thresh j (c:rest))]

Yes, I think you're actually copying the array at each recursive invocation
when you do the update "th//[..]".  The compiler can't guess that the array
can be updated in place.

You could use state-thread arrays rather than ordinary Haskell arrays.
State threads are an extension to the H98 standard that is supported by ghc
and hugs (at least).  The basic idea is to use a monad to encapsulate the
code that manipulates memory references/updates in place, and to guarantee
that the references are actually used in a single-threaded way.

Look up the paper entitled "Lazy Functional State Threads" by John
Launchbury and Simon Peyton Jones (it's available for download).  Section 3
contains examples with in-place array updates.

Hope this helps,

Pedro Vasconcelos.
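With today's Data.Array.ST the state-thread style Pedro describes looks
roughly like this (in 2002 the module names differed slightly).  bumpAll is
a toy example, not David's function: inside runSTArray every writeArray
mutates the array in place, and the result is frozen back into an ordinary
immutable Array at the end.

```haskell
import Data.Array
import Data.Array.ST

-- Increment every element of an immutable array, doing all the updates
-- in place on a mutable copy inside the ST monad.
bumpAll :: Array Int Int -> Array Int Int
bumpAll a = runSTArray $ do
  m <- thaw a                  -- one copy up front, then pure mutation
  (lo, hi) <- getBounds m
  mapM_ (\i -> readArray m i >>= writeArray m i . (+ 1)) [lo .. hi]
  return m                     -- runSTArray freezes this without copying
```

The same pattern applies to the threshold array: thaw it once, run the
whole hunt loop with readArray/writeArray, and freeze the result.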
need help optimizing a function
Hello.  If you followed my previous post (which was about a bug in my lcs
function), this is still the same code (only now bug-free, it seems).

It turns out that my lcs diff function is MUCH slower than GNU diff on some
test files (much worse than 1000 times slower!).  The test case where its
speed was close was IO bound on the output to an xterm (silly me).

I've profiled my lcs code, and it is spending about 95% of its time in the
"hunt_one_char" function.  It also claims to spend 0.0% of its time in
my_bs, which does a binary search on the array, and is the only other
function that gets called by hunt_one_char (not counting itself).

I've told the compiler (ghc) to inline hunt_one_char and my_bs to no avail.
I guess maybe it can't or won't inline a recursive function, or maybe that
just doesn't gain me anything.

My only guess is that for some reason it is making an entirely new copy of
the array each time it recurses, but that doesn't make any sense, since the
array is getting threaded through, so it should be able to just modify the
array in place.  But then, I'm not entirely clear as to what the compiler
can and cannot do as it optimizes.

>>data Threshold a = Thresh !Int [a]
>>                   deriving (Show)
>>
>>hunt_one_char :: String -> [Int] -> Array Int (Threshold String) ->
>>                 Array Int (Threshold String)
>>hunt_one_char c [] th = th
>>hunt_one_char c (j:js) th =
>>    case my_bs (Thresh j [c]) th of
>>      Nothing -> hunt_one_char c js th
>>      Just k ->
>>          case th!(k-1) of
>>            Thresh _ rest ->
>>                hunt_one_char c js th//[(k,Thresh j (c:rest))]

For what it's worth, the calling function is:

>>hunt_internal :: [String] -> [[Int]] ->
>>                 Array Int (Threshold String) ->
>>                 Array Int (Threshold String)
>>hunt_internal [] _ th = th
>>hunt_internal _ [] th = th
>>hunt_internal (c:cs) (m:ms) th =
>>    hunt_internal cs ms $ hunt_one_char c m th

Each string in this list of strings is a line from one of the files (and
the [[Int]] is a list of line numbers in the other file that that line
matches).  The array 'th' has size (1,n) where n is the number of lines in
the file.

My simplest test case consists of a 20k line file, each line of which is
unique, and a second file which is simply a permutation of the first
(moving the first line to the last), so hunt_one_char should be called
exactly once for each line, and it is always called with its second
argument having exactly one element.  In this test case, according to the
ghc profiler, 89.7% of the time is spent in hunt_one_char:

                                          individual    inherited
COST CENTRE      MODULE  entries   %time %alloc   %time %alloc
hunt_internal    Main      20001     0.2    0.0    91.4   95.3
 hunt_one_char   Main          4    89.7   95.0    91.1   95.3
  my_bs          Main          2     0.2    0.0     1.4    0.3
   my_helper_bs  Main     307231     1.2    0.3     1.2    0.3

This has me at a loss.  It seems like such a simple function... of course,
it is called 20k times, which could certainly add up, but while each call
of my_bs should take O(log 20k) time, each call of hunt_one_char (apart
from its one call to my_bs) should only take O(1), so most of the time
should be spent in my_bs unless something is terribly wrong (which must be
the case here).

If it would help, I'd be happy to send a listing of the complete code, but
it's 6.5k so I figured it'd be better not to send it, since it seems
unlikely that anyone would want to run it anyways.
-- 
David Roundy
http://civet.berkeley.edu/droundy/
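For reference, the binary search David describes might look like the
following.  This is only a guess at the shape of my_bs, with hypothetical
names, searching a plain Array Int Int of thresholds: find the leftmost
slot whose stored value is >= the probe, returning Nothing when the probe
belongs past the end of the array.

```haskell
import Data.Array

-- Hunt-Szymanski style slot search: the threshold array is kept sorted,
-- so a binary search finds the slot that probe value j should replace.
findSlot :: Int -> Array Int Int -> Maybe Int
findSlot j th = go lo hi
  where
    (lo, hi) = bounds th
    go l h
      | l > h       = Nothing                          -- j goes past the end
      | th ! m >= j = if l == m then Just m else go l m
      | otherwise   = go (m + 1) h
      where m = (l + h) `div` 2
```

For example, with thresholds [2,5,7,9] a probe of 6 lands on the slot
holding 7, while a probe of 10 extends the array and yields Nothing.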