Re: Functions using locks are slowed even when locks are never taken
Sorry to be a bother, but any movement on this? This looks like a real performance bug and I don't yet have the internals knowledge to chase it down myself. On Sun, Nov 3, 2013 at 12:10 PM, Michał Marczyk michal.marc...@gmail.comwrote: Well, that is interesting. The difference between the compiled versions of (defn foo [x] (if ( x 0) (inc x) (locking o (dec x and (defn bar [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) is quite significant. foo gets compiled to a single class, with invocations handled by a single invoke method; bar gets compiled to a class for bar + an extra class for an inner function which handles the (locking o (dec x)) part -- probably very similar to the output for the version with the hand-coded locking-part (although I haven't really looked at that yet). The inner function is a closure, so calling it involves an allocation of a closure object; its ctor receives the closed-over locals as arguments and stores them in two fields (lockee and x). Then they get loaded from the fields in the body of the closure's invoke method etc. I guess I'll have to play around with Java equivalents too... Cheers, Michał On 3 November 2013 20:46, Michael Blume blume.m...@gmail.com wrote: I mean, I'm probably being naive, but this suggests that one could write (defmacro locking' [ forms] `(let [res# (locking ~@forms)] res#)) and use locking' in place of locking for improved performance. Is this wrong? If it's right, does that suggest the macro in clojure.core should be changed? On Sun, Nov 3, 2013 at 11:09 AM, Michael Blume blume.m...@gmail.com wrote: Huh, interesting. I have: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) (defn foo'' [x] (if ( x 0) (inc x) (locking o (dec x foo' is fast, but foo'' is slow. So something about wrapping the locking clause in a let makes it fast. Still no idea why. On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote: You have a typo in foo -- monitor-exit's argument is 0 (zero) rather than o (the sentinel object). Besides that, in foo both monitor-enter and monitor-exit get their arguments from a Var. Rewriting to use locking, which first puts the object whose monitor will be used in a local (that is, (let [lockee o] ...), where ... performs the locking using the newly introduced local), gives timings identical to those of bar and baz: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) So this is one reason not to use monitor-enter and monitor-exit directly. Another reason is that locking guarantees that the monitor will be released (by using try / finally, and of course by preventing situations where the matching monitor-enter monitor-exit operate on different objects). In fact, both monitor-enter and monitor-exit carry docstrings which explicitly say that they should not be used in user code and point to locking as the user-facing equivalent to Java's synchronized. Cheers, Michał On 1 November 2013 19:34, Michael Blume blume...@gmail.com wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS
Re: Functions using locks are slowed even when locks are never taken
I have just created a ticket linking to this discussion, with a copy of one of Michal's earlier messages in the thread as a description of what might be the problem. I would not say that this is the same as movement on this issue. Movement in the form that would lead to a change in the Clojure compiler will require one or more volunteers to carefully diagnose the root cause of the problem, develop one or more patches, respond to comments from screeners and/or Rich Hickey, etc. If people want to raise more attention to this issue, one way to do it is to vote on the ticket in JIRA. If you do not already have an account, you can create one here: http://dev.clojure.org/jira/secure/Signup!default.jspa Andy On Sun, Nov 17, 2013 at 10:14 AM, Michael Blume blume.m...@gmail.comwrote: Sorry to be a bother, but any movement on this? This looks like a real performance bug and I don't yet have the internals knowledge to chase it down myself. On Sun, Nov 3, 2013 at 12:10 PM, Michał Marczyk michal.marc...@gmail.comwrote: Well, that is interesting. The difference between the compiled versions of (defn foo [x] (if ( x 0) (inc x) (locking o (dec x and (defn bar [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) is quite significant. foo gets compiled to a single class, with invocations handled by a single invoke method; bar gets compiled to a class for bar + an extra class for an inner function which handles the (locking o (dec x)) part -- probably very similar to the output for the version with the hand-coded locking-part (although I haven't really looked at that yet). The inner function is a closure, so calling it involves an allocation of a closure object; its ctor receives the closed-over locals as arguments and stores them in two fields (lockee and x). Then they get loaded from the fields in the body of the closure's invoke method etc. I guess I'll have to play around with Java equivalents too... Cheers, Michał On 3 November 2013 20:46, Michael Blume blume.m...@gmail.com wrote: I mean, I'm probably being naive, but this suggests that one could write (defmacro locking' [ forms] `(let [res# (locking ~@forms)] res#)) and use locking' in place of locking for improved performance. Is this wrong? If it's right, does that suggest the macro in clojure.core should be changed? On Sun, Nov 3, 2013 at 11:09 AM, Michael Blume blume.m...@gmail.com wrote: Huh, interesting. I have: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) (defn foo'' [x] (if ( x 0) (inc x) (locking o (dec x foo' is fast, but foo'' is slow. So something about wrapping the locking clause in a let makes it fast. Still no idea why. On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote: You have a typo in foo -- monitor-exit's argument is 0 (zero) rather than o (the sentinel object). Besides that, in foo both monitor-enter and monitor-exit get their arguments from a Var. Rewriting to use locking, which first puts the object whose monitor will be used in a local (that is, (let [lockee o] ...), where ... performs the locking using the newly introduced local), gives timings identical to those of bar and baz: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) So this is one reason not to use monitor-enter and monitor-exit directly. Another reason is that locking guarantees that the monitor will be released (by using try / finally, and of course by preventing situations where the matching monitor-enter monitor-exit operate on different objects). In fact, both monitor-enter and monitor-exit carry docstrings which explicitly say that they should not be used in user code and point to locking as the user-facing equivalent to Java's synchronized. Cheers, Michał On 1 November 2013 19:34, Michael Blume blume...@gmail.com wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered.
Re: Functions using locks are slowed even when locks are never taken
Sorry, I neglected to include a link to the ticket: http://dev.clojure.org/jira/browse/CLJ-1296 On Sun, Nov 17, 2013 at 2:11 PM, Andy Fingerhut andy.finger...@gmail.comwrote: I have just created a ticket linking to this discussion, with a copy of one of Michal's earlier messages in the thread as a description of what might be the problem. I would not say that this is the same as movement on this issue. Movement in the form that would lead to a change in the Clojure compiler will require one or more volunteers to carefully diagnose the root cause of the problem, develop one or more patches, respond to comments from screeners and/or Rich Hickey, etc. If people want to raise more attention to this issue, one way to do it is to vote on the ticket in JIRA. If you do not already have an account, you can create one here: http://dev.clojure.org/jira/secure/Signup!default.jspa Andy On Sun, Nov 17, 2013 at 10:14 AM, Michael Blume blume.m...@gmail.comwrote: Sorry to be a bother, but any movement on this? This looks like a real performance bug and I don't yet have the internals knowledge to chase it down myself. On Sun, Nov 3, 2013 at 12:10 PM, Michał Marczyk michal.marc...@gmail.com wrote: Well, that is interesting. The difference between the compiled versions of (defn foo [x] (if ( x 0) (inc x) (locking o (dec x and (defn bar [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) is quite significant. foo gets compiled to a single class, with invocations handled by a single invoke method; bar gets compiled to a class for bar + an extra class for an inner function which handles the (locking o (dec x)) part -- probably very similar to the output for the version with the hand-coded locking-part (although I haven't really looked at that yet). The inner function is a closure, so calling it involves an allocation of a closure object; its ctor receives the closed-over locals as arguments and stores them in two fields (lockee and x). Then they get loaded from the fields in the body of the closure's invoke method etc. I guess I'll have to play around with Java equivalents too... Cheers, Michał On 3 November 2013 20:46, Michael Blume blume.m...@gmail.com wrote: I mean, I'm probably being naive, but this suggests that one could write (defmacro locking' [ forms] `(let [res# (locking ~@forms)] res#)) and use locking' in place of locking for improved performance. Is this wrong? If it's right, does that suggest the macro in clojure.core should be changed? On Sun, Nov 3, 2013 at 11:09 AM, Michael Blume blume.m...@gmail.com wrote: Huh, interesting. I have: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) (defn foo'' [x] (if ( x 0) (inc x) (locking o (dec x foo' is fast, but foo'' is slow. So something about wrapping the locking clause in a let makes it fast. Still no idea why. On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote: You have a typo in foo -- monitor-exit's argument is 0 (zero) rather than o (the sentinel object). Besides that, in foo both monitor-enter and monitor-exit get their arguments from a Var. Rewriting to use locking, which first puts the object whose monitor will be used in a local (that is, (let [lockee o] ...), where ... performs the locking using the newly introduced local), gives timings identical to those of bar and baz: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) So this is one reason not to use monitor-enter and monitor-exit directly. Another reason is that locking guarantees that the monitor will be released (by using try / finally, and of course by preventing situations where the matching monitor-enter monitor-exit operate on different objects). In fact, both monitor-enter and monitor-exit carry docstrings which explicitly say that they should not be used in user code and point to locking as the user-facing equivalent to Java's synchronized. Cheers, Michał On 1 November 2013 19:34, Michael Blume blume...@gmail.com wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz)
Re: Functions using locks are slowed even when locks are never taken
same results on my machine as well. I tried decompiling the jar (with cider-decompile), and the parts that actually run (the then clause) seems almost identical. If you'r interested, i can post them later tonight. On Sunday, November 3, 2013 2:32:42 AM UTC+2, Michael Blume wrote: Hmm, it seems like if it were JIT related you'd see the same issue with java code, but I get 5ns across the board. https://github.com/MichaelBlume/perf-test/blob/master/src-java/PerfTest.java On Saturday, November 2, 2013 12:01:01 AM UTC-7, Trenton Strong wrote: Verified that I receive the same result patterns as you on my machine. One other possibility outside of what you have already mentioned would be something JIT-related. Perhaps there is an optimization that can be performed if the locking sections of the code are in another function but otherwise no, but I'm not sure how that dovetails with Clojure' fn compilation. On Friday, November 1, 2013 11:53:12 AM UTC-7, Michael Blume wrote: Since 1.6 alpha is out, I reran the tests with that -- basically the same results. On Friday, November 1, 2013 11:34:15 AM UTC-7, Michael Blume wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 391582560 in 60 samples of 6526376 calls. Execution time sample mean : 167.426696 ns Execution time mean : 167.459429 ns Execution time sample std-deviation : 4.079466 ns Execution time std-deviation : 4.097819 ns Execution time lower quantile : 160.742869 ns ( 2.5%) Execution time upper quantile : 175.453376 ns (97.5%) Overhead used : 1.634996 ns Found 2 outliers in 60 samples (3. %) low-severe 2 (3. %) Variance from outliers : 12.5602 % Variance is moderately inflated by outliers benching bar x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2174037300 in 60 samples of 36233955 calls. Execution time sample mean : 26.068923 ns Execution time mean : 26.066422 ns Execution time sample std-deviation : 0.887937 ns Execution time std-deviation : 0.916861 ns Execution time lower quantile : 23.996763 ns ( 2.5%) Execution time upper quantile : 27.911936 ns (97.5%) Overhead used : 1.634996 ns Found 3 outliers in 60 samples (5. %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild1 (1.6667 %) Variance from outliers : 22.1874 % Variance is moderately inflated by outliers benching baz x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2270676660 in 60 samples of 37844611 calls. Execution time sample mean : 25.834142 ns Execution time mean : 25.837429 ns Execution time sample std-deviation : 0.718382 ns Execution time std-deviation : 0.729431 ns Execution time lower quantile : 24.837925 ns ( 2.5%) Execution time upper quantile : 27.595781 ns (97.5%) Overhead used : 1.634996 ns Found 4 outliers in 60 samples (6.6667 %) low-severe 2 (3. %) low-mild 2 (3. %) Variance from outliers : 15.7591 % Variance is moderately inflated by outliers done benching -- -- You received this message because you are
Re: Functions using locks are slowed even when locks are never taken
You have a typo in foo -- monitor-exit's argument is 0 (zero) rather than o (the sentinel object). Besides that, in foo both monitor-enter and monitor-exit get their arguments from a Var. Rewriting to use locking, which first puts the object whose monitor will be used in a local (that is, (let [lockee o] ...), where ... performs the locking using the newly introduced local), gives timings identical to those of bar and baz: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) So this is one reason not to use monitor-enter and monitor-exit directly. Another reason is that locking guarantees that the monitor will be released (by using try / finally, and of course by preventing situations where the matching monitor-enter monitor-exit operate on different objects). In fact, both monitor-enter and monitor-exit carry docstrings which explicitly say that they should not be used in user code and point to locking as the user-facing equivalent to Java's synchronized. Cheers, Michał On 1 November 2013 19:34, Michael Blume blume.m...@gmail.com wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 391582560 in 60 samples of 6526376 calls. Execution time sample mean : 167.426696 ns Execution time mean : 167.459429 ns Execution time sample std-deviation : 4.079466 ns Execution time std-deviation : 4.097819 ns Execution time lower quantile : 160.742869 ns ( 2.5%) Execution time upper quantile : 175.453376 ns (97.5%) Overhead used : 1.634996 ns Found 2 outliers in 60 samples (3. %) low-severe 2 (3. %) Variance from outliers : 12.5602 % Variance is moderately inflated by outliers benching bar x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2174037300 in 60 samples of 36233955 calls. Execution time sample mean : 26.068923 ns Execution time mean : 26.066422 ns Execution time sample std-deviation : 0.887937 ns Execution time std-deviation : 0.916861 ns Execution time lower quantile : 23.996763 ns ( 2.5%) Execution time upper quantile : 27.911936 ns (97.5%) Overhead used : 1.634996 ns Found 3 outliers in 60 samples (5. %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild 1 (1.6667 %) Variance from outliers : 22.1874 % Variance is moderately inflated by outliers benching baz x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2270676660 in 60 samples of 37844611 calls. Execution time sample mean : 25.834142 ns Execution time mean : 25.837429 ns Execution time sample std-deviation : 0.718382 ns Execution time std-deviation : 0.729431 ns Execution time lower quantile : 24.837925 ns ( 2.5%) Execution time upper quantile : 27.595781 ns (97.5%) Overhead used : 1.634996 ns Found 4 outliers in 60 samples (6.6667 %) low-severe 2 (3. %) low-mild 2 (3. %) Variance from outliers : 15.7591 % Variance is moderately inflated by outliers done benching -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are
Re: Functions using locks are slowed even when locks are never taken
I should perhaps make clear that with direct use of monitor-enter and monitor-exit with a Var it's possible for monitor-enter and monitor-exit to operate on different objects even in the absence of typos, namely if somebody rebinds the Var. To illustrate this with print at the REPL (a regular Clojure REPL, as opposed to nREPL or similar, in case there are any issues with reproducing this in different environments): (def x 1) (future (print x) (Thread/sleep 5000) (print x)) ; prints 1 immediately, but then goes to sleep; ; in the meantime, I say (def x 2) @*2 ; and after a moment 2 is printed NB. this could also happen with alter-var-root or set! to a Var with a thread-local binding. Cheers, Michał On 3 November 2013 18:30, Michał Marczyk michal.marc...@gmail.com wrote: You have a typo in foo -- monitor-exit's argument is 0 (zero) rather than o (the sentinel object). Besides that, in foo both monitor-enter and monitor-exit get their arguments from a Var. Rewriting to use locking, which first puts the object whose monitor will be used in a local (that is, (let [lockee o] ...), where ... performs the locking using the newly introduced local), gives timings identical to those of bar and baz: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) So this is one reason not to use monitor-enter and monitor-exit directly. Another reason is that locking guarantees that the monitor will be released (by using try / finally, and of course by preventing situations where the matching monitor-enter monitor-exit operate on different objects). In fact, both monitor-enter and monitor-exit carry docstrings which explicitly say that they should not be used in user code and point to locking as the user-facing equivalent to Java's synchronized. Cheers, Michał On 1 November 2013 19:34, Michael Blume blume.m...@gmail.com wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 391582560 in 60 samples of 6526376 calls. Execution time sample mean : 167.426696 ns Execution time mean : 167.459429 ns Execution time sample std-deviation : 4.079466 ns Execution time std-deviation : 4.097819 ns Execution time lower quantile : 160.742869 ns ( 2.5%) Execution time upper quantile : 175.453376 ns (97.5%) Overhead used : 1.634996 ns Found 2 outliers in 60 samples (3. %) low-severe 2 (3. %) Variance from outliers : 12.5602 % Variance is moderately inflated by outliers benching bar x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2174037300 in 60 samples of 36233955 calls. Execution time sample mean : 26.068923 ns Execution time mean : 26.066422 ns Execution time sample std-deviation : 0.887937 ns Execution time std-deviation : 0.916861 ns Execution time lower quantile : 23.996763 ns ( 2.5%) Execution time upper quantile : 27.911936 ns (97.5%) Overhead used : 1.634996 ns Found 3 outliers in 60 samples (5. %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild 1 (1.6667 %) Variance from outliers : 22.1874 % Variance is moderately inflated by outliers benching baz x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments:
Re: Functions using locks are slowed even when locks are never taken
Good detective work, Michal. So the extra time for the slower version was because a Var was always being accessed in the generated byte code, even if the source code was only explicitly accessing the Var in a branch that was never executed? Thanks, Andy On Sun, Nov 3, 2013 at 9:30 AM, Michał Marczyk michal.marc...@gmail.comwrote: You have a typo in foo -- monitor-exit's argument is 0 (zero) rather than o (the sentinel object). Besides that, in foo both monitor-enter and monitor-exit get their arguments from a Var. Rewriting to use locking, which first puts the object whose monitor will be used in a local (that is, (let [lockee o] ...), where ... performs the locking using the newly introduced local), gives timings identical to those of bar and baz: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) So this is one reason not to use monitor-enter and monitor-exit directly. Another reason is that locking guarantees that the monitor will be released (by using try / finally, and of course by preventing situations where the matching monitor-enter monitor-exit operate on different objects). In fact, both monitor-enter and monitor-exit carry docstrings which explicitly say that they should not be used in user code and point to locking as the user-facing equivalent to Java's synchronized. Cheers, Michał On 1 November 2013 19:34, Michael Blume blume.m...@gmail.com wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 391582560 in 60 samples of 6526376 calls. Execution time sample mean : 167.426696 ns Execution time mean : 167.459429 ns Execution time sample std-deviation : 4.079466 ns Execution time std-deviation : 4.097819 ns Execution time lower quantile : 160.742869 ns ( 2.5%) Execution time upper quantile : 175.453376 ns (97.5%) Overhead used : 1.634996 ns Found 2 outliers in 60 samples (3. %) low-severe 2 (3. %) Variance from outliers : 12.5602 % Variance is moderately inflated by outliers benching bar x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2174037300 in 60 samples of 36233955 calls. Execution time sample mean : 26.068923 ns Execution time mean : 26.066422 ns Execution time sample std-deviation : 0.887937 ns Execution time std-deviation : 0.916861 ns Execution time lower quantile : 23.996763 ns ( 2.5%) Execution time upper quantile : 27.911936 ns (97.5%) Overhead used : 1.634996 ns Found 3 outliers in 60 samples (5. %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild 1 (1.6667 %) Variance from outliers : 22.1874 % Variance is moderately inflated by outliers benching baz x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2270676660 in 60 samples of 37844611 calls. Execution time sample mean : 25.834142 ns Execution time mean : 25.837429 ns Execution time sample std-deviation : 0.718382 ns Execution time std-deviation : 0.729431 ns Execution time lower quantile : 24.837925 ns ( 2.5%) Execution time upper
Re: Functions using locks are slowed even when locks are never taken
Huh, interesting. I have: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) (defn foo'' [x] (if ( x 0) (inc x) (locking o (dec x foo' is fast, but foo'' is slow. So something about wrapping the locking clause in a let makes it fast. Still no idea why. On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote: You have a typo in foo -- monitor-exit's argument is 0 (zero) rather than o (the sentinel object). Besides that, in foo both monitor-enter and monitor-exit get their arguments from a Var. Rewriting to use locking, which first puts the object whose monitor will be used in a local (that is, (let [lockee o] ...), where ... performs the locking using the newly introduced local), gives timings identical to those of bar and baz: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) So this is one reason not to use monitor-enter and monitor-exit directly. Another reason is that locking guarantees that the monitor will be released (by using try / finally, and of course by preventing situations where the matching monitor-enter monitor-exit operate on different objects). In fact, both monitor-enter and monitor-exit carry docstrings which explicitly say that they should not be used in user code and point to locking as the user-facing equivalent to Java's synchronized. Cheers, Michał On 1 November 2013 19:34, Michael Blume blume...@gmail.com javascript: wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 391582560 in 60 samples of 6526376 calls. Execution time sample mean : 167.426696 ns Execution time mean : 167.459429 ns Execution time sample std-deviation : 4.079466 ns Execution time std-deviation : 4.097819 ns Execution time lower quantile : 160.742869 ns ( 2.5%) Execution time upper quantile : 175.453376 ns (97.5%) Overhead used : 1.634996 ns Found 2 outliers in 60 samples (3. %) low-severe 2 (3. %) Variance from outliers : 12.5602 % Variance is moderately inflated by outliers benching bar x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2174037300 in 60 samples of 36233955 calls. Execution time sample mean : 26.068923 ns Execution time mean : 26.066422 ns Execution time sample std-deviation : 0.887937 ns Execution time std-deviation : 0.916861 ns Execution time lower quantile : 23.996763 ns ( 2.5%) Execution time upper quantile : 27.911936 ns (97.5%) Overhead used : 1.634996 ns Found 3 outliers in 60 samples (5. %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild 1 (1.6667 %) Variance from outliers : 22.1874 % Variance is moderately inflated by outliers benching baz x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2270676660 in 60 samples of 37844611 calls. Execution time sample mean : 25.834142 ns Execution time mean : 25.837429 ns
Re: Functions using locks are slowed even when locks are never taken
I mean, I'm probably being naive, but this suggests that one could write (defmacro locking' [ forms] `(let [res# (locking ~@forms)] res#)) and use locking' in place of locking for improved performance. Is this wrong? If it's right, does that suggest the macro in clojure.core should be changed? On Sun, Nov 3, 2013 at 11:09 AM, Michael Blume blume.m...@gmail.com wrote: Huh, interesting. I have: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) (defn foo'' [x] (if ( x 0) (inc x) (locking o (dec x foo' is fast, but foo'' is slow. So something about wrapping the locking clause in a let makes it fast. Still no idea why. On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote: You have a typo in foo -- monitor-exit's argument is 0 (zero) rather than o (the sentinel object). Besides that, in foo both monitor-enter and monitor-exit get their arguments from a Var. Rewriting to use locking, which first puts the object whose monitor will be used in a local (that is, (let [lockee o] ...), where ... performs the locking using the newly introduced local), gives timings identical to those of bar and baz: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) So this is one reason not to use monitor-enter and monitor-exit directly. Another reason is that locking guarantees that the monitor will be released (by using try / finally, and of course by preventing situations where the matching monitor-enter monitor-exit operate on different objects). In fact, both monitor-enter and monitor-exit carry docstrings which explicitly say that they should not be used in user code and point to locking as the user-facing equivalent to Java's synchronized. Cheers, Michał On 1 November 2013 19:34, Michael Blume blume...@gmail.com wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 391582560 in 60 samples of 6526376 calls. Execution time sample mean : 167.426696 ns Execution time mean : 167.459429 ns Execution time sample std-deviation : 4.079466 ns Execution time std-deviation : 4.097819 ns Execution time lower quantile : 160.742869 ns ( 2.5%) Execution time upper quantile : 175.453376 ns (97.5%) Overhead used : 1.634996 ns Found 2 outliers in 60 samples (3. %) low-severe 2 (3. %) Variance from outliers : 12.5602 % Variance is moderately inflated by outliers benching bar x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2174037300 in 60 samples of 36233955 calls. Execution time sample mean : 26.068923 ns Execution time mean : 26.066422 ns Execution time sample std-deviation : 0.887937 ns Execution time std-deviation : 0.916861 ns Execution time lower quantile : 23.996763 ns ( 2.5%) Execution time upper quantile : 27.911936 ns (97.5%) Overhead used : 1.634996 ns Found 3 outliers in 60 samples (5. %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild 1 (1.6667 %) Variance from outliers : 22.1874 % Variance is moderately inflated by outliers benching baz x86_64 Mac OS
Re: Functions using locks are slowed even when locks are never taken
Well, that is interesting. The difference between the compiled versions of (defn foo [x] (if ( x 0) (inc x) (locking o (dec x and (defn bar [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) is quite significant. foo gets compiled to a single class, with invocations handled by a single invoke method; bar gets compiled to a class for bar + an extra class for an inner function which handles the (locking o (dec x)) part -- probably very similar to the output for the version with the hand-coded locking-part (although I haven't really looked at that yet). The inner function is a closure, so calling it involves an allocation of a closure object; its ctor receives the closed-over locals as arguments and stores them in two fields (lockee and x). Then they get loaded from the fields in the body of the closure's invoke method etc. I guess I'll have to play around with Java equivalents too... Cheers, Michał On 3 November 2013 20:46, Michael Blume blume.m...@gmail.com wrote: I mean, I'm probably being naive, but this suggests that one could write (defmacro locking' [ forms] `(let [res# (locking ~@forms)] res#)) and use locking' in place of locking for improved performance. Is this wrong? If it's right, does that suggest the macro in clojure.core should be changed? On Sun, Nov 3, 2013 at 11:09 AM, Michael Blume blume.m...@gmail.com wrote: Huh, interesting. I have: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) (defn foo'' [x] (if ( x 0) (inc x) (locking o (dec x foo' is fast, but foo'' is slow. So something about wrapping the locking clause in a let makes it fast. Still no idea why. On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote: You have a typo in foo -- monitor-exit's argument is 0 (zero) rather than o (the sentinel object). Besides that, in foo both monitor-enter and monitor-exit get their arguments from a Var. Rewriting to use locking, which first puts the object whose monitor will be used in a local (that is, (let [lockee o] ...), where ... performs the locking using the newly introduced local), gives timings identical to those of bar and baz: (defn foo' [x] (if ( x 0) (inc x) (let [res (locking o (dec x))] res))) So this is one reason not to use monitor-enter and monitor-exit directly. Another reason is that locking guarantees that the monitor will be released (by using try / finally, and of course by preventing situations where the matching monitor-enter monitor-exit operate on different objects). In fact, both monitor-enter and monitor-exit carry docstrings which explicitly say that they should not be used in user code and point to locking as the user-facing equivalent to Java's synchronized. Cheers, Michał On 1 November 2013 19:34, Michael Blume blume...@gmail.com wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 391582560 in 60 samples of 6526376 calls. Execution time sample mean : 167.426696 ns Execution time mean : 167.459429 ns Execution time sample std-deviation : 4.079466 ns Execution time std-deviation : 4.097819 ns Execution time lower quantile : 160.742869
Re: Functions using locks are slowed even when locks are never taken
Verified that I receive the same result patterns as you on my machine. One other possibility outside of what you have already mentioned would be something JIT-related. Perhaps there is an optimization that can be performed if the locking sections of the code are in another function but otherwise no, but I'm not sure how that dovetails with Clojure' fn compilation. On Friday, November 1, 2013 11:53:12 AM UTC-7, Michael Blume wrote: Since 1.6 alpha is out, I reran the tests with that -- basically the same results. On Friday, November 1, 2013 11:34:15 AM UTC-7, Michael Blume wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 391582560 in 60 samples of 6526376 calls. Execution time sample mean : 167.426696 ns Execution time mean : 167.459429 ns Execution time sample std-deviation : 4.079466 ns Execution time std-deviation : 4.097819 ns Execution time lower quantile : 160.742869 ns ( 2.5%) Execution time upper quantile : 175.453376 ns (97.5%) Overhead used : 1.634996 ns Found 2 outliers in 60 samples (3. %) low-severe 2 (3. %) Variance from outliers : 12.5602 % Variance is moderately inflated by outliers benching bar x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2174037300 in 60 samples of 36233955 calls. Execution time sample mean : 26.068923 ns Execution time mean : 26.066422 ns Execution time sample std-deviation : 0.887937 ns Execution time std-deviation : 0.916861 ns Execution time lower quantile : 23.996763 ns ( 2.5%) Execution time upper quantile : 27.911936 ns (97.5%) Overhead used : 1.634996 ns Found 3 outliers in 60 samples (5. %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild1 (1.6667 %) Variance from outliers : 22.1874 % Variance is moderately inflated by outliers benching baz x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2270676660 in 60 samples of 37844611 calls. Execution time sample mean : 25.834142 ns Execution time mean : 25.837429 ns Execution time sample std-deviation : 0.718382 ns Execution time std-deviation : 0.729431 ns Execution time lower quantile : 24.837925 ns ( 2.5%) Execution time upper quantile : 27.595781 ns (97.5%) Overhead used : 1.634996 ns Found 4 outliers in 60 samples (6.6667 %) low-severe 2 (3. %) low-mild 2 (3. %) Variance from outliers : 15.7591 % Variance is moderately inflated by outliers done benching -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to
Re: Functions using locks are slowed even when locks are never taken
Hmm, it seems like if it were JIT related you'd see the same issue with java code, but I get 5ns across the board. https://github.com/MichaelBlume/perf-test/blob/master/src-java/PerfTest.java On Saturday, November 2, 2013 12:01:01 AM UTC-7, Trenton Strong wrote: Verified that I receive the same result patterns as you on my machine. One other possibility outside of what you have already mentioned would be something JIT-related. Perhaps there is an optimization that can be performed if the locking sections of the code are in another function but otherwise no, but I'm not sure how that dovetails with Clojure' fn compilation. On Friday, November 1, 2013 11:53:12 AM UTC-7, Michael Blume wrote: Since 1.6 alpha is out, I reran the tests with that -- basically the same results. On Friday, November 1, 2013 11:34:15 AM UTC-7, Michael Blume wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 391582560 in 60 samples of 6526376 calls. Execution time sample mean : 167.426696 ns Execution time mean : 167.459429 ns Execution time sample std-deviation : 4.079466 ns Execution time std-deviation : 4.097819 ns Execution time lower quantile : 160.742869 ns ( 2.5%) Execution time upper quantile : 175.453376 ns (97.5%) Overhead used : 1.634996 ns Found 2 outliers in 60 samples (3. %) low-severe 2 (3. %) Variance from outliers : 12.5602 % Variance is moderately inflated by outliers benching bar x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2174037300 in 60 samples of 36233955 calls. Execution time sample mean : 26.068923 ns Execution time mean : 26.066422 ns Execution time sample std-deviation : 0.887937 ns Execution time std-deviation : 0.916861 ns Execution time lower quantile : 23.996763 ns ( 2.5%) Execution time upper quantile : 27.911936 ns (97.5%) Overhead used : 1.634996 ns Found 3 outliers in 60 samples (5. %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild1 (1.6667 %) Variance from outliers : 22.1874 % Variance is moderately inflated by outliers benching baz x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2270676660 in 60 samples of 37844611 calls. Execution time sample mean : 25.834142 ns Execution time mean : 25.837429 ns Execution time sample std-deviation : 0.718382 ns Execution time std-deviation : 0.729431 ns Execution time lower quantile : 24.837925 ns ( 2.5%) Execution time upper quantile : 27.595781 ns (97.5%) Overhead used : 1.634996 ns Found 4 outliers in 60 samples (6.6667 %) low-severe 2 (3. %) low-mild 2 (3. %) Variance from outliers : 15.7591 % Variance is moderately inflated by outliers done benching -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For
Functions using locks are slowed even when locks are never taken
https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 391582560 in 60 samples of 6526376 calls. Execution time sample mean : 167.426696 ns Execution time mean : 167.459429 ns Execution time sample std-deviation : 4.079466 ns Execution time std-deviation : 4.097819 ns Execution time lower quantile : 160.742869 ns ( 2.5%) Execution time upper quantile : 175.453376 ns (97.5%) Overhead used : 1.634996 ns Found 2 outliers in 60 samples (3. %) low-severe 2 (3. %) Variance from outliers : 12.5602 % Variance is moderately inflated by outliers benching bar x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2174037300 in 60 samples of 36233955 calls. Execution time sample mean : 26.068923 ns Execution time mean : 26.066422 ns Execution time sample std-deviation : 0.887937 ns Execution time std-deviation : 0.916861 ns Execution time lower quantile : 23.996763 ns ( 2.5%) Execution time upper quantile : 27.911936 ns (97.5%) Overhead used : 1.634996 ns Found 3 outliers in 60 samples (5. %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild1 (1.6667 %) Variance from outliers : 22.1874 % Variance is moderately inflated by outliers benching baz x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2270676660 in 60 samples of 37844611 calls. Execution time sample mean : 25.834142 ns Execution time mean : 25.837429 ns Execution time sample std-deviation : 0.718382 ns Execution time std-deviation : 0.729431 ns Execution time lower quantile : 24.837925 ns ( 2.5%) Execution time upper quantile : 27.595781 ns (97.5%) Overhead used : 1.634996 ns Found 4 outliers in 60 samples (6.6667 %) low-severe 2 (3. %) low-mild 2 (3. %) Variance from outliers : 15.7591 % Variance is moderately inflated by outliers done benching -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Re: Functions using locks are slowed even when locks are never taken
Since 1.6 alpha is out, I reran the tests with that -- basically the same results. On Friday, November 1, 2013 11:34:15 AM UTC-7, Michael Blume wrote: https://github.com/MichaelBlume/perf-test (ns perf-test (:use (criterium core)) (:gen-class)) (def o (Object.)) (defn foo [x] (if ( x 0) (inc x) (do (monitor-enter o) (let [res (dec x)] (monitor-exit 0) res (defn bar [x] (if ( x 0) (inc x) (dec x))) (defn locking-part [x l] (monitor-enter l) (let [res (dec x)] (monitor-exit l) res)) (defn baz [x] (if ( x 0) (inc x) (locking-part x o))) (defn -main [] (println benching foo) (bench (foo 5) :verbose) (println benching bar) (bench (bar 5) :verbose) (println benching baz) (bench (baz 5) :verbose) (println done benching)) I'm only ever calling these functions with positive values, so the monitor-enter branch should never be entered. Nevertheless, the performance of foo is much worse than bar or baz. The best guess I've got is that the fact that lock-taking is involved somehow changes how the function is compiled, somehow making the function slower. If the practical upshot is that I shouldn't write functions that only sometimes lock -- that the locking part of a function should always be its own function -- then I can do that, but I'm curious why. $ lein uberjar Compiling perf-test Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar benching foo WARNING: Final GC required 1.5974571326266802 % of runtime x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 391582560 in 60 samples of 6526376 calls. Execution time sample mean : 167.426696 ns Execution time mean : 167.459429 ns Execution time sample std-deviation : 4.079466 ns Execution time std-deviation : 4.097819 ns Execution time lower quantile : 160.742869 ns ( 2.5%) Execution time upper quantile : 175.453376 ns (97.5%) Overhead used : 1.634996 ns Found 2 outliers in 60 samples (3. %) low-severe 2 (3. %) Variance from outliers : 12.5602 % Variance is moderately inflated by outliers benching bar x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2174037300 in 60 samples of 36233955 calls. Execution time sample mean : 26.068923 ns Execution time mean : 26.066422 ns Execution time sample std-deviation : 0.887937 ns Execution time std-deviation : 0.916861 ns Execution time lower quantile : 23.996763 ns ( 2.5%) Execution time upper quantile : 27.911936 ns (97.5%) Overhead used : 1.634996 ns Found 3 outliers in 60 samples (5. %) low-severe 1 (1.6667 %) low-mild 1 (1.6667 %) high-mild1 (1.6667 %) Variance from outliers : 22.1874 % Variance is moderately inflated by outliers benching baz x86_64 Mac OS X 10.8.3 4 cpu(s) Java HotSpot(TM) 64-Bit Server VM 24.0-b28 Runtime arguments: Evaluation count : 2270676660 in 60 samples of 37844611 calls. Execution time sample mean : 25.834142 ns Execution time mean : 25.837429 ns Execution time sample std-deviation : 0.718382 ns Execution time std-deviation : 0.729431 ns Execution time lower quantile : 24.837925 ns ( 2.5%) Execution time upper quantile : 27.595781 ns (97.5%) Overhead used : 1.634996 ns Found 4 outliers in 60 samples (6.6667 %) low-severe 2 (3. %) low-mild 2 (3. %) Variance from outliers : 15.7591 % Variance is moderately inflated by outliers done benching -- -- You received this message because you are subscribed to the Google Groups Clojure group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups Clojure group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.