subject:"Functions using locks are slowed even when locks are never taken"

Re: Functions using locks are slowed even when locks are never taken

2013-11-17 Thread Michael Blume

Sorry to be a bother, but any movement on this? This looks like a real
performance bug and I don't yet have the internals knowledge to chase it
down myself.


On Sun, Nov 3, 2013 at 12:10 PM, Michał Marczyk michal.marc...@gmail.comwrote:

 Well, that is interesting.

 The difference between the compiled versions of

 (defn foo [x]
   (if ( x 0)
 (inc x)
 (locking o
   (dec x

 and

 (defn bar [x]
   (if ( x 0)
 (inc x)
 (let [res (locking o
 (dec x))]
   res)))

 is quite significant. foo gets compiled to a single class, with
 invocations handled by a single invoke method; bar gets compiled to a
 class for bar + an extra class for an inner function which handles the
 (locking o (dec x)) part -- probably very similar to the output for
 the version with the hand-coded locking-part (although I haven't
 really looked at that yet). The inner function is a closure, so
 calling it involves an allocation of a closure object; its ctor
 receives the closed-over locals as arguments and stores them in two
 fields (lockee and x). Then they get loaded from the fields in the
 body of the closure's invoke method etc.

 I guess I'll have to play around with Java equivalents too...

 Cheers,
 Michał


 On 3 November 2013 20:46, Michael Blume blume.m...@gmail.com wrote:
  I mean, I'm probably being naive, but this suggests that one could write
 
  (defmacro locking' [ forms]
`(let [res# (locking ~@forms)] res#))
 
  and use locking' in place of locking for improved performance. Is this
  wrong? If it's right, does that suggest the macro in clojure.core should
 be
  changed?
 
 
  On Sun, Nov 3, 2013 at 11:09 AM, Michael Blume blume.m...@gmail.com
 wrote:
 
  Huh, interesting.
 
  I have:
 
  (defn foo' [x]
(if ( x 0)
  (inc x)
  (let [res (locking o (dec x))] res)))
 
  (defn foo'' [x]
(if ( x 0)
  (inc x)
  (locking o
(dec x
 
  foo' is fast, but foo'' is slow. So something about wrapping the locking
  clause in a let makes it fast. Still no idea why.
 
  On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote:
 
  You have a typo in foo -- monitor-exit's argument is 0 (zero) rather
  than o (the sentinel object).
 
  Besides that, in foo both monitor-enter and monitor-exit get their
  arguments from a Var. Rewriting to use locking, which first puts the
  object whose monitor will be used in a local (that is, (let [lockee o]
  ...), where ... performs the locking using the newly introduced
  local), gives timings identical to those of bar and baz:
 
  (defn foo' [x]
(if ( x 0)
  (inc x)
  (let [res (locking o (dec x))] res)))
 
  So this is one reason not to use monitor-enter and monitor-exit
  directly. Another reason is that locking guarantees that the monitor
  will be released (by using try / finally, and of course by preventing
  situations where the matching monitor-enter  monitor-exit operate on
  different objects).
 
  In fact, both monitor-enter and monitor-exit carry docstrings which
  explicitly say that they should not be used in user code and point to
  locking as the user-facing equivalent to Java's synchronized.
 
  Cheers,
  Michał
 
 
  On 1 November 2013 19:34, Michael Blume blume...@gmail.com wrote:
   https://github.com/MichaelBlume/perf-test
  
   (ns perf-test
 (:use (criterium core))
 (:gen-class))
  
   (def o (Object.))
  
   (defn foo [x]
 (if ( x 0)
   (inc x)
   (do
 (monitor-enter o)
 (let [res (dec x)]
   (monitor-exit 0)
   res
  
   (defn bar [x]
 (if ( x 0)
   (inc x)
   (dec x)))
  
   (defn locking-part [x l]
 (monitor-enter l)
 (let [res (dec x)]
   (monitor-exit l)
   res))
  
   (defn baz [x]
 (if ( x 0)
   (inc x)
   (locking-part x o)))
  
   (defn -main []
 (println benching foo)
 (bench (foo 5) :verbose)
 (println benching bar)
 (bench (bar 5) :verbose)
 (println benching baz)
 (bench (baz 5) :verbose)
 (println done benching))
  
  
  
   I'm only ever calling these functions with positive values, so the
   monitor-enter branch should never be entered. Nevertheless, the
   performance
   of foo is much worse than bar or baz.
  
   The best guess I've got is that the fact that lock-taking is involved
   somehow changes how the function is compiled, somehow making the
   function
   slower. If the practical upshot is that I shouldn't write functions
   that
   only sometimes lock -- that the locking part of a function should
   always be
   its own function -- then I can do that, but I'm curious why.
  
   $ lein uberjar
   Compiling perf-test
   Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
   Created
   /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
   $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar
   benching foo
   WARNING: Final GC required 1.5974571326266802 % of runtime
   x86_64 Mac OS

Re: Functions using locks are slowed even when locks are never taken

2013-11-17 Thread Andy Fingerhut

I have just created a ticket linking to this discussion, with a copy of one
of Michal's earlier messages in the thread as a description of what might
be the problem.

I would not say that this is the same as movement on this issue.  Movement
in the form that would lead to a change in the Clojure compiler will
require one or more volunteers to carefully diagnose the root cause of the
problem, develop one or more patches, respond to comments from screeners
and/or Rich Hickey, etc.  If people want to raise more attention to this
issue, one way to do it is to vote on the ticket in JIRA.  If you do not
already have an account, you can create one here:

http://dev.clojure.org/jira/secure/Signup!default.jspa

Andy


On Sun, Nov 17, 2013 at 10:14 AM, Michael Blume blume.m...@gmail.comwrote:

 Sorry to be a bother, but any movement on this? This looks like a real
 performance bug and I don't yet have the internals knowledge to chase it
 down myself.


 On Sun, Nov 3, 2013 at 12:10 PM, Michał Marczyk 
 michal.marc...@gmail.comwrote:

 Well, that is interesting.

 The difference between the compiled versions of

 (defn foo [x]
   (if ( x 0)
 (inc x)
 (locking o
   (dec x

 and

 (defn bar [x]
   (if ( x 0)
 (inc x)
 (let [res (locking o
 (dec x))]
   res)))

 is quite significant. foo gets compiled to a single class, with
 invocations handled by a single invoke method; bar gets compiled to a
 class for bar + an extra class for an inner function which handles the
 (locking o (dec x)) part -- probably very similar to the output for
 the version with the hand-coded locking-part (although I haven't
 really looked at that yet). The inner function is a closure, so
 calling it involves an allocation of a closure object; its ctor
 receives the closed-over locals as arguments and stores them in two
 fields (lockee and x). Then they get loaded from the fields in the
 body of the closure's invoke method etc.

 I guess I'll have to play around with Java equivalents too...

 Cheers,
 Michał


 On 3 November 2013 20:46, Michael Blume blume.m...@gmail.com wrote:
  I mean, I'm probably being naive, but this suggests that one could write
 
  (defmacro locking' [ forms]
`(let [res# (locking ~@forms)] res#))
 
  and use locking' in place of locking for improved performance. Is this
  wrong? If it's right, does that suggest the macro in clojure.core
 should be
  changed?
 
 
  On Sun, Nov 3, 2013 at 11:09 AM, Michael Blume blume.m...@gmail.com
 wrote:
 
  Huh, interesting.
 
  I have:
 
  (defn foo' [x]
(if ( x 0)
  (inc x)
  (let [res (locking o (dec x))] res)))
 
  (defn foo'' [x]
(if ( x 0)
  (inc x)
  (locking o
(dec x
 
  foo' is fast, but foo'' is slow. So something about wrapping the
 locking
  clause in a let makes it fast. Still no idea why.
 
  On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote:
 
  You have a typo in foo -- monitor-exit's argument is 0 (zero) rather
  than o (the sentinel object).
 
  Besides that, in foo both monitor-enter and monitor-exit get their
  arguments from a Var. Rewriting to use locking, which first puts the
  object whose monitor will be used in a local (that is, (let [lockee o]
  ...), where ... performs the locking using the newly introduced
  local), gives timings identical to those of bar and baz:
 
  (defn foo' [x]
(if ( x 0)
  (inc x)
  (let [res (locking o (dec x))] res)))
 
  So this is one reason not to use monitor-enter and monitor-exit
  directly. Another reason is that locking guarantees that the monitor
  will be released (by using try / finally, and of course by preventing
  situations where the matching monitor-enter  monitor-exit operate on
  different objects).
 
  In fact, both monitor-enter and monitor-exit carry docstrings which
  explicitly say that they should not be used in user code and point to
  locking as the user-facing equivalent to Java's synchronized.
 
  Cheers,
  Michał
 
 
  On 1 November 2013 19:34, Michael Blume blume...@gmail.com wrote:
   https://github.com/MichaelBlume/perf-test
  
   (ns perf-test
 (:use (criterium core))
 (:gen-class))
  
   (def o (Object.))
  
   (defn foo [x]
 (if ( x 0)
   (inc x)
   (do
 (monitor-enter o)
 (let [res (dec x)]
   (monitor-exit 0)
   res
  
   (defn bar [x]
 (if ( x 0)
   (inc x)
   (dec x)))
  
   (defn locking-part [x l]
 (monitor-enter l)
 (let [res (dec x)]
   (monitor-exit l)
   res))
  
   (defn baz [x]
 (if ( x 0)
   (inc x)
   (locking-part x o)))
  
   (defn -main []
 (println benching foo)
 (bench (foo 5) :verbose)
 (println benching bar)
 (bench (bar 5) :verbose)
 (println benching baz)
 (bench (baz 5) :verbose)
 (println done benching))
  
  
  
   I'm only ever calling these functions with positive values, so the
   monitor-enter branch should never be entered.

Re: Functions using locks are slowed even when locks are never taken

2013-11-17 Thread Andy Fingerhut

Sorry, I neglected to include a link to the ticket:

http://dev.clojure.org/jira/browse/CLJ-1296


On Sun, Nov 17, 2013 at 2:11 PM, Andy Fingerhut andy.finger...@gmail.comwrote:

 I have just created a ticket linking to this discussion, with a copy of
 one of Michal's earlier messages in the thread as a description of what
 might be the problem.

 I would not say that this is the same as movement on this issue.  Movement
 in the form that would lead to a change in the Clojure compiler will
 require one or more volunteers to carefully diagnose the root cause of the
 problem, develop one or more patches, respond to comments from screeners
 and/or Rich Hickey, etc.  If people want to raise more attention to this
 issue, one way to do it is to vote on the ticket in JIRA.  If you do not
 already have an account, you can create one here:

 http://dev.clojure.org/jira/secure/Signup!default.jspa

 Andy


 On Sun, Nov 17, 2013 at 10:14 AM, Michael Blume blume.m...@gmail.comwrote:

 Sorry to be a bother, but any movement on this? This looks like a real
 performance bug and I don't yet have the internals knowledge to chase it
 down myself.


 On Sun, Nov 3, 2013 at 12:10 PM, Michał Marczyk michal.marc...@gmail.com
  wrote:

 Well, that is interesting.

 The difference between the compiled versions of

 (defn foo [x]
   (if ( x 0)
 (inc x)
 (locking o
   (dec x

 and

 (defn bar [x]
   (if ( x 0)
 (inc x)
 (let [res (locking o
 (dec x))]
   res)))

 is quite significant. foo gets compiled to a single class, with
 invocations handled by a single invoke method; bar gets compiled to a
 class for bar + an extra class for an inner function which handles the
 (locking o (dec x)) part -- probably very similar to the output for
 the version with the hand-coded locking-part (although I haven't
 really looked at that yet). The inner function is a closure, so
 calling it involves an allocation of a closure object; its ctor
 receives the closed-over locals as arguments and stores them in two
 fields (lockee and x). Then they get loaded from the fields in the
 body of the closure's invoke method etc.

 I guess I'll have to play around with Java equivalents too...

 Cheers,
 Michał


 On 3 November 2013 20:46, Michael Blume blume.m...@gmail.com wrote:
  I mean, I'm probably being naive, but this suggests that one could
 write
 
  (defmacro locking' [ forms]
`(let [res# (locking ~@forms)] res#))
 
  and use locking' in place of locking for improved performance. Is this
  wrong? If it's right, does that suggest the macro in clojure.core
 should be
  changed?
 
 
  On Sun, Nov 3, 2013 at 11:09 AM, Michael Blume blume.m...@gmail.com
 wrote:
 
  Huh, interesting.
 
  I have:
 
  (defn foo' [x]
(if ( x 0)
  (inc x)
  (let [res (locking o (dec x))] res)))
 
  (defn foo'' [x]
(if ( x 0)
  (inc x)
  (locking o
(dec x
 
  foo' is fast, but foo'' is slow. So something about wrapping the
 locking
  clause in a let makes it fast. Still no idea why.
 
  On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote:
 
  You have a typo in foo -- monitor-exit's argument is 0 (zero) rather
  than o (the sentinel object).
 
  Besides that, in foo both monitor-enter and monitor-exit get their
  arguments from a Var. Rewriting to use locking, which first puts the
  object whose monitor will be used in a local (that is, (let [lockee
 o]
  ...), where ... performs the locking using the newly introduced
  local), gives timings identical to those of bar and baz:
 
  (defn foo' [x]
(if ( x 0)
  (inc x)
  (let [res (locking o (dec x))] res)))
 
  So this is one reason not to use monitor-enter and monitor-exit
  directly. Another reason is that locking guarantees that the monitor
  will be released (by using try / finally, and of course by preventing
  situations where the matching monitor-enter  monitor-exit operate on
  different objects).
 
  In fact, both monitor-enter and monitor-exit carry docstrings which
  explicitly say that they should not be used in user code and point to
  locking as the user-facing equivalent to Java's synchronized.
 
  Cheers,
  Michał
 
 
  On 1 November 2013 19:34, Michael Blume blume...@gmail.com wrote:
   https://github.com/MichaelBlume/perf-test
  
   (ns perf-test
 (:use (criterium core))
 (:gen-class))
  
   (def o (Object.))
  
   (defn foo [x]
 (if ( x 0)
   (inc x)
   (do
 (monitor-enter o)
 (let [res (dec x)]
   (monitor-exit 0)
   res
  
   (defn bar [x]
 (if ( x 0)
   (inc x)
   (dec x)))
  
   (defn locking-part [x l]
 (monitor-enter l)
 (let [res (dec x)]
   (monitor-exit l)
   res))
  
   (defn baz [x]
 (if ( x 0)
   (inc x)
   (locking-part x o)))
  
   (defn -main []
 (println benching foo)
 (bench (foo 5) :verbose)
 (println benching bar)
 (bench (bar 5) :verbose)
 (println benching baz)

Re: Functions using locks are slowed even when locks are never taken

2013-11-03 Thread shlomivaknin

same results on my machine as well.

I tried decompiling the jar (with cider-decompile), and the parts that 
actually run (the then clause) seems almost identical. If you'r 
interested, i can post them later tonight.

On Sunday, November 3, 2013 2:32:42 AM UTC+2, Michael Blume wrote:

 Hmm, it seems like if it were JIT related you'd see the same issue with 
 java code, but I get 5ns across the board.


 https://github.com/MichaelBlume/perf-test/blob/master/src-java/PerfTest.java

 On Saturday, November 2, 2013 12:01:01 AM UTC-7, Trenton Strong wrote:

 Verified that I receive the same result patterns as you on my machine.

 One other possibility outside of what you have already mentioned would be 
 something JIT-related.  Perhaps there is an optimization that can be 
 performed if the locking sections of the code are in another function but 
 otherwise no, but I'm not sure how that dovetails with Clojure' fn 
 compilation.

 On Friday, November 1, 2013 11:53:12 AM UTC-7, Michael Blume wrote:

 Since 1.6 alpha is out, I reran the tests with that -- basically the 
 same results.

 On Friday, November 1, 2013 11:34:15 AM UTC-7, Michael Blume wrote:

 https://github.com/MichaelBlume/perf-test

 (ns perf-test
   (:use (criterium core))
   (:gen-class))

 (def o (Object.))

 (defn foo [x]
   (if ( x 0)
 (inc x)
 (do
   (monitor-enter o)
   (let [res (dec x)]
 (monitor-exit 0)
 res

 (defn bar [x]
   (if ( x 0)
 (inc x)
 (dec x)))

 (defn locking-part [x l]
   (monitor-enter l)
   (let [res (dec x)]
 (monitor-exit l)
 res))

 (defn baz [x]
   (if ( x 0)
 (inc x)
 (locking-part x o)))

 (defn -main []
   (println benching foo)
   (bench (foo 5) :verbose) 
   (println benching bar)
   (bench (bar 5) :verbose)
   (println benching baz)
   (bench (baz 5) :verbose)
   (println done benching))



 I'm only ever calling these functions with positive values, so the 
 monitor-enter branch should never be entered. Nevertheless, the 
 performance of foo is much worse than bar or baz.

 The best guess I've got is that the fact that lock-taking is involved 
 somehow changes how the function is compiled, somehow making the function 
 slower. If the practical upshot is that I shouldn't write functions that 
 only sometimes lock -- that the locking part of a function should always 
 be its own function -- then I can do that, but I'm curious why.

 $ lein uberjar
 Compiling perf-test
 Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
 Created 
 /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
 $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar 
 benching foo
 WARNING: Final GC required 1.5974571326266802 % of runtime
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 391582560 in 60 samples of 6526376 calls.
   Execution time sample mean : 167.426696 ns
  Execution time mean : 167.459429 ns
 Execution time sample std-deviation : 4.079466 ns
 Execution time std-deviation : 4.097819 ns
Execution time lower quantile : 160.742869 ns ( 2.5%)
Execution time upper quantile : 175.453376 ns (97.5%)
Overhead used : 1.634996 ns

 Found 2 outliers in 60 samples (3. %)
low-severe   2 (3. %)
  Variance from outliers : 12.5602 % Variance is moderately inflated by 
 outliers
 benching bar
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 2174037300 in 60 samples of 36233955 calls.
   Execution time sample mean : 26.068923 ns
  Execution time mean : 26.066422 ns
 Execution time sample std-deviation : 0.887937 ns
 Execution time std-deviation : 0.916861 ns
Execution time lower quantile : 23.996763 ns ( 2.5%)
Execution time upper quantile : 27.911936 ns (97.5%)
Overhead used : 1.634996 ns

 Found 3 outliers in 60 samples (5. %)
low-severe   1 (1.6667 %)
low-mild 1 (1.6667 %)
high-mild1 (1.6667 %)
  Variance from outliers : 22.1874 % Variance is moderately inflated by 
 outliers
 benching baz
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 2270676660 in 60 samples of 37844611 calls.
   Execution time sample mean : 25.834142 ns
  Execution time mean : 25.837429 ns
 Execution time sample std-deviation : 0.718382 ns
 Execution time std-deviation : 0.729431 ns
Execution time lower quantile : 24.837925 ns ( 2.5%)
Execution time upper quantile : 27.595781 ns (97.5%)
Overhead used : 1.634996 ns

 Found 4 outliers in 60 samples (6.6667 %)
low-severe   2 (3. %)
low-mild 2 (3. %)
  Variance from outliers : 15.7591 % Variance is moderately inflated by 
 outliers
 done benching



-- 
-- 
You received this message because you are

Re: Functions using locks are slowed even when locks are never taken

2013-11-03 Thread Michał Marczyk

You have a typo in foo -- monitor-exit's argument is 0 (zero) rather
than o (the sentinel object).

Besides that, in foo both monitor-enter and monitor-exit get their
arguments from a Var. Rewriting to use locking, which first puts the
object whose monitor will be used in a local (that is, (let [lockee o]
...), where ... performs the locking using the newly introduced
local), gives timings identical to those of bar and baz:

(defn foo' [x]
  (if ( x 0)
(inc x)
(let [res (locking o (dec x))] res)))

So this is one reason not to use monitor-enter and monitor-exit
directly. Another reason is that locking guarantees that the monitor
will be released (by using try / finally, and of course by preventing
situations where the matching monitor-enter  monitor-exit operate on
different objects).

In fact, both monitor-enter and monitor-exit carry docstrings which
explicitly say that they should not be used in user code and point to
locking as the user-facing equivalent to Java's synchronized.

Cheers,
Michał


On 1 November 2013 19:34, Michael Blume blume.m...@gmail.com wrote:
 https://github.com/MichaelBlume/perf-test

 (ns perf-test
   (:use (criterium core))
   (:gen-class))

 (def o (Object.))

 (defn foo [x]
   (if ( x 0)
 (inc x)
 (do
   (monitor-enter o)
   (let [res (dec x)]
 (monitor-exit 0)
 res

 (defn bar [x]
   (if ( x 0)
 (inc x)
 (dec x)))

 (defn locking-part [x l]
   (monitor-enter l)
   (let [res (dec x)]
 (monitor-exit l)
 res))

 (defn baz [x]
   (if ( x 0)
 (inc x)
 (locking-part x o)))

 (defn -main []
   (println benching foo)
   (bench (foo 5) :verbose)
   (println benching bar)
   (bench (bar 5) :verbose)
   (println benching baz)
   (bench (baz 5) :verbose)
   (println done benching))



 I'm only ever calling these functions with positive values, so the
 monitor-enter branch should never be entered. Nevertheless, the performance
 of foo is much worse than bar or baz.

 The best guess I've got is that the fact that lock-taking is involved
 somehow changes how the function is compiled, somehow making the function
 slower. If the practical upshot is that I shouldn't write functions that
 only sometimes lock -- that the locking part of a function should always be
 its own function -- then I can do that, but I'm curious why.

 $ lein uberjar
 Compiling perf-test
 Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
 Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
 $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar
 benching foo
 WARNING: Final GC required 1.5974571326266802 % of runtime
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 391582560 in 60 samples of 6526376 calls.
   Execution time sample mean : 167.426696 ns
  Execution time mean : 167.459429 ns
 Execution time sample std-deviation : 4.079466 ns
 Execution time std-deviation : 4.097819 ns
Execution time lower quantile : 160.742869 ns ( 2.5%)
Execution time upper quantile : 175.453376 ns (97.5%)
Overhead used : 1.634996 ns

 Found 2 outliers in 60 samples (3. %)
 low-severe 2 (3. %)
  Variance from outliers : 12.5602 % Variance is moderately inflated by
 outliers
 benching bar
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 2174037300 in 60 samples of 36233955 calls.
   Execution time sample mean : 26.068923 ns
  Execution time mean : 26.066422 ns
 Execution time sample std-deviation : 0.887937 ns
 Execution time std-deviation : 0.916861 ns
Execution time lower quantile : 23.996763 ns ( 2.5%)
Execution time upper quantile : 27.911936 ns (97.5%)
Overhead used : 1.634996 ns

 Found 3 outliers in 60 samples (5. %)
 low-severe 1 (1.6667 %)
 low-mild 1 (1.6667 %)
 high-mild 1 (1.6667 %)
  Variance from outliers : 22.1874 % Variance is moderately inflated by
 outliers
 benching baz
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 2270676660 in 60 samples of 37844611 calls.
   Execution time sample mean : 25.834142 ns
  Execution time mean : 25.837429 ns
 Execution time sample std-deviation : 0.718382 ns
 Execution time std-deviation : 0.729431 ns
Execution time lower quantile : 24.837925 ns ( 2.5%)
Execution time upper quantile : 27.595781 ns (97.5%)
Overhead used : 1.634996 ns

 Found 4 outliers in 60 samples (6.6667 %)
 low-severe 2 (3. %)
 low-mild 2 (3. %)
  Variance from outliers : 15.7591 % Variance is moderately inflated by
 outliers
 done benching

 --
 --
 You received this message because you are subscribed to the Google
 Groups Clojure group.
 To post to this group, send email to clojure@googlegroups.com
 Note that posts from new members are

Re: Functions using locks are slowed even when locks are never taken

2013-11-03 Thread Michał Marczyk

I should perhaps make clear that with direct use of monitor-enter and
monitor-exit with a Var it's possible for monitor-enter and
monitor-exit to operate on different objects even in the absence of
typos, namely if somebody rebinds the Var. To illustrate this with
print at the REPL (a regular Clojure REPL, as opposed to nREPL or
similar, in case there are any issues with reproducing this in
different environments):

(def x 1)
(future (print x) (Thread/sleep 5000) (print x))
; prints 1 immediately, but then goes to sleep;
; in the meantime, I say
(def x 2)
@*2
; and after a moment 2 is printed

NB. this could also happen with alter-var-root or set! to a Var with a
thread-local binding.

Cheers,
Michał


On 3 November 2013 18:30, Michał Marczyk michal.marc...@gmail.com wrote:
 You have a typo in foo -- monitor-exit's argument is 0 (zero) rather
 than o (the sentinel object).

 Besides that, in foo both monitor-enter and monitor-exit get their
 arguments from a Var. Rewriting to use locking, which first puts the
 object whose monitor will be used in a local (that is, (let [lockee o]
 ...), where ... performs the locking using the newly introduced
 local), gives timings identical to those of bar and baz:

 (defn foo' [x]
   (if ( x 0)
 (inc x)
 (let [res (locking o (dec x))] res)))

 So this is one reason not to use monitor-enter and monitor-exit
 directly. Another reason is that locking guarantees that the monitor
 will be released (by using try / finally, and of course by preventing
 situations where the matching monitor-enter  monitor-exit operate on
 different objects).

 In fact, both monitor-enter and monitor-exit carry docstrings which
 explicitly say that they should not be used in user code and point to
 locking as the user-facing equivalent to Java's synchronized.

 Cheers,
 Michał


 On 1 November 2013 19:34, Michael Blume blume.m...@gmail.com wrote:
 https://github.com/MichaelBlume/perf-test

 (ns perf-test
   (:use (criterium core))
   (:gen-class))

 (def o (Object.))

 (defn foo [x]
   (if ( x 0)
 (inc x)
 (do
   (monitor-enter o)
   (let [res (dec x)]
 (monitor-exit 0)
 res

 (defn bar [x]
   (if ( x 0)
 (inc x)
 (dec x)))

 (defn locking-part [x l]
   (monitor-enter l)
   (let [res (dec x)]
 (monitor-exit l)
 res))

 (defn baz [x]
   (if ( x 0)
 (inc x)
 (locking-part x o)))

 (defn -main []
   (println benching foo)
   (bench (foo 5) :verbose)
   (println benching bar)
   (bench (bar 5) :verbose)
   (println benching baz)
   (bench (baz 5) :verbose)
   (println done benching))



 I'm only ever calling these functions with positive values, so the
 monitor-enter branch should never be entered. Nevertheless, the performance
 of foo is much worse than bar or baz.

 The best guess I've got is that the fact that lock-taking is involved
 somehow changes how the function is compiled, somehow making the function
 slower. If the practical upshot is that I shouldn't write functions that
 only sometimes lock -- that the locking part of a function should always be
 its own function -- then I can do that, but I'm curious why.

 $ lein uberjar
 Compiling perf-test
 Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
 Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
 $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar
 benching foo
 WARNING: Final GC required 1.5974571326266802 % of runtime
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 391582560 in 60 samples of 6526376 calls.
   Execution time sample mean : 167.426696 ns
  Execution time mean : 167.459429 ns
 Execution time sample std-deviation : 4.079466 ns
 Execution time std-deviation : 4.097819 ns
Execution time lower quantile : 160.742869 ns ( 2.5%)
Execution time upper quantile : 175.453376 ns (97.5%)
Overhead used : 1.634996 ns

 Found 2 outliers in 60 samples (3. %)
 low-severe 2 (3. %)
  Variance from outliers : 12.5602 % Variance is moderately inflated by
 outliers
 benching bar
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 2174037300 in 60 samples of 36233955 calls.
   Execution time sample mean : 26.068923 ns
  Execution time mean : 26.066422 ns
 Execution time sample std-deviation : 0.887937 ns
 Execution time std-deviation : 0.916861 ns
Execution time lower quantile : 23.996763 ns ( 2.5%)
Execution time upper quantile : 27.911936 ns (97.5%)
Overhead used : 1.634996 ns

 Found 3 outliers in 60 samples (5. %)
 low-severe 1 (1.6667 %)
 low-mild 1 (1.6667 %)
 high-mild 1 (1.6667 %)
  Variance from outliers : 22.1874 % Variance is moderately inflated by
 outliers
 benching baz
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:

Re: Functions using locks are slowed even when locks are never taken

2013-11-03 Thread Andy Fingerhut

Good detective work, Michal.

So the extra time for the slower version was because a Var was always being
accessed in the generated byte code, even if the source code was only
explicitly accessing the Var in a branch that was never executed?

Thanks,
Andy


On Sun, Nov 3, 2013 at 9:30 AM, Michał Marczyk michal.marc...@gmail.comwrote:

 You have a typo in foo -- monitor-exit's argument is 0 (zero) rather
 than o (the sentinel object).

 Besides that, in foo both monitor-enter and monitor-exit get their
 arguments from a Var. Rewriting to use locking, which first puts the
 object whose monitor will be used in a local (that is, (let [lockee o]
 ...), where ... performs the locking using the newly introduced
 local), gives timings identical to those of bar and baz:

 (defn foo' [x]
   (if ( x 0)
 (inc x)
 (let [res (locking o (dec x))] res)))

 So this is one reason not to use monitor-enter and monitor-exit
 directly. Another reason is that locking guarantees that the monitor
 will be released (by using try / finally, and of course by preventing
 situations where the matching monitor-enter  monitor-exit operate on
 different objects).

 In fact, both monitor-enter and monitor-exit carry docstrings which
 explicitly say that they should not be used in user code and point to
 locking as the user-facing equivalent to Java's synchronized.

 Cheers,
 Michał


 On 1 November 2013 19:34, Michael Blume blume.m...@gmail.com wrote:
  https://github.com/MichaelBlume/perf-test
 
  (ns perf-test
(:use (criterium core))
(:gen-class))
 
  (def o (Object.))
 
  (defn foo [x]
(if ( x 0)
  (inc x)
  (do
(monitor-enter o)
(let [res (dec x)]
  (monitor-exit 0)
  res
 
  (defn bar [x]
(if ( x 0)
  (inc x)
  (dec x)))
 
  (defn locking-part [x l]
(monitor-enter l)
(let [res (dec x)]
  (monitor-exit l)
  res))
 
  (defn baz [x]
(if ( x 0)
  (inc x)
  (locking-part x o)))
 
  (defn -main []
(println benching foo)
(bench (foo 5) :verbose)
(println benching bar)
(bench (bar 5) :verbose)
(println benching baz)
(bench (baz 5) :verbose)
(println done benching))
 
 
 
  I'm only ever calling these functions with positive values, so the
  monitor-enter branch should never be entered. Nevertheless, the
 performance
  of foo is much worse than bar or baz.
 
  The best guess I've got is that the fact that lock-taking is involved
  somehow changes how the function is compiled, somehow making the function
  slower. If the practical upshot is that I shouldn't write functions that
  only sometimes lock -- that the locking part of a function should always
 be
  its own function -- then I can do that, but I'm curious why.
 
  $ lein uberjar
  Compiling perf-test
  Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
  Created
 /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
  $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar
  benching foo
  WARNING: Final GC required 1.5974571326266802 % of runtime
  x86_64 Mac OS X 10.8.3 4 cpu(s)
  Java HotSpot(TM) 64-Bit Server VM 24.0-b28
  Runtime arguments:
  Evaluation count : 391582560 in 60 samples of 6526376 calls.
Execution time sample mean : 167.426696 ns
   Execution time mean : 167.459429 ns
  Execution time sample std-deviation : 4.079466 ns
  Execution time std-deviation : 4.097819 ns
 Execution time lower quantile : 160.742869 ns ( 2.5%)
 Execution time upper quantile : 175.453376 ns (97.5%)
 Overhead used : 1.634996 ns
 
  Found 2 outliers in 60 samples (3. %)
  low-severe 2 (3. %)
   Variance from outliers : 12.5602 % Variance is moderately inflated by
  outliers
  benching bar
  x86_64 Mac OS X 10.8.3 4 cpu(s)
  Java HotSpot(TM) 64-Bit Server VM 24.0-b28
  Runtime arguments:
  Evaluation count : 2174037300 in 60 samples of 36233955 calls.
Execution time sample mean : 26.068923 ns
   Execution time mean : 26.066422 ns
  Execution time sample std-deviation : 0.887937 ns
  Execution time std-deviation : 0.916861 ns
 Execution time lower quantile : 23.996763 ns ( 2.5%)
 Execution time upper quantile : 27.911936 ns (97.5%)
 Overhead used : 1.634996 ns
 
  Found 3 outliers in 60 samples (5. %)
  low-severe 1 (1.6667 %)
  low-mild 1 (1.6667 %)
  high-mild 1 (1.6667 %)
   Variance from outliers : 22.1874 % Variance is moderately inflated by
  outliers
  benching baz
  x86_64 Mac OS X 10.8.3 4 cpu(s)
  Java HotSpot(TM) 64-Bit Server VM 24.0-b28
  Runtime arguments:
  Evaluation count : 2270676660 in 60 samples of 37844611 calls.
Execution time sample mean : 25.834142 ns
   Execution time mean : 25.837429 ns
  Execution time sample std-deviation : 0.718382 ns
  Execution time std-deviation : 0.729431 ns
 Execution time lower quantile : 24.837925 ns ( 2.5%)
 Execution time upper

Re: Functions using locks are slowed even when locks are never taken

2013-11-03 Thread Michael Blume

Huh, interesting.

I have:

(defn foo' [x]
  (if ( x 0)
(inc x)
(let [res (locking o (dec x))] res)))

(defn foo'' [x]
  (if ( x 0)
(inc x)
(locking o
  (dec x

foo' is fast, but foo'' is slow. So something about wrapping the locking 
clause in a let makes it fast. Still no idea why.

On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote:

 You have a typo in foo -- monitor-exit's argument is 0 (zero) rather 
 than o (the sentinel object). 

 Besides that, in foo both monitor-enter and monitor-exit get their 
 arguments from a Var. Rewriting to use locking, which first puts the 
 object whose monitor will be used in a local (that is, (let [lockee o] 
 ...), where ... performs the locking using the newly introduced 
 local), gives timings identical to those of bar and baz: 

 (defn foo' [x] 
   (if ( x 0) 
 (inc x) 
 (let [res (locking o (dec x))] res))) 

 So this is one reason not to use monitor-enter and monitor-exit 
 directly. Another reason is that locking guarantees that the monitor 
 will be released (by using try / finally, and of course by preventing 
 situations where the matching monitor-enter  monitor-exit operate on 
 different objects). 

 In fact, both monitor-enter and monitor-exit carry docstrings which 
 explicitly say that they should not be used in user code and point to 
 locking as the user-facing equivalent to Java's synchronized. 

 Cheers, 
 Michał 


 On 1 November 2013 19:34, Michael Blume blume...@gmail.com javascript: 
 wrote: 
  https://github.com/MichaelBlume/perf-test 
  
  (ns perf-test 
(:use (criterium core)) 
(:gen-class)) 
  
  (def o (Object.)) 
  
  (defn foo [x] 
(if ( x 0) 
  (inc x) 
  (do 
(monitor-enter o) 
(let [res (dec x)] 
  (monitor-exit 0) 
  res 
  
  (defn bar [x] 
(if ( x 0) 
  (inc x) 
  (dec x))) 
  
  (defn locking-part [x l] 
(monitor-enter l) 
(let [res (dec x)] 
  (monitor-exit l) 
  res)) 
  
  (defn baz [x] 
(if ( x 0) 
  (inc x) 
  (locking-part x o))) 
  
  (defn -main [] 
(println benching foo) 
(bench (foo 5) :verbose) 
(println benching bar) 
(bench (bar 5) :verbose) 
(println benching baz) 
(bench (baz 5) :verbose) 
(println done benching)) 
  
  
  
  I'm only ever calling these functions with positive values, so the 
  monitor-enter branch should never be entered. Nevertheless, the 
 performance 
  of foo is much worse than bar or baz. 
  
  The best guess I've got is that the fact that lock-taking is involved 
  somehow changes how the function is compiled, somehow making the 
 function 
  slower. If the practical upshot is that I shouldn't write functions that 
  only sometimes lock -- that the locking part of a function should always 
 be 
  its own function -- then I can do that, but I'm curious why. 
  
  $ lein uberjar 
  Compiling perf-test 
  Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar 
  Created 
 /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar 
  $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar 
  benching foo 
  WARNING: Final GC required 1.5974571326266802 % of runtime 
  x86_64 Mac OS X 10.8.3 4 cpu(s) 
  Java HotSpot(TM) 64-Bit Server VM 24.0-b28 
  Runtime arguments: 
  Evaluation count : 391582560 in 60 samples of 6526376 calls. 
Execution time sample mean : 167.426696 ns 
   Execution time mean : 167.459429 ns 
  Execution time sample std-deviation : 4.079466 ns 
  Execution time std-deviation : 4.097819 ns 
 Execution time lower quantile : 160.742869 ns ( 2.5%) 
 Execution time upper quantile : 175.453376 ns (97.5%) 
 Overhead used : 1.634996 ns 
  
  Found 2 outliers in 60 samples (3. %) 
  low-severe 2 (3. %) 
   Variance from outliers : 12.5602 % Variance is moderately inflated by 
  outliers 
  benching bar 
  x86_64 Mac OS X 10.8.3 4 cpu(s) 
  Java HotSpot(TM) 64-Bit Server VM 24.0-b28 
  Runtime arguments: 
  Evaluation count : 2174037300 in 60 samples of 36233955 calls. 
Execution time sample mean : 26.068923 ns 
   Execution time mean : 26.066422 ns 
  Execution time sample std-deviation : 0.887937 ns 
  Execution time std-deviation : 0.916861 ns 
 Execution time lower quantile : 23.996763 ns ( 2.5%) 
 Execution time upper quantile : 27.911936 ns (97.5%) 
 Overhead used : 1.634996 ns 
  
  Found 3 outliers in 60 samples (5. %) 
  low-severe 1 (1.6667 %) 
  low-mild 1 (1.6667 %) 
  high-mild 1 (1.6667 %) 
   Variance from outliers : 22.1874 % Variance is moderately inflated by 
  outliers 
  benching baz 
  x86_64 Mac OS X 10.8.3 4 cpu(s) 
  Java HotSpot(TM) 64-Bit Server VM 24.0-b28 
  Runtime arguments: 
  Evaluation count : 2270676660 in 60 samples of 37844611 calls. 
Execution time sample mean : 25.834142 ns 
   Execution time mean : 25.837429 ns

Re: Functions using locks are slowed even when locks are never taken

2013-11-03 Thread Michael Blume

I mean, I'm probably being naive, but this suggests that one could write

(defmacro locking' [ forms]
  `(let [res# (locking ~@forms)] res#))

and use locking' in place of locking for improved performance. Is this
wrong? If it's right, does that suggest the macro in clojure.core should be
changed?


On Sun, Nov 3, 2013 at 11:09 AM, Michael Blume blume.m...@gmail.com wrote:

 Huh, interesting.

 I have:

 (defn foo' [x]
   (if ( x 0)
 (inc x)
 (let [res (locking o (dec x))] res)))

 (defn foo'' [x]
   (if ( x 0)
 (inc x)
 (locking o
   (dec x

 foo' is fast, but foo'' is slow. So something about wrapping the locking
 clause in a let makes it fast. Still no idea why.

 On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote:

 You have a typo in foo -- monitor-exit's argument is 0 (zero) rather
 than o (the sentinel object).

 Besides that, in foo both monitor-enter and monitor-exit get their
 arguments from a Var. Rewriting to use locking, which first puts the
 object whose monitor will be used in a local (that is, (let [lockee o]
 ...), where ... performs the locking using the newly introduced
 local), gives timings identical to those of bar and baz:

 (defn foo' [x]
   (if ( x 0)
 (inc x)
 (let [res (locking o (dec x))] res)))

 So this is one reason not to use monitor-enter and monitor-exit
 directly. Another reason is that locking guarantees that the monitor
 will be released (by using try / finally, and of course by preventing
 situations where the matching monitor-enter  monitor-exit operate on
 different objects).

 In fact, both monitor-enter and monitor-exit carry docstrings which
 explicitly say that they should not be used in user code and point to
 locking as the user-facing equivalent to Java's synchronized.

 Cheers,
 Michał


 On 1 November 2013 19:34, Michael Blume blume...@gmail.com wrote:
  https://github.com/MichaelBlume/perf-test
 
  (ns perf-test
(:use (criterium core))
(:gen-class))
 
  (def o (Object.))
 
  (defn foo [x]
(if ( x 0)
  (inc x)
  (do
(monitor-enter o)
(let [res (dec x)]
  (monitor-exit 0)
  res
 
  (defn bar [x]
(if ( x 0)
  (inc x)
  (dec x)))
 
  (defn locking-part [x l]
(monitor-enter l)
(let [res (dec x)]
  (monitor-exit l)
  res))
 
  (defn baz [x]
(if ( x 0)
  (inc x)
  (locking-part x o)))
 
  (defn -main []
(println benching foo)
(bench (foo 5) :verbose)
(println benching bar)
(bench (bar 5) :verbose)
(println benching baz)
(bench (baz 5) :verbose)
(println done benching))
 
 
 
  I'm only ever calling these functions with positive values, so the
  monitor-enter branch should never be entered. Nevertheless, the
 performance
  of foo is much worse than bar or baz.
 
  The best guess I've got is that the fact that lock-taking is involved
  somehow changes how the function is compiled, somehow making the
 function
  slower. If the practical upshot is that I shouldn't write functions
 that
  only sometimes lock -- that the locking part of a function should
 always be
  its own function -- then I can do that, but I'm curious why.
 
  $ lein uberjar
  Compiling perf-test
  Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
  Created 
  /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar

  $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar
  benching foo
  WARNING: Final GC required 1.5974571326266802 % of runtime
  x86_64 Mac OS X 10.8.3 4 cpu(s)
  Java HotSpot(TM) 64-Bit Server VM 24.0-b28
  Runtime arguments:
  Evaluation count : 391582560 in 60 samples of 6526376 calls.
Execution time sample mean : 167.426696 ns
   Execution time mean : 167.459429 ns
  Execution time sample std-deviation : 4.079466 ns
  Execution time std-deviation : 4.097819 ns
 Execution time lower quantile : 160.742869 ns ( 2.5%)
 Execution time upper quantile : 175.453376 ns (97.5%)
 Overhead used : 1.634996 ns
 
  Found 2 outliers in 60 samples (3. %)
  low-severe 2 (3. %)
   Variance from outliers : 12.5602 % Variance is moderately inflated by
  outliers
  benching bar
  x86_64 Mac OS X 10.8.3 4 cpu(s)
  Java HotSpot(TM) 64-Bit Server VM 24.0-b28
  Runtime arguments:
  Evaluation count : 2174037300 in 60 samples of 36233955 calls.
Execution time sample mean : 26.068923 ns
   Execution time mean : 26.066422 ns
  Execution time sample std-deviation : 0.887937 ns
  Execution time std-deviation : 0.916861 ns
 Execution time lower quantile : 23.996763 ns ( 2.5%)
 Execution time upper quantile : 27.911936 ns (97.5%)
 Overhead used : 1.634996 ns
 
  Found 3 outliers in 60 samples (5. %)
  low-severe 1 (1.6667 %)
  low-mild 1 (1.6667 %)
  high-mild 1 (1.6667 %)
   Variance from outliers : 22.1874 % Variance is moderately inflated by
  outliers
  benching baz
  x86_64 Mac OS

Re: Functions using locks are slowed even when locks are never taken

2013-11-03 Thread Michał Marczyk

Well, that is interesting.

The difference between the compiled versions of

(defn foo [x]
  (if ( x 0)
(inc x)
(locking o
  (dec x

and

(defn bar [x]
  (if ( x 0)
(inc x)
(let [res (locking o
(dec x))]
  res)))

is quite significant. foo gets compiled to a single class, with
invocations handled by a single invoke method; bar gets compiled to a
class for bar + an extra class for an inner function which handles the
(locking o (dec x)) part -- probably very similar to the output for
the version with the hand-coded locking-part (although I haven't
really looked at that yet). The inner function is a closure, so
calling it involves an allocation of a closure object; its ctor
receives the closed-over locals as arguments and stores them in two
fields (lockee and x). Then they get loaded from the fields in the
body of the closure's invoke method etc.

I guess I'll have to play around with Java equivalents too...

Cheers,
Michał


On 3 November 2013 20:46, Michael Blume blume.m...@gmail.com wrote:
 I mean, I'm probably being naive, but this suggests that one could write

 (defmacro locking' [ forms]
   `(let [res# (locking ~@forms)] res#))

 and use locking' in place of locking for improved performance. Is this
 wrong? If it's right, does that suggest the macro in clojure.core should be
 changed?


 On Sun, Nov 3, 2013 at 11:09 AM, Michael Blume blume.m...@gmail.com wrote:

 Huh, interesting.

 I have:

 (defn foo' [x]
   (if ( x 0)
 (inc x)
 (let [res (locking o (dec x))] res)))

 (defn foo'' [x]
   (if ( x 0)
 (inc x)
 (locking o
   (dec x

 foo' is fast, but foo'' is slow. So something about wrapping the locking
 clause in a let makes it fast. Still no idea why.

 On Sunday, November 3, 2013 9:30:45 AM UTC-8, Michał Marczyk wrote:

 You have a typo in foo -- monitor-exit's argument is 0 (zero) rather
 than o (the sentinel object).

 Besides that, in foo both monitor-enter and monitor-exit get their
 arguments from a Var. Rewriting to use locking, which first puts the
 object whose monitor will be used in a local (that is, (let [lockee o]
 ...), where ... performs the locking using the newly introduced
 local), gives timings identical to those of bar and baz:

 (defn foo' [x]
   (if ( x 0)
 (inc x)
 (let [res (locking o (dec x))] res)))

 So this is one reason not to use monitor-enter and monitor-exit
 directly. Another reason is that locking guarantees that the monitor
 will be released (by using try / finally, and of course by preventing
 situations where the matching monitor-enter  monitor-exit operate on
 different objects).

 In fact, both monitor-enter and monitor-exit carry docstrings which
 explicitly say that they should not be used in user code and point to
 locking as the user-facing equivalent to Java's synchronized.

 Cheers,
 Michał


 On 1 November 2013 19:34, Michael Blume blume...@gmail.com wrote:
  https://github.com/MichaelBlume/perf-test
 
  (ns perf-test
(:use (criterium core))
(:gen-class))
 
  (def o (Object.))
 
  (defn foo [x]
(if ( x 0)
  (inc x)
  (do
(monitor-enter o)
(let [res (dec x)]
  (monitor-exit 0)
  res
 
  (defn bar [x]
(if ( x 0)
  (inc x)
  (dec x)))
 
  (defn locking-part [x l]
(monitor-enter l)
(let [res (dec x)]
  (monitor-exit l)
  res))
 
  (defn baz [x]
(if ( x 0)
  (inc x)
  (locking-part x o)))
 
  (defn -main []
(println benching foo)
(bench (foo 5) :verbose)
(println benching bar)
(bench (bar 5) :verbose)
(println benching baz)
(bench (baz 5) :verbose)
(println done benching))
 
 
 
  I'm only ever calling these functions with positive values, so the
  monitor-enter branch should never be entered. Nevertheless, the
  performance
  of foo is much worse than bar or baz.
 
  The best guess I've got is that the fact that lock-taking is involved
  somehow changes how the function is compiled, somehow making the
  function
  slower. If the practical upshot is that I shouldn't write functions
  that
  only sometimes lock -- that the locking part of a function should
  always be
  its own function -- then I can do that, but I'm curious why.
 
  $ lein uberjar
  Compiling perf-test
  Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
  Created
  /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
  $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar
  benching foo
  WARNING: Final GC required 1.5974571326266802 % of runtime
  x86_64 Mac OS X 10.8.3 4 cpu(s)
  Java HotSpot(TM) 64-Bit Server VM 24.0-b28
  Runtime arguments:
  Evaluation count : 391582560 in 60 samples of 6526376 calls.
Execution time sample mean : 167.426696 ns
   Execution time mean : 167.459429 ns
  Execution time sample std-deviation : 4.079466 ns
  Execution time std-deviation : 4.097819 ns
 Execution time lower quantile : 160.742869

Re: Functions using locks are slowed even when locks are never taken

2013-11-02 Thread Trenton Strong

Verified that I receive the same result patterns as you on my machine.

One other possibility outside of what you have already mentioned would be 
something JIT-related.  Perhaps there is an optimization that can be 
performed if the locking sections of the code are in another function but 
otherwise no, but I'm not sure how that dovetails with Clojure' fn 
compilation.

On Friday, November 1, 2013 11:53:12 AM UTC-7, Michael Blume wrote:

 Since 1.6 alpha is out, I reran the tests with that -- basically the same 
 results.

 On Friday, November 1, 2013 11:34:15 AM UTC-7, Michael Blume wrote:

 https://github.com/MichaelBlume/perf-test

 (ns perf-test
   (:use (criterium core))
   (:gen-class))

 (def o (Object.))

 (defn foo [x]
   (if ( x 0)
 (inc x)
 (do
   (monitor-enter o)
   (let [res (dec x)]
 (monitor-exit 0)
 res

 (defn bar [x]
   (if ( x 0)
 (inc x)
 (dec x)))

 (defn locking-part [x l]
   (monitor-enter l)
   (let [res (dec x)]
 (monitor-exit l)
 res))

 (defn baz [x]
   (if ( x 0)
 (inc x)
 (locking-part x o)))

 (defn -main []
   (println benching foo)
   (bench (foo 5) :verbose) 
   (println benching bar)
   (bench (bar 5) :verbose)
   (println benching baz)
   (bench (baz 5) :verbose)
   (println done benching))



 I'm only ever calling these functions with positive values, so the 
 monitor-enter branch should never be entered. Nevertheless, the performance 
 of foo is much worse than bar or baz.

 The best guess I've got is that the fact that lock-taking is involved 
 somehow changes how the function is compiled, somehow making the function 
 slower. If the practical upshot is that I shouldn't write functions that 
 only sometimes lock -- that the locking part of a function should always be 
 its own function -- then I can do that, but I'm curious why.

 $ lein uberjar
 Compiling perf-test
 Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
 Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
 $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar 
 benching foo
 WARNING: Final GC required 1.5974571326266802 % of runtime
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 391582560 in 60 samples of 6526376 calls.
   Execution time sample mean : 167.426696 ns
  Execution time mean : 167.459429 ns
 Execution time sample std-deviation : 4.079466 ns
 Execution time std-deviation : 4.097819 ns
Execution time lower quantile : 160.742869 ns ( 2.5%)
Execution time upper quantile : 175.453376 ns (97.5%)
Overhead used : 1.634996 ns

 Found 2 outliers in 60 samples (3. %)
  low-severe   2 (3. %)
  Variance from outliers : 12.5602 % Variance is moderately inflated by 
 outliers
 benching bar
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 2174037300 in 60 samples of 36233955 calls.
   Execution time sample mean : 26.068923 ns
  Execution time mean : 26.066422 ns
 Execution time sample std-deviation : 0.887937 ns
 Execution time std-deviation : 0.916861 ns
Execution time lower quantile : 23.996763 ns ( 2.5%)
Execution time upper quantile : 27.911936 ns (97.5%)
Overhead used : 1.634996 ns

 Found 3 outliers in 60 samples (5. %)
  low-severe   1 (1.6667 %)
  low-mild 1 (1.6667 %)
  high-mild1 (1.6667 %)
  Variance from outliers : 22.1874 % Variance is moderately inflated by 
 outliers
 benching baz
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 2270676660 in 60 samples of 37844611 calls.
   Execution time sample mean : 25.834142 ns
  Execution time mean : 25.837429 ns
 Execution time sample std-deviation : 0.718382 ns
 Execution time std-deviation : 0.729431 ns
Execution time lower quantile : 24.837925 ns ( 2.5%)
Execution time upper quantile : 27.595781 ns (97.5%)
Overhead used : 1.634996 ns

 Found 4 outliers in 60 samples (6.6667 %)
  low-severe   2 (3. %)
  low-mild 2 (3. %)
  Variance from outliers : 15.7591 % Variance is moderately inflated by 
 outliers
 done benching



-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to

Re: Functions using locks are slowed even when locks are never taken

2013-11-02 Thread Michael Blume

Hmm, it seems like if it were JIT related you'd see the same issue with 
java code, but I get 5ns across the board.

https://github.com/MichaelBlume/perf-test/blob/master/src-java/PerfTest.java

On Saturday, November 2, 2013 12:01:01 AM UTC-7, Trenton Strong wrote:

 Verified that I receive the same result patterns as you on my machine.

 One other possibility outside of what you have already mentioned would be 
 something JIT-related.  Perhaps there is an optimization that can be 
 performed if the locking sections of the code are in another function but 
 otherwise no, but I'm not sure how that dovetails with Clojure' fn 
 compilation.

 On Friday, November 1, 2013 11:53:12 AM UTC-7, Michael Blume wrote:

 Since 1.6 alpha is out, I reran the tests with that -- basically the same 
 results.

 On Friday, November 1, 2013 11:34:15 AM UTC-7, Michael Blume wrote:

 https://github.com/MichaelBlume/perf-test

 (ns perf-test
   (:use (criterium core))
   (:gen-class))

 (def o (Object.))

 (defn foo [x]
   (if ( x 0)
 (inc x)
 (do
   (monitor-enter o)
   (let [res (dec x)]
 (monitor-exit 0)
 res

 (defn bar [x]
   (if ( x 0)
 (inc x)
 (dec x)))

 (defn locking-part [x l]
   (monitor-enter l)
   (let [res (dec x)]
 (monitor-exit l)
 res))

 (defn baz [x]
   (if ( x 0)
 (inc x)
 (locking-part x o)))

 (defn -main []
   (println benching foo)
   (bench (foo 5) :verbose) 
   (println benching bar)
   (bench (bar 5) :verbose)
   (println benching baz)
   (bench (baz 5) :verbose)
   (println done benching))



 I'm only ever calling these functions with positive values, so the 
 monitor-enter branch should never be entered. Nevertheless, the performance 
 of foo is much worse than bar or baz.

 The best guess I've got is that the fact that lock-taking is involved 
 somehow changes how the function is compiled, somehow making the function 
 slower. If the practical upshot is that I shouldn't write functions that 
 only sometimes lock -- that the locking part of a function should always be 
 its own function -- then I can do that, but I'm curious why.

 $ lein uberjar
 Compiling perf-test
 Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
 Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
 $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar 
 benching foo
 WARNING: Final GC required 1.5974571326266802 % of runtime
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 391582560 in 60 samples of 6526376 calls.
   Execution time sample mean : 167.426696 ns
  Execution time mean : 167.459429 ns
 Execution time sample std-deviation : 4.079466 ns
 Execution time std-deviation : 4.097819 ns
Execution time lower quantile : 160.742869 ns ( 2.5%)
Execution time upper quantile : 175.453376 ns (97.5%)
Overhead used : 1.634996 ns

 Found 2 outliers in 60 samples (3. %)
 low-severe   2 (3. %)
  Variance from outliers : 12.5602 % Variance is moderately inflated by 
 outliers
 benching bar
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 2174037300 in 60 samples of 36233955 calls.
   Execution time sample mean : 26.068923 ns
  Execution time mean : 26.066422 ns
 Execution time sample std-deviation : 0.887937 ns
 Execution time std-deviation : 0.916861 ns
Execution time lower quantile : 23.996763 ns ( 2.5%)
Execution time upper quantile : 27.911936 ns (97.5%)
Overhead used : 1.634996 ns

 Found 3 outliers in 60 samples (5. %)
 low-severe   1 (1.6667 %)
 low-mild 1 (1.6667 %)
 high-mild1 (1.6667 %)
  Variance from outliers : 22.1874 % Variance is moderately inflated by 
 outliers
 benching baz
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 2270676660 in 60 samples of 37844611 calls.
   Execution time sample mean : 25.834142 ns
  Execution time mean : 25.837429 ns
 Execution time sample std-deviation : 0.718382 ns
 Execution time std-deviation : 0.729431 ns
Execution time lower quantile : 24.837925 ns ( 2.5%)
Execution time upper quantile : 27.595781 ns (97.5%)
Overhead used : 1.634996 ns

 Found 4 outliers in 60 samples (6.6667 %)
 low-severe   2 (3. %)
 low-mild 2 (3. %)
  Variance from outliers : 15.7591 % Variance is moderately inflated by 
 outliers
 done benching



-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For

Functions using locks are slowed even when locks are never taken

2013-11-01 Thread Michael Blume

https://github.com/MichaelBlume/perf-test

(ns perf-test
  (:use (criterium core))
  (:gen-class))

(def o (Object.))

(defn foo [x]
  (if ( x 0)
(inc x)
(do
  (monitor-enter o)
  (let [res (dec x)]
(monitor-exit 0)
res

(defn bar [x]
  (if ( x 0)
(inc x)
(dec x)))

(defn locking-part [x l]
  (monitor-enter l)
  (let [res (dec x)]
(monitor-exit l)
res))

(defn baz [x]
  (if ( x 0)
(inc x)
(locking-part x o)))

(defn -main []
  (println benching foo)
  (bench (foo 5) :verbose) 
  (println benching bar)
  (bench (bar 5) :verbose)
  (println benching baz)
  (bench (baz 5) :verbose)
  (println done benching))



I'm only ever calling these functions with positive values, so the 
monitor-enter branch should never be entered. Nevertheless, the performance of 
foo is much worse than bar or baz.

The best guess I've got is that the fact that lock-taking is involved somehow 
changes how the function is compiled, somehow making the function slower. If 
the practical upshot is that I shouldn't write functions that only sometimes 
lock -- that the locking part of a function should always be its own function 
-- then I can do that, but I'm curious why.

$ lein uberjar
Compiling perf-test
Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
$ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar 
benching foo
WARNING: Final GC required 1.5974571326266802 % of runtime
x86_64 Mac OS X 10.8.3 4 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 24.0-b28
Runtime arguments:
Evaluation count : 391582560 in 60 samples of 6526376 calls.
  Execution time sample mean : 167.426696 ns
 Execution time mean : 167.459429 ns
Execution time sample std-deviation : 4.079466 ns
Execution time std-deviation : 4.097819 ns
   Execution time lower quantile : 160.742869 ns ( 2.5%)
   Execution time upper quantile : 175.453376 ns (97.5%)
   Overhead used : 1.634996 ns

Found 2 outliers in 60 samples (3. %)
low-severe   2 (3. %)
 Variance from outliers : 12.5602 % Variance is moderately inflated by outliers
benching bar
x86_64 Mac OS X 10.8.3 4 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 24.0-b28
Runtime arguments:
Evaluation count : 2174037300 in 60 samples of 36233955 calls.
  Execution time sample mean : 26.068923 ns
 Execution time mean : 26.066422 ns
Execution time sample std-deviation : 0.887937 ns
Execution time std-deviation : 0.916861 ns
   Execution time lower quantile : 23.996763 ns ( 2.5%)
   Execution time upper quantile : 27.911936 ns (97.5%)
   Overhead used : 1.634996 ns

Found 3 outliers in 60 samples (5. %)
low-severe   1 (1.6667 %)
low-mild 1 (1.6667 %)
high-mild1 (1.6667 %)
 Variance from outliers : 22.1874 % Variance is moderately inflated by outliers
benching baz
x86_64 Mac OS X 10.8.3 4 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 24.0-b28
Runtime arguments:
Evaluation count : 2270676660 in 60 samples of 37844611 calls.
  Execution time sample mean : 25.834142 ns
 Execution time mean : 25.837429 ns
Execution time sample std-deviation : 0.718382 ns
Execution time std-deviation : 0.729431 ns
   Execution time lower quantile : 24.837925 ns ( 2.5%)
   Execution time upper quantile : 27.595781 ns (97.5%)
   Overhead used : 1.634996 ns

Found 4 outliers in 60 samples (6.6667 %)
low-severe   2 (3. %)
low-mild 2 (3. %)
 Variance from outliers : 15.7591 % Variance is moderately inflated by outliers
done benching

-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Functions using locks are slowed even when locks are never taken

2013-11-01 Thread Michael Blume

Since 1.6 alpha is out, I reran the tests with that -- basically the same 
results.

On Friday, November 1, 2013 11:34:15 AM UTC-7, Michael Blume wrote:

 https://github.com/MichaelBlume/perf-test

 (ns perf-test
   (:use (criterium core))
   (:gen-class))

 (def o (Object.))

 (defn foo [x]
   (if ( x 0)
 (inc x)
 (do
   (monitor-enter o)
   (let [res (dec x)]
 (monitor-exit 0)
 res

 (defn bar [x]
   (if ( x 0)
 (inc x)
 (dec x)))

 (defn locking-part [x l]
   (monitor-enter l)
   (let [res (dec x)]
 (monitor-exit l)
 res))

 (defn baz [x]
   (if ( x 0)
 (inc x)
 (locking-part x o)))

 (defn -main []
   (println benching foo)
   (bench (foo 5) :verbose) 
   (println benching bar)
   (bench (bar 5) :verbose)
   (println benching baz)
   (bench (baz 5) :verbose)
   (println done benching))



 I'm only ever calling these functions with positive values, so the 
 monitor-enter branch should never be entered. Nevertheless, the performance 
 of foo is much worse than bar or baz.

 The best guess I've got is that the fact that lock-taking is involved somehow 
 changes how the function is compiled, somehow making the function slower. If 
 the practical upshot is that I shouldn't write functions that only sometimes 
 lock -- that the locking part of a function should always be its own function 
 -- then I can do that, but I'm curious why.

 $ lein uberjar
 Compiling perf-test
 Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
 Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
 $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar 
 benching foo
 WARNING: Final GC required 1.5974571326266802 % of runtime
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 391582560 in 60 samples of 6526376 calls.
   Execution time sample mean : 167.426696 ns
  Execution time mean : 167.459429 ns
 Execution time sample std-deviation : 4.079466 ns
 Execution time std-deviation : 4.097819 ns
Execution time lower quantile : 160.742869 ns ( 2.5%)
Execution time upper quantile : 175.453376 ns (97.5%)
Overhead used : 1.634996 ns

 Found 2 outliers in 60 samples (3. %)
   low-severe   2 (3. %)
  Variance from outliers : 12.5602 % Variance is moderately inflated by 
 outliers
 benching bar
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 2174037300 in 60 samples of 36233955 calls.
   Execution time sample mean : 26.068923 ns
  Execution time mean : 26.066422 ns
 Execution time sample std-deviation : 0.887937 ns
 Execution time std-deviation : 0.916861 ns
Execution time lower quantile : 23.996763 ns ( 2.5%)
Execution time upper quantile : 27.911936 ns (97.5%)
Overhead used : 1.634996 ns

 Found 3 outliers in 60 samples (5. %)
   low-severe   1 (1.6667 %)
   low-mild 1 (1.6667 %)
   high-mild1 (1.6667 %)
  Variance from outliers : 22.1874 % Variance is moderately inflated by 
 outliers
 benching baz
 x86_64 Mac OS X 10.8.3 4 cpu(s)
 Java HotSpot(TM) 64-Bit Server VM 24.0-b28
 Runtime arguments:
 Evaluation count : 2270676660 in 60 samples of 37844611 calls.
   Execution time sample mean : 25.834142 ns
  Execution time mean : 25.837429 ns
 Execution time sample std-deviation : 0.718382 ns
 Execution time std-deviation : 0.729431 ns
Execution time lower quantile : 24.837925 ns ( 2.5%)
Execution time upper quantile : 27.595781 ns (97.5%)
Overhead used : 1.634996 ns

 Found 4 outliers in 60 samples (6.6667 %)
   low-severe   2 (3. %)
   low-mild 2 (3. %)
  Variance from outliers : 15.7591 % Variance is moderately inflated by 
 outliers
 done benching



-- 
-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

Functions using locks are slowed even when locks are never taken

Re: Functions using locks are slowed even when locks are never taken

14 matches

Site Navigation

Mail list logo

Footer information