Re: Functions using locks are slowed even when locks are never taken

2013-11-02 Thread Trenton Strong
Verified that I receive the same result patterns as you on my machine.

One other possibility outside of what you have already mentioned would be 
something JIT-related.  Perhaps there is an optimization that can be 
performed if the locking sections of the code are in another function but 
otherwise no, but I'm not sure how that dovetails with Clojure' fn 
compilation.

On Friday, November 1, 2013 11:53:12 AM UTC-7, Michael Blume wrote:
>
> Since 1.6 alpha is out, I reran the tests with that -- basically the same 
> results.
>
> On Friday, November 1, 2013 11:34:15 AM UTC-7, Michael Blume wrote:
>>
>> https://github.com/MichaelBlume/perf-test
>>
>> (ns perf-test
>>   (:use (criterium core))
>>   (:gen-class))
>>
>> (def o (Object.))
>>
>> (defn foo [x]
>>   (if (> x 0)
>> (inc x)
>> (do
>>   (monitor-enter o)
>>   (let [res (dec x)]
>> (monitor-exit 0)
>> res
>>
>> (defn bar [x]
>>   (if (> x 0)
>> (inc x)
>> (dec x)))
>>
>> (defn locking-part [x l]
>>   (monitor-enter l)
>>   (let [res (dec x)]
>> (monitor-exit l)
>> res))
>>
>> (defn baz [x]
>>   (if (> x 0)
>> (inc x)
>> (locking-part x o)))
>>
>> (defn -main []
>>   (println "benching foo")
>>   (bench (foo 5) :verbose) 
>>   (println "benching bar")
>>   (bench (bar 5) :verbose)
>>   (println "benching baz")
>>   (bench (baz 5) :verbose)
>>   (println "done benching"))
>>
>>
>>
>> I'm only ever calling these functions with positive values, so the 
>> monitor-enter branch should never be entered. Nevertheless, the performance 
>> of foo is much worse than bar or baz.
>>
>> The best guess I've got is that the fact that lock-taking is involved 
>> somehow changes how the function is compiled, somehow making the function 
>> slower. If the practical upshot is that I shouldn't write functions that 
>> only sometimes lock -- that the locking part of a function should always be 
>> its own function -- then I can do that, but I'm curious why.
>>
>> $ lein uberjar
>> Compiling perf-test
>> Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT.jar
>> Created /Users/mike/perf-test/target/perf-test-0.1.0-SNAPSHOT-standalone.jar
>> $ java -jar -server target/perf-test-0.1.0-SNAPSHOT-standalone.jar 
>> benching foo
>> WARNING: Final GC required 1.5974571326266802 % of runtime
>> x86_64 Mac OS X 10.8.3 4 cpu(s)
>> Java HotSpot(TM) 64-Bit Server VM 24.0-b28
>> Runtime arguments:
>> Evaluation count : 391582560 in 60 samples of 6526376 calls.
>>   Execution time sample mean : 167.426696 ns
>>  Execution time mean : 167.459429 ns
>> Execution time sample std-deviation : 4.079466 ns
>> Execution time std-deviation : 4.097819 ns
>>Execution time lower quantile : 160.742869 ns ( 2.5%)
>>Execution time upper quantile : 175.453376 ns (97.5%)
>>Overhead used : 1.634996 ns
>>
>> Found 2 outliers in 60 samples (3. %)
>>  low-severe   2 (3. %)
>>  Variance from outliers : 12.5602 % Variance is moderately inflated by 
>> outliers
>> benching bar
>> x86_64 Mac OS X 10.8.3 4 cpu(s)
>> Java HotSpot(TM) 64-Bit Server VM 24.0-b28
>> Runtime arguments:
>> Evaluation count : 2174037300 in 60 samples of 36233955 calls.
>>   Execution time sample mean : 26.068923 ns
>>  Execution time mean : 26.066422 ns
>> Execution time sample std-deviation : 0.887937 ns
>> Execution time std-deviation : 0.916861 ns
>>Execution time lower quantile : 23.996763 ns ( 2.5%)
>>Execution time upper quantile : 27.911936 ns (97.5%)
>>Overhead used : 1.634996 ns
>>
>> Found 3 outliers in 60 samples (5. %)
>>  low-severe   1 (1.6667 %)
>>  low-mild 1 (1.6667 %)
>>  high-mild1 (1.6667 %)
>>  Variance from outliers : 22.1874 % Variance is moderately inflated by 
>> outliers
>> benching baz
>> x86_64 Mac OS X 10.8.3 4 cpu(s)
>> Java HotSpot(TM) 64-Bit Server VM 24.0-b28
>> Runtime arguments:
>> Evaluation count : 2270676660 in 60 samples of 37844611 calls.
>>   Execution time sample mean : 25.834142 ns
>>  Execution time mean : 25.837429 ns
>> Execution time sample std-deviation : 0.718382 ns
>> Execution time std-deviation : 0.729431 ns
>>Execution time lower quantile : 24.837925 ns ( 2.5%)
>>Execution time upper quantile : 27.595781 ns (97.5%)
>>Overhead used : 1.634996 ns
>>
>> Found 4 outliers in 60 samples (6.6667 %)
>>  low-severe   2 (3. %)
>>  low-mild 2 (3. %)
>>  Variance from outliers : 15.7591 % Variance is moderately inflated by 
>> outliers
>> done benching
>>
>>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visi

Re: Question regarding structural sharing, records, and maps

2011-07-29 Thread Trenton Strong
On Jul 29, 4:38 am, Ken Wesson  wrote:
> Yes. In fact this should work with records, too -- all the "native"
> fields of the record need to be copied for each version, but most will
> typically be pointers and most of those will typically be to existing
> objects. Only the ones that were changed will point to new nested
> objects. So unless your records have huge numbers of predefined fields
> there shouldn't be much time or memory cost in copying them.

Thanks for clearing that up.  I wasn't exactly sure how the memory
model of records worked.  The semantics that records provide seem like
a good fit for applications like this.

> Probably. Any two changes to the tree will cause one to retry in the
> single-ref case, unless you're sure all the changes will commute.
> (Changes to distinct fields where one change doesn't depend on what
> the other changes will commute, and changes to the same field that
> don't matter as to sequencing, such as inc and dec, will commute, but
> otherwise no.)

Yeah, tree updates are pretty much strictly non-commutative.  The
retry overhead will probably grow quite fast with the number of
concurrent updates.

>
> You will probably want something structured more like a database, with
> "tables" that are maps and "rows" that are records or more complex,
> nested structures, such that most concurrent changes will not affect a
> common row. The "tables" can then be refs of maps of refs, with most
> changes operating on the row-refs. The table refs only need to change
> if rows are inserted or deleted; those could be bottlenecks, though if
> all objects have GUIDs as keys in these tables then row insertions and
> deletions will commute, leaving only the GUID generators for each
> table.

I think I understand the structure here, but just to be clear, you're
proposing flattening
of the tree structure from something like this:

(ref
{
  :contents (some stuff),

  :child-node1 {
:contents (other things)
:child-node1 ...
  },

  :child-node2 {
:contents (thingy1 thingy2)
...
  }
  ...
})

into something like this?

(ref
{
  :some-guid  (ref {<<< could be a map, record, or what have
you
:contents (some stuff)
:child-node1 :another-guid
:child-node2 :and-another-guid
  })

  :another-guid (ref {
:contents (other things)
...
  })

  :and-another-guid (ref {
:contents (thingy1 thingy2)
...
  })
}

Interesting.  I feel like I should have thought of this since I've
done something similar in more object oriented code, but riddled with
lots of fine grained locking.  Here if updating one row references
another
the transaction is going to handle the snapshotting.  I begin to
see...

>
> Unfortunately, (let [new-id (commute table-x-guid-counter inc)])
> sounds nice but might result in new-id taking on the same value in two
> concurrent transactions, so you'll need to use alter and GUID
> generation may be a bottleneck. But it may be much faster than your
> other, more complex transactions. Be sure to do it separately, e.g.:
>
> (let [new-id (dosync (alter table-x-guid-counter inc))]
>   (dosync
>     (commute table-x assoc new-id some-thingy)))
>
> You can even use atoms for the guid counters instead of refs, and then
> the ! in swap! will remind you to get the ID before entering a
> transaction. Do as much of what's needed to generate some-thingy
> before the transaction, too -- though if it depends on other refs'
> values, and those values should be current rather than it being
> acceptable for them to be values that were there recently but might
> have changed, then some of the work will need to be done in the
> transaction.
>
> > I've considered wrapping each node in
> > a ref and performing updates that way, or possibly taking the async
> > route and dispatching cals using agents, but my experience there is
> > nil.  Any insight there?
>
> I'd avoid having refs nesting more than two deep. Complex arrangements
> of cross-references among nodes can create a giant headache in
> combination with refs and transactions, so using a "database" approach
> with "table" maps, ID token keys, and cross-references to ID tokens
> rather than objects is often sensible. In particular, complex objects
> with nested refs effectively have mutable state inside them, which is
> usually not desired in Clojure. (One can argue, though, that the ref
> object is not different in principle from the ID token and removes a
> layer of indirection, and possibly obviates the need to generate IDs
> from counters that act as bottlenecks -- though, really, the Java new
> operator's allocation of memory now takes the role of ID generator.
> I'm given to understand modern VMs do some clever things to make new
> able to be fairly concurrent, though, such as giving each thread its
> own subset of the eden generation to allocate in.)

Makes sense.  I need to look into generating GUIDs quickly in a
concurrent manner for other parts
of the application anyways.  I'm sure if this become

Re: Question regarding structural sharing, records, and maps

2011-07-29 Thread Trenton Strong
On Jul 29, 8:58 am, Michael Gardner  wrote:
>
> An atom would be more appropriate, unless you're coordinating updates to the 
> tree with updates to some other ref.

Ah, for some reason that didn't occur to me, thanks.  Since the tree
in this case acts as a supporting index on spatial data, it pretty
much defers to the outcome of any updates to the original entities.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Question regarding structural sharing, records, and maps

2011-07-29 Thread Trenton Strong
Hi all, this is my first post.

I'm currently building a tree data structure (quadtree to be exact)
while doing some exploratory game server programming using Clojure.
This tree will end up with a lot of "entities" stored in the nodes,
whose positions determine which node they end up in.  As game state
progresses, updates will come from player and AI controlled entities
which need to be reflected in the tree.

Since player updates will come from any number of external clients,
the tree will need some form of concurrency management.  So far, I
have settled on just using a single ref wrapping the root node of the
tree with all update functions acting on that ref via dosync.

My question is one of implementation, as I don't yet have a good
feeling how different approaches to storing data will affect memory
usage and performance in the long run and I'm trying to better my
intuition for designing applications with Clojure.  I am currently
using maps to represent the nodes of the tree, with the hypothesis
being that new versions of the tree will save memory and do less
thrashing this way.  I'm aware that Clojure's persistent data
structures have some method of structural sharing between different
states of the same identity, which is what leads me to that guess.  I
haven't found anything that conclusively says that structures defined
with defrecord do not perform this sort of sharing, but from what I
can tell it seems like they do not, which makes sense given that they
provide accessors and expose a Java class.

So, some trade-off questions would be:

1.) Does structural sharing play well with nested structures? With a
tree whose nodes are represented by nested maps, if a leaf node is
updated with new data, will structural sharing efficiently represent
the new version of the tree as "all the rest of the tree" + modified
node?

2.) If one were to use nested records instead, I assume producing a
new version of the tree will result in a cloning operation over the
nodes of the tree, less the nodes we are modifying ourselves.  If
updates were being performed very frequently, is there a possibility
of adverse interaction with a GC?  I know that GC implementations
differ, but it seems like this could put a lot of pressure on a
typical GC.

3.) Are there alternatives to a single ref around the root node of the
tree that are worth exploring?  I've considered wrapping each node in
a ref and performing updates that way, or possibly taking the async
route and dispatching cals using agents, but my experience there is
nil.  Any insight there?

As an aside, I am not looking for specific answers as without numbers
or hard data there really can't be any right now.  I will end up
measuring the performance of these approaches, but I do want to have
some understanding of how the variables are related when trying to
understand the empirical data.

Thanks,

Trent


-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en