A value can have more than one form or implementation. For example, a 
String and a char array may both represent the same value. Most objects, 
for example, can be serialized. Changing the form of a value then is not a 
change of value. This is the idea behind lazy deserialization / 
reserialization.


We can represent a data structure as a tree, where the values in the tree 
can be either the immutable value of the object or a read-only ByteBuffer 
holding the serialized form of the value, or both. Both forms can be held 
by nodes using atoms, so the form of the values can be changed without 
having to create a new tree. The value remains unchanged and the tree can 
be safely shared between threads.


Lazy deserialization comes after loading a map or vector from a ByteBuffer. 
Initially only the root exists, which holds only the ByteBuffer. As you 
access and alter the contents of the map or vector, the portions of the 
tree which are accessed are deserialized.


Incremental reserialization then occurs when writing a map or vector back 
to a ByteBuffer. Unaltered portions of the tree need not be serialized, as 
a reference to the ByteBuffer that the structure was loaded from is 
retained.


Currently only the top-level vector or map is lazy. The contents of the 
structure is serialized/deserialized using prn-str/read-string. Read-string 
options, if needed, can be passed in an opts map (i.e. resources map) which 
is the last argument of the lazy-load method.


The benchmark for map illustrates this:


(ns aatree.map-updates
  (:require [aatree.core :refer :all])
  (:import (java.nio ByteBuffer)))

(def map-size 1000000)
(def updates 1000)

(defn bld [m i]
  (conj m [i i]))

(println)
(def t0 (System/currentTimeMillis))
(def lazy-map (reduce bld emptyLazyAAMap (range map-size)))
(def t1 (System/currentTimeMillis))
(def micr-0 (* 1000. (- t1 t0)))
(println "Time to build a map of size" map-size "=" micr-0 "microseconds")
(println "Time per entry:" (/ micr-0 map-size) "microseconds")

(defn upd [m i]
  (let [m1 (assoc m i (- i))
        bb (ByteBuffer/allocate (lazy-byte-length m1))]
    (lazy-write m1 bb)
    (.flip bb)
    (load-aamap bb)))

(println)
(def t0 (System/currentTimeMillis))
(def lazy-m (reduce upd lazy-map (range updates)))
(def t1 (System/currentTimeMillis))
(def micr-0 (* 1000. (- t1 t0)))
(println "Time to deserialize/update/reserialize " updates "times =" micr-0 
"microseconds")
(println "Time per complete update:" (/ micr-0 updates) "microseconds")

(println)




On Thursday, October 1, 2015 at 3:16:43 PM UTC-4, Nathan Davis wrote:
>
> Hi Bill,
>
> This sounds interesting, but the descriptions here and on the project page 
> are pretty high-level.  Are there more details available?  One question I 
> have in particular is, what do you mean by "update"?  From what I can tell, 
> aatree provides an immutable/persistent data structure compatible with 
> Clojure maps.  But then you say that serializing "updates" is done 
> incrementally.  Could you elaborate on this?  Do you mean that the new copy 
> to be serialized is basically diff'd agaist the old version, and just the 
> difference is serialized?
>
> Thanks,
>
> Nathan Davis
>
> On Monday, September 28, 2015 at 7:39:28 PM UTC-5, William la Forge wrote:
>>
>> Too often when dealing with data saved on disk you find yourself having 
>> to deserialize a large block of data, make a small update, and then 
>> reserialize the whole thing and write it back to disk. Deserialization and 
>> reserialization are quite slow, so having to deserialize and reserialize a 
>> large block of data for each small update creates a bottleneck. In this 
>> case, incremental deserialization / reserialization will be a substantial 
>> improvement.
>>
>> Release 0.3.0 provides lazy aa maps and lazy aa vectors, which can be 
>> used as the top-level container for large blocks of durable data. 
>> Serialized data is binary and organized in a binary tree. To access an 
>> entry, only the nodes in the path from the root to the node of interested 
>> must be deserialized. And only the updated nodes (and their parent nodes) 
>> need to be subsequently reserialized. So the speed improvement is 
>> substantial. (Benchmark code is provided)
>>
>> As with previous releases, collection-check 
>> <https://github.com/ztellman/collection-check> is used to validate the 
>> vector and map implementations.
>>
>> https://github.com/laforge49/aatree#readme
>>
>> Your feedback here is greatly appreciated!
>>
>> Bill
>>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to