Hi Stefan,

Roughly speaking, an append-only cluster means that every change is written
as an append: deletes and updates, and of course additions, all append data.
The cluster consists of logical segments, which are defragmented in the
background so that their space can be reused.

So let's suppose we have a cluster file with, say, 3 segments of 150MB each
(actually a lot of data).
Looking at the first two segments, we start with a situation like:

1-st segment is empty.
2-nd segment is empty.


We do creations, updates, deletes and so on,
so we then have:
1-st segment is full
2-nd segment is half empty

In the background we defragment the 1-st segment: we load it into memory,
extract only the useful data (dropping everything made obsolete by updates
and deletes) and put that data into the 2-nd segment.
As a result we have:

1-st segment is empty
2-nd segment is full

and we start working with (adding data to) the 1-st segment once again.
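
The cycle above can be sketched as a toy model. This is only an illustration
under stated assumptions, not OrientDB's actual implementation: the names
AppendOnlyCluster, Segment, TOMBSTONE, put, delete and defragment are all
hypothetical, segments hold a handful of entries instead of 150MB, and an
in-memory index tracks which appended version of each record is still live.

```python
from dataclasses import dataclass, field

TOMBSTONE = object()  # hypothetical marker appended when a record is deleted


@dataclass
class Segment:
    capacity: int
    entries: list = field(default_factory=list)  # (key, value) in append order

    def is_full(self):
        return len(self.entries) >= self.capacity


class AppendOnlyCluster:
    """Toy model of the idea only; not OrientDB's implementation."""

    def __init__(self, segment_count=3, segment_capacity=4):
        self.segments = [Segment(segment_capacity) for _ in range(segment_count)]
        self.active = 0   # segment currently receiving appends
        self.index = {}   # key -> (segment number, offset) of the live version

    def _append(self, key, value):
        # Every change lands at the end of the active segment.
        if self.segments[self.active].is_full():
            free = [i for i, s in enumerate(self.segments) if not s.is_full()]
            if not free:
                raise RuntimeError("all segments full; defragment one first")
            self.active = free[0]
        seg = self.segments[self.active]
        seg.entries.append((key, value))
        return self.active, len(seg.entries) - 1

    def put(self, key, value):
        # Create or update: the older version stays on disk but becomes dead.
        self.index[key] = self._append(key, value)

    def delete(self, key):
        # A delete is also an append (a tombstone), never an in-place erase.
        self._append(key, TOMBSTONE)
        self.index.pop(key, None)

    def get(self, key):
        seg_no, offset = self.index[key]
        return self.segments[seg_no].entries[offset][1]

    def defragment(self, seg_no):
        # Load the segment, keep only entries the index still points at,
        # empty the segment, then re-append the survivors elsewhere.
        victim = self.segments[seg_no]
        live = [(k, v) for off, (k, v) in enumerate(victim.entries)
                if self.index.get(k) == (seg_no, off)]
        victim.entries.clear()
        if self.active == seg_no:
            self.active = (seg_no + 1) % len(self.segments)
        for k, v in live:
            self.put(k, v)
        return len(live)

    def fill(self):
        return [len(s.entries) for s in self.segments]


if __name__ == "__main__":
    cluster = AppendOnlyCluster(segment_count=2, segment_capacity=4)
    cluster.put("a", 1)
    cluster.put("b", 2)
    cluster.put("a", 3)      # update: appended, old version is now dead
    cluster.delete("b")      # delete: appended tombstone, 1-st segment is full
    cluster.put("c", 5)
    cluster.put("d", 6)      # 2-nd segment is half empty
    print(cluster.fill())
    cluster.defragment(0)    # only the live "a" record moves to the 2-nd segment
    print(cluster.fill())
```

In this sketch the segment is emptied only after its live entries have been
read into memory, which mirrors the "load in memory, extract useful data"
step described above.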

So effectively you always append data.
This gives the following advantages:
1. You work without random I/O (only a small fraction of operations will
suffer from random I/O).
2. It is more scalable from a multithreading point of view: you only append
data, so reads do not compete with writes.

From the user's perspective, all operations are supported.




On Tue, Jan 21, 2014 at 7:03 PM, <[email protected]> wrote:

> Hi,
>
> I'm a bit curious on the "append only cluster" as append-only is a part
> of our use case.
> In our case there will be some information (some document classes) that
> will be append only while others can be updated.
>
> Will you have a way to support mixed mode like that and what do you think
> the benefits of append-only will be in terms of speed/performance?
>
> Regards,
>  -Stefan Baxter
>
>
>
> On Tuesday, 21 January 2014 08:38:27 UTC, Andrey Lomakin wrote:
>
>> Hi Jun,
>> Both of the issues you described are fixed in the
>> https://github.com/orientechnologies/orientdb/tree/rid-set-sbtree branch
>> (we do not support remote storage yet), but as far as I can see you use
>> embedded storage anyway. Could you use plocal storage for your tests?
>>
>> About memory consumption: OrientDB uses heap and direct memory (it
>> consumes 4GB by default). If you would like to decrease the amount of
>> consumed memory, you can set the storage.diskCache.bufferSize property (in
>> megabytes). Also, about the blueprints-orient-graph-2.5.0-SNAPSHOT
>> dependency: it is not needed any more, as the Blueprints implementation is
>> embedded in graphdb, so please drop this dependency.
>>
>>
>> P.S. And finally, about the comparison to Neo4j insertion speed: we have a
>> proposal for an append-only cluster, which should improve insertion speed.
>> P.S.2 Looking forward to your feedback!
>>
>>
>>
>> On Fri, Jan 17, 2014 at 10:56 PM, Jun Xu <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm evaluating different graph database products and am new to OrientDB.
>>> One use case I'm testing now is loading data into a graph database. The
>>> use case is basically building a graph with half a million vertices and a
>>> few million edges. I'm using OrientDB 1.6.4 on a CentOS Linux box with 8GB
>>> of memory and the CentOS version is 5.10 and the JDK is 1.7.0_40. The
>>> blueprints version is blueprints-core-2.5.0-SNAPSHOT
>>> and blueprints-orient-graph-2.5.0-SNAPSHOT.
>>>
>>> I use OrientGraph to build the graph. During initialization, it creates
>>> an OrientGraph instance ("plocal" or "local" storage engine) and creates a
>>> few key indices using createKeyIndex on vertex nodes. The building process
>>> does index-based lookups (OrientGraph.getVertices()) on vertices and, based
>>> on whether the vertices exist or not, it will create them and set properties,
>>> or create edges and set properties on edges. There are no global index
>>> based lookups on edges. Edges are always reached via vertices. I load the
>>> data in batches (each batch probably has a few hundred operations like
>>> looking up a vertex, creating a vertex, getting all edges of a vertex,
>>> creating an edge and setting a property etc.) and commit transaction at the
>>> end of each batch. After processing around 300 batches, an exception of
>>> "Maximum lock count exceeded" was thrown. I tried both "local" and "plocal"
>>> storage engine and got the same exception. I searched this group and got to
>>> know that OrientDB used to have this bug in very old versions and I'm using
>>> the latest version (1.6.4).
>>>
>>> Since the exception was thrown in transaction commit, I changed to use
>>> the OrientGraphNoTx interface. Without transaction enabled, I did not get
>>> the "Maximum lock count exceeded" exception but I noticed that the process
>>> was really eager for memory. Giving JVM 4GB of max memory, the speed was OK
>>> although still slower than Neo4j for the same process. I did not let the
>>> process finish once I saw the memory usage growing to 3GB. I restarted the
>>> process by giving JVM only 1GB of maximum memory and after running the
>>> process for two and a half hours, an OutOfMemoryError was thrown. While with
>>> Neo4j, the whole loading process was finished using 1GB of maximum memory
>>> with quite good performance.
>>>
>>> Another thing I noticed was that the database size on disk is much
>>> bigger than the database size using Neo4j. At half way of the loading
>>> process, the OrientDB DB directory is already at 4GB, while for Neo4j the
>>> DB directory size is only 1.6GB after the whole loading process is finished.
>>>
>>> I actually really like the way OrientDB is designed, the mix of document
>>> and graph features and the binary protocol on remote interfaces. I really
>>> appreciate it if you can help me get around the hurdles mentioned above. I
>>> might have done something wrong or maybe there is some tuning that can be done.
>>>
>>> Thanks.
>>> Jun
>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>
>>
>> --
>> Best regards,
>> Andrey Lomakin.
>>
>> Orient Technologies
>> the Company behind OrientDB
>>
>>   --
>
>



-- 
Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB
