Hi, Luca.

> How many GB?

The input file is 22 GB of text.

> If the file is ordered ...

You are only sorting by the first account. The second account can be 
anywhere in the entire range. My understanding is that both vertices are 
updated when an edge is written. If this is true, will there not be 
potential contention when the "to" vertex is updated?

> OGraphBatchInsert ... keeps everything in RAM before flushing

I assume I will still have to write retry code in the event of a collision 
(see above)?

> You can use support --at- orientdb.com ... 

Sent.

Regards,

Phill

On Friday, September 23, 2016 at 4:06:49 PM UTC+1, l.garulli wrote:
>
> On 23 September 2016 at 03:50, Phillip Henry <phill...@gmail.com> wrote:
>
>> > How big is your file the sort cannot write?
>>
>> One bil-ee-on lines... :-P
>>
>
> How many GB?
>  
>
>> > ...This should help a lot. 
>>
>> The trouble is that the size of a block of contiguous accounts in the 
>> real data is non-uniform (even if it might be uniform in my test data). 
>> Therefore, it is highly likely that a contiguous block of account numbers 
>> will span 2 or more batches. This will lead to a lot of contention. In your 
>> example, if Account 2 spills over into the next batch, chances are I'll 
>> have to roll back that batch.
>>
>> Isn't there also a problem that, if X, Y, Z and W in your example are 
>> account numbers in the next batch, you'll get contention there too? 
>> Admittedly, randomization doesn't solve this problem either.
>>
>
> If the file is ordered, you could have X threads (where X is the number of 
> cores) that parse the file in parallel rather than sequentially. For example, 
> with 4 threads you could start the parsing this way:
>
> Thread 1, starts from 0
> Thread 2, starts from length * 1/4
> Thread 3, starts from length * 2/4
> Thread 4, starts from length * 3/4
>  
> Of course, each thread should first skip ahead to the next line terminator 
> (CR+LF) if it's a CSV. It requires a few lines of code, but you could avoid 
> many conflicts. A sketch of this follows below.
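>
> A minimal sketch of that offset-split approach (class and method names are 
> illustrative, not from this thread; it assumes one record per line):
>
> import java.io.File;
> import java.io.RandomAccessFile;
>
> public class ChunkLoader implements Runnable {
>   private final String path;
>   private final long start;
>   private final long end;
>
>   ChunkLoader(String path, long start, long end) {
>     this.path = path;
>     this.start = start;
>     this.end = end;
>   }
>
>   public void run() {
>     try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
>       raf.seek(start);
>       if (start > 0) {
>         raf.readLine(); // we probably landed mid-line: skip to the next line boundary
>       }
>       String line;
>       // read whole lines until we pass the end of our chunk; the thread owning
>       // the next chunk skips its first partial line, so nothing is lost or duplicated
>       while (raf.getFilePointer() <= end && (line = raf.readLine()) != null) {
>         processLine(line);
>       }
>     } catch (Exception e) {
>       throw new RuntimeException(e);
>     }
>   }
>
>   private void processLine(String line) {
>     // hypothetical hook: parse FROM_ACCOUNT TO_ACCOUNT AMOUNT and write the edge
>   }
>
>   public static void main(String[] args) throws Exception {
>     String path = args[0];
>     int threads = 4;
>     long len = new File(path).length();
>     for (int i = 0; i < threads; i++) {
>       long from = len * i / threads;
>       long to = (i == threads - 1) ? len : len * (i + 1) / threads;
>       new Thread(new ChunkLoader(path, from, to)).start();
>     }
>   }
> }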
>
>
>> > you can use the special Batch Importer: OGraphBatchInsert
>>
>> Would this not be subject to the same contention problems?
>> At what point is it flushed to disk? (Obviously, it can't live in heap 
>> forever).
>>
>
> It keeps everything in RAM before flushing. Up to a few hundred million 
> vertices/edges should be fine if you have a lot of heap, like 58GB (and 4GB 
> of DISKCACHE). It depends on the number of attributes you have.
>  
>
>> > You should definitely use transactions with a batch size of 100 items. 
>>
>> I thought I read somewhere else (can't find the link at the moment) that 
>> you said only use transactions when using the remote protocol?
>>
>
> This was true before v2.2. With v2.2 the management of transactions is 
> parallel and very light. Transactions work well with graphs because every 
> addEdge() operation is 2 updates, and having a TX that works like a batch 
> really helps. A sketch of this follows below.
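>
> A rough sketch of what a 100-item batch with retry on conflict could look 
> like with the Graph API (the input format and helper names are assumptions, 
> not something from this thread):
>
> import java.util.ArrayList;
> import java.util.List;
>
> import com.orientechnologies.orient.core.exception.OConcurrentModificationException;
> import com.tinkerpop.blueprints.Vertex;
> import com.tinkerpop.blueprints.impls.orient.OrientGraph;
> import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;
>
> public class BatchWriter {
>   static final int BATCH = 100;
>   static final int MAX_RETRIES = 10;
>
>   public static void load(Iterable<String[]> rows) {   // each row: {from, to, amount}
>     OrientGraphFactory factory = new OrientGraphFactory("plocal:/temp/mydb", "admin", "admin");
>     OrientGraph graph = factory.getTx();                // transactional graph
>     List<String[]> batch = new ArrayList<String[]>();
>     for (String[] row : rows) {
>       batch.add(row);
>       if (batch.size() == BATCH) {
>         writeBatch(graph, batch);
>         batch.clear();
>       }
>     }
>     if (!batch.isEmpty())
>       writeBatch(graph, batch);
>     graph.shutdown();
>   }
>
>   static void writeBatch(OrientGraph graph, List<String[]> batch) {
>     for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
>       try {
>         for (String[] row : batch) {
>           Vertex from = lookupAccount(graph, row[0]);
>           Vertex to = lookupAccount(graph, row[1]);
>           graph.addEdge(null, from, to, "PAYMENT").setProperty("amount", row[2]);
>         }
>         graph.commit();                                 // one transaction per 100 rows
>         return;
>       } catch (OConcurrentModificationException e) {
>         graph.rollback();                               // conflict on a shared vertex: retry the batch
>       }
>     }
>     throw new IllegalStateException("batch failed after " + MAX_RETRIES + " retries");
>   }
>
>   static Vertex lookupAccount(OrientGraph graph, String number) {
>     for (Vertex v : graph.getVertices("ACCOUNTS.number", number))
>       return v;
>     return graph.addVertex("class:ACCOUNTS", "number", number);
>   }
> }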
>  
>
>>
>> > Please use the latest 2.2.10. ... try to define 50GB of DISKCACHE and 14GB of 
>> Heap
>>
>> Will do on the next run.
>>
>> > If it happens again, could you please send a thread dump?
>>
>> I have the full thread dump, but it's on my work machine so I can't post it 
>> in this forum (all access to Google Groups is banned by the bank, so I am 
>> writing this on my personal computer). Happy to email it to you. Which 
>> email address shall I use?
>>
>
> You can use support --at- orientdb.com, referring to this thread in the 
> subject.
>  
>
>>
>> Phill
>>
>
>
> Best Regards,
>
> Luca Garulli
> Founder & CEO
> OrientDB LTD <http://orientdb.com/>
>
> Want to share your opinion about OrientDB?
> Rate & review us at Gartner's Software Review 
> <https://www.gartner.com/reviews/survey/home>
>
>  
>
>> On Friday, September 23, 2016 at 7:41:29 AM UTC+1, l.garulli wrote:
>>
>>> On 23 September 2016 at 00:49, Phillip Henry <phill...@gmail.com> wrote:
>>>
>>>> Hi, Luca.
>>>>
>>>
>>> Hi Phillip.
>>>  
>>>
>>>> I have:
>>>>
>>>> 4. sorting is an overhead, albeit outside of Orient. Using the Unix 
>>>> sort command failed with "No space left on device". Oops. OK, so I ran my 
>>>> program to generate the data again, this time ordered by the first 
>>>> account number. Performance was much slower, as there appeared to be a lot 
>>>> of contention for this account (i.e., all writes were contending for this 
>>>> account, even if the other account had less contention). More randomized 
>>>> data was faster.
>>>>
>>>
>>> How big is the file that the sort cannot write? Anyway, if you have the 
>>> accounts sorted, you should use transactions of about 100 items, where the 
>>> bank account and its edges are in the same transaction. This should help a 
>>> lot. Example:
>>>
>>> Account 1 -> Payment 1 -> Account X
>>> Account 1 -> Payment 2 -> Account Y
>>> Account 1 -> Payment 3 -> Account Z
>>> Account 2 -> Payment 1 -> Account X
>>> Account 2 -> Payment 2 -> Account W
>>>
>>> If the transaction batch is 5 (I suggest you start with 100), all these 
>>> operations are executed in one transaction. If another thread has:
>>>
>>> Account 99 -> Payment 1 -> Account W
>>>
>>> it could go into conflict because of the shared Account W.
>>>
>>> If you can export the Account IDs as incremental numbers, you can use the 
>>> special Batch Importer: OGraphBatchInsert. Example:
>>>
>>> OGraphBatchInsert batch = new OGraphBatchInsert("plocal:/temp/mydb", 
>>> "admin", "admin");
>>> batch.begin();
>>>
>>> batch.createEdge(0L, 1L, null); // CREATE AN EDGE BETWEEN VERTICES 0 AND 1.
>>>                                 // IF THE VERTICES DON'T EXIST, THEY ARE
>>>                                 // CREATED IMPLICITLY
>>> batch.createEdge(1L, 2L, null);
>>> batch.createEdge(2L, 0L, null);
>>>
>>>
>>> batch.createVertex(3L); // CREATE A NON-CONNECTED VERTEX
>>>
>>>
>>> Map<String, Object> vertexProps = new HashMap<String, Object>();
>>> vertexProps.put("foo", "foo");
>>> vertexProps.put("bar", 3);
>>> batch.setVertexProperties(0L, vertexProps); // SET PROPERTY FOR VERTEX 0
>>> batch.end();
>>>
>>> This is blazing fast, but it uses heap, so run it with a lot of it.
>>>  
>>>
>>>>
>>>> 6. I've multithreaded my loader. The details are now:
>>>>
>>>> - using plocal
>>>> - using 30 threads
>>>> - not using transactions (OrientGraphFactory.getNoTx)
>>>>
>>>
>>> You should definitely use transactions with a batch size of 100 items. 
>>> This speeds things up.
>>>  
>>>
>>>> - retrying forever upon write collisions.
>>>> - using Orient 2.2.7.
>>>>
>>>
>>> Please use the latest 2.2.10.
>>>  
>>>
>>>> - using -XX:MaxDirectMemorySize=258040m
>>>>
>>>
>>> This is not really important, it's just an upper bound for the JVM. 
>>> Please set it to 512GB so you can forget about it. The 2 most important 
>>> values are DISKCACHE and JVM heap. Their sum must be lower than the 
>>> available RAM in the server before you run OrientDB.
>>>
>>> If you have 64GB, try to define 50GB of DISKCACHE and 14GB of Heap.
>>>
>>> If you use the Batch Importer, you should use more Heap and less 
>>> DISKCACHE.
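>>>
>>> For example, on a 64GB box the startup flags could look something like this 
>>> (illustrative values and loader name only; DISKCACHE is the 
>>> storage.diskCache.bufferSize setting, expressed in MB):
>>>
>>> java -Xms14g -Xmx14g \
>>>      -XX:MaxDirectMemorySize=512g \
>>>      -Dstorage.diskCache.bufferSize=51200 \
>>>      -jar your-loader.jar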
>>>  
>>>
>>>> The good news is I've achieved an initial write throughput of about 
>>>> 30k/second.
>>>>
>>>> The bad news is I've tried several runs and only been able to achieve 
>>>> 200mil < number of writes < 300mil.
>>>>
>>>> The first time I tried it, the loader deadlocked. A jstack thread dump 
>>>> showed that the deadlock was between 3 threads at:
>>>> - 
>>>> OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:173)
>>>> - 
>>>> OPartitionedLockManager.acquireExclusiveLock(OPartitionedLockManager.java:210)
>>>> - 
>>>> OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:171)
>>>>
>>>
>>> If it happens again, could you please send a thread dump?
>>>  
>>>
>>>> The second time it failed was due to a NullPointerException at 
>>>> OByteBufferPool.java:297. I've looked at the code and the only way I can 
>>>> see this happening is if OByteBufferPool.allocateBuffer throws an error 
>>>> (perhaps an OutOfMemoryError in java.nio.Bits.reserveMemory). This 
>>>> StackOverflow posting (
>>>> http://stackoverflow.com/questions/8462200/examples-of-forcing-freeing-of-native-memory-direct-bytebuffer-has-allocated-us)
>>>>  
>>>> seems to indicate that this can happen if the underlying 
>>>> DirectByteBuffer's 
>>>> Cleaner doesn't have its clean() method called. 
>>>>
>>>
>>> This is because the database was bigger than this setting: 
>>> -XX:MaxDirectMemorySize=258040m. Please set it to 512GB (see above).
>>>  
>>>
>>>> Alternatively, I followed the SO suggestion and lowered the heap space 
>>>> to a mere 1GB (it was 50GB) to make the GC more active. Unfortunately, 
>>>> after a good start, the job is still running some 15 hours later with a 
>>>> hugely reduced write throughput (~ 7k/s). Jstat shows 4292 full GCs taking 
>>>> a total time of 4597s - not great but not hugely awful either. At this 
>>>> rate, the remaining 700mil or so payments are going to take another 30 
>>>> hours.
>>>>
>>>
>>> See the suggested settings above.
>>>  
>>>
>>>> 7. Even with the highest throughput I have achieved, 30k writes per 
>>>> second, I'm looking at about 20 hours of loading. We've taken the same 
>>>> data 
>>>> and, after trial and error that was not without its own problems, put it 
>>>> into Neo4J in 37 minutes. This is a significant difference. It appears 
>>>> that 
>>>> they are approaching the problem differently to avoid contention on 
>>>> updating the vertices during an edge write.
>>>>
>>>
>>> With all these suggestions you should be able to get much better numbers. 
>>> If you can use the Batch Importer, the numbers should be close to Neo4j's.
>>>  
>>>
>>>>
>>>> Thoughts?
>>>>
>>>> Regards,
>>>>
>>>> Phillip
>>>>
>>>>
>>>
>>> Best Regards,
>>>
>>> Luca Garulli
>>> Founder & CEO
>>> OrientDB LTD <http://orientdb.com/>
>>>
>>> Want to share your opinion about OrientDB?
>>> Rate & review us at Gartner's Software Review 
>>> <https://www.gartner.com/reviews/survey/home>
>>>
>>>
>>>  
>>>
>>>>
>>>> On Thursday, September 15, 2016 at 10:06:44 PM UTC+1, l.garulli wrote:
>>>>>
>>>>> On 15 September 2016 at 09:54, Phillip Henry <phill...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> Hi, Luca.
>>>>>>
>>>>>
>>>>> Hi Phillip,
>>>>>
>>>>> 3. Yes, default configuration. Apart from adding an index for 
>>>>>> ACCOUNTS, I did nothing further.
>>>>>>
>>>>>
>>>>> OK, so you have writeQuorum="majority", which means 2 synchronous writes 
>>>>> and 1 asynchronous write per transaction.
>>>>>  
>>>>>
>>>>>> 4. Good question. With real data, we expect it to be as you suggest: 
>>>>>> some nodes with the majority of the payments (eg, supermarkets). 
>>>>>> However, 
>>>>>> for the test data, payments were assigned randomly and, therefore, 
>>>>>> should 
>>>>>> be uniformly distributed.
>>>>>>
>>>>>
>>>>> What's your average in terms of number of edges? <10, <50, <200, <1000?
>>>>>  
>>>>>
>>>>>> 2. Yes, I tried plocal minutes after posting (d'oh!). I saw a good 
>>>>>> improvement. It started about 3 times faster and got faster still (about 
>>>>>> 10 
>>>>>> times faster) by the time I checked this morning on a job running 
>>>>>> overnight. However, even though it is now running at about 7k 
>>>>>> transactions 
>>>>>> per second, a billion edges is still going to take about 40 hours. So, I 
>>>>>> ask myself: is there any way I can make it faster still?
>>>>>>
>>>>>
>>>>> What's missing here is the AUTO-SHARDING INDEX. Example:
>>>>>
>>>>> accountClass.createIndex("Account.number", OClass.INDEX_TYPE.UNIQUE.toString(),
>>>>>     (OProgressListener) null, (ODocument) null,
>>>>>     "AUTOSHARDING", new String[] { "number" });
>>>>>
>>>>> In this way you should get more parallelism, because the index is 
>>>>> distributed across all the shards (clusters) of the Account class. You 
>>>>> should have 32 of them by default because you have 32 cores. 
>>>>>
>>>>> Please let me know if it's much faster after sorting the from_accounts 
>>>>> and making this change.
>>>>>
>>>>> This is the best you can get out of the box. To push the numbers up is 
>>>>> slightly more complicated: you should make sure that transactions go in 
>>>>> parallel and aren't serialized. This is possible by playing with internal 
>>>>> OrientDB settings (mainly the distributed workerThreads) and by having 
>>>>> many clusters per class (you could try 128 first and see how it goes; 
>>>>> see the sketch below).
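>>>>>
>>>>> A possible sketch for adding clusters to the class (the cluster names are 
>>>>> illustrative, and "graph" is assumed to be an OrientBaseGraph instance):
>>>>>
>>>>> // give the Account class more clusters so parallel inserts spread across them
>>>>> OClass account = graph.getRawGraph().getMetadata().getSchema().getClass("Account");
>>>>> for (int i = account.getClusterIds().length; i < 128; i++)
>>>>>   account.addCluster("account_" + i);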
>>>>>  
>>>>>
>>>>>> I assume when I start the servers up in distributed mode once more, 
>>>>>> the data will then be distributed across all nodes in the cluster?
>>>>>>
>>>>>
>>>>> That's right.
>>>>>  
>>>>>
>>>>>> 3. I'll return to concurrent, remote inserts when this job has 
>>>>>> finished. Hopefully, a smaller batch size will mean there is no 
>>>>>> degradation 
>>>>>> in performance either... FYI: with a somewhat unscientific approach, I 
>>>>>> was 
>>>>>> polling the server JVM with JStack and saw only a single thread doing 
>>>>>> all 
>>>>>> the work and it *seemed* to spend a lot of its time in ODirtyManager on 
>>>>>> collection manipulation.
>>>>>>
>>>>>
>>>>> I think it's because you didn't use the AUTO-SHARDING index. 
>>>>> Furthermore, running distributed unfortunately means the tree RidBag is 
>>>>> not available (we will support it in the future), so every change to the 
>>>>> edges takes a lot of CPU to unmarshal and marshal the entire edge list 
>>>>> every time you update a vertex. That's why I recommend sorting the 
>>>>> vertices.
>>>>>  
>>>>>
>>>>>> I totally appreciate that performance tuning is an empirical science, 
>>>>>> but do you have any opinions as to which would probably be faster: 
>>>>>> single-threaded plocal or multithreaded remote? 
>>>>>>
>>>>>
>>>>> With v2.2 you can go in parallel by using the tips above. The replication 
>>>>> certainly has a cost. I'm sure you can go much faster with just one node 
>>>>> and then start the other 2 nodes to have the database replicated 
>>>>> automatically, at least for the first massive insertion.
>>>>>  
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Phillip
>>>>>>
>>>>>
>>>>> Luca
>>>>>
>>>>>  
>>>>>
>>>>>>
>>>>>> On Wednesday, September 14, 2016 at 3:48:56 PM UTC+1, Phillip Henry 
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi, guys.
>>>>>>>
>>>>>>> I'm conducting a proof-of-concept for a large bank (Luca, we had a 
>>>>>>> 'phone conf on August 5...) and I'm trying to bulk insert a humongous 
>>>>>>> amount of data: 1 million vertices and 1 billion edges.
>>>>>>>
>>>>>>> Firstly, I'm impressed by how easy it was to configure a cluster. 
>>>>>>> However, the performance of batch inserting is bad (and seems to get 
>>>>>>> considerably worse as I add more data). It starts at about 2k 
>>>>>>> vertices-and-edges per second and deteriorates to about 500/second 
>>>>>>> after only about 3 million edges have been added. This also takes 
>>>>>>> ~ 30 minutes. Needless to say, 1 billion payments (edges) will take 
>>>>>>> over a week at this rate. 
>>>>>>>
>>>>>>> This is a show-stopper for us.
>>>>>>>
>>>>>>> My data model is simply payments between accounts and I store it in 
>>>>>>> one large file. It's just 3 fields and looks like:
>>>>>>>
>>>>>>> FROM_ACCOUNT TO_ACCOUNT AMOUNT
>>>>>>>
>>>>>>> In the test data I generated, I had 1 million accounts and 1 billion 
>>>>>>> payments randomly distributed between pairs of accounts.
>>>>>>>
>>>>>>> I have 2 classes in OrientDB: ACCOUNTS (extending V) and PAYMENT 
>>>>>>> (extending E). There is a UNIQUE_HASH_INDEX on ACCOUNTS for the account 
>>>>>>> number (a string).
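>>>>>>>
>>>>>>> In case it helps, the schema setup looks roughly like this (the URL and 
>>>>>>> the property/index names here are illustrative, from memory):
>>>>>>>
>>>>>>> OrientGraphNoTx g = new OrientGraphNoTx("remote:host1/payments");
>>>>>>> OrientVertexType accounts = g.createVertexType("ACCOUNTS");
>>>>>>> accounts.createProperty("number", OType.STRING);
>>>>>>> accounts.createIndex("ACCOUNTS.number", OClass.INDEX_TYPE.UNIQUE_HASH_INDEX, "number");
>>>>>>> g.createEdgeType("PAYMENT");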
>>>>>>>
>>>>>>> We're using OrientDB 2.2.7.
>>>>>>>
>>>>>>> My batch size is 5k and I am using the "remote" protocol to connect 
>>>>>>> to our cluster.
>>>>>>>
>>>>>>> I'm using JDK 8 and my 3 boxes are beefy machines (32 cores each) 
>>>>>>> but without SSDs. I wrote the importing code myself but did nothing 
>>>>>>> 'clever' (I think) and used the Graph API. This client code has been 
>>>>>>> given 
>>>>>>> lots of memory and using jstat I can see it is not excessively GCing.
>>>>>>>
>>>>>>> So, my questions are:
>>>>>>>
>>>>>>> 1. what kind of performance can I realistically expect and can I 
>>>>>>> improve what I have at the moment?
>>>>>>>
>>>>>>> 2. what kind of degradation should I expect as the graph grows?
>>>>>>>
>>>>>>> Thanks, guys.
>>>>>>>
>>>>>>> Phillip
>>>>>>>
>>>>>>>
>>>>>>>
