Hi Jan, Yup, completely understand. I'll send you the details you asked about over e-mail later today.
Best,
Jonathan

On Wednesday, July 10, 2019 at 9:55:34 AM UTC-7, jan.stuecke wrote:
>
> Hey Jonathan,
>
> ok, sounds very interesting! Super cool pre-work. That helps a lot.
>
> Would be happy to collaborate with you on this, but I have to check back
> with our graph specialists on our side first. I don’t want to promise
> anything and then find our guys fully booked with customer projects &
> product development.
>
> Happy to keep this thread alive and post potential updates here for
> everybody, but for the details we could switch to email. You can reach me
> via [email protected]. It would be great if you could send me the
> analytical queries and the number of documents per collection (e.g.
> persons, tags, etc.) in your 1 TB dataset. Then I can discuss with our
> seniors over here.
>
> Best, Jan
>
> On Wed 10. Jul 2019 at 07:44, Jonathan Ellithorpe <[email protected]> wrote:
>
>> Hi Jan,
>> [image: ldbc_snb_schema.png]
>>
>> Thanks for that explanation, that does help. I'm glad that got resolved
>> (I haven't seen that thread updated with the resolution yet).
>>
>> The LDBC Social Network Benchmark is actually more property-graph
>> focused. I've included an image of the graph schema to illustrate.
>>
>> While the schema is relatively straightforward, the benchmark is fairly
>> comprehensive and challenging, comprising a total of 29 queries: 14
>> complex "analytical" read-only queries, 7 simple read-only queries, and
>> 8 update queries that add people, posts, likes, and so on to the graph.
>>
>> I have a working implementation for Neo4j (as well as for my own graph
>> database, which I've been working on as a research project) in the
>> following repo:
>>
>> https://github.com/PlatformLab/ldbc-snb-impls
>>
>> I just added a skeleton for an ArangoDB implementation. Since I'm not
>> familiar with AQL (I just started playing around with it today), I
>> estimate it would take me considerable time to complete a full
>> implementation.
>> I may be able to flesh out the simpler short read queries and updates in
>> a couple of days, but the 14 "analytical"-style complex queries are where
>> things get... well... complicated. The hard part is doing the target
>> database justice: making sure I've written each query in the most
>> performant manner possible. Even with the gracious help of the (amazing)
>> developers at Apache TinkerPop (many thanks to them), getting a Gremlin
>> implementation just to pass validation was about a man-month of work
>> (including learning Gremlin), and then another week or two on top of that
>> to work out inefficiencies in the query implementations.
>>
>> I would be happy to collaborate on this, as I've already been working
>> with this benchmark for quite a while and have datasets (up to 1 TB in
>> size) available for use, along with various tools and validation data for
>> testing. What I do not have, however, is the ArangoDB/AQL expertise to
>> produce the highest-performance complex query implementations possible
>> for ArangoDB (the simple read and update queries are simple enough that I
>> believe I can work those out fairly easily).
>>
>> Cheers,
>> Jonathan
>>
>> On Tuesday, July 9, 2019 at 9:06:23 PM UTC-7, jan.stuecke wrote:
>>
>>> Hi Jonathan,
>>>
>>> this is Jan from ArangoDB.
>>>
>>> Thanks for the hint about the LDBC Benchmark. We will have a look at
>>> whether it is a suitable setup for ArangoDB. Quite often these
>>> benchmarks are focused on RDF stores, but the graph part of ArangoDB's
>>> multi-model offering follows a property graph model instead.
>>>
>>> I forwarded the reported bulk-load question to our Java specialist. I
>>> hope he will find some time to assist here.
>>>
>>> Please note that the problem with the “very simple query” wasn’t
>>> necessarily on ArangoDB's side and was solved by remodeling the data.
>>> The user was storing huge binaries in ArangoDB, which is possible, but
>>> it's recommended to store them in a way that allows fast queries on the
>>> metadata and only accesses the binary data if necessary. E.g., if you
>>> store pictures, PDFs, or similar blobs, and you want to keep both in
>>> Arango, we recommend storing the metadata in collection A and the
>>> actual blob in collection B. If you store everything in one big JSON
>>> document, a query against it has to access the whole document at
>>> runtime -> a lot of unneeded processing -> query runtime increases.
>>>
>>> For best performance in these cases, the recommended way from our side
>>> is to store the metadata in ArangoDB and use a dedicated filesystem for
>>> your binary data.
>>>
>>> Hope that helped.
>>>
>>> Best, Jan
>>>
>>> On Tue 9. Jul 2019 at 17:06, Jonathan Ellithorpe <[email protected]> wrote:
>>>
>>>> Hello All,
>>>>
>>>> Has anyone worked on an implementation of the LDBC Social Network
>>>> Benchmark for ArangoDB?
>>>>
>>>> I see some folks here evidently struggling with ArangoDB performance
>>>> on even very simple queries (e.g.
>>>> https://groups.google.com/forum/#!topic/arangodb/sIOQ1xzJSpc), as well
>>>> as with how to efficiently bulk load graph data (e.g.
>>>> https://groups.google.com/forum/#!topic/arangodb/4eI3fvUzDYg).
>>>>
>>>> An implementation of the above-mentioned benchmark should serve nicely
>>>> to show how to use ArangoDB and AQL performantly, including bulk
>>>> loading of graph data, besides showing ArangoDB's performance
>>>> capabilities.
>>>>
>>>> Jonathan
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "ArangoDB" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/arangodb/3fa4003d-90c6-4aa9-9e40-d833155c14d0%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
>>>
>>> *Jan Stücke*
>>> Head of Communications
>>>
>>> [email protected] | +49 (0)221 / 2722999-60
>>>
>>> *Help us grow the multi-model vision with your review on Gartner Peer
>>> Reviews
>>> <https://www.gartner.com/reviews/market/operational-dbms/vendor/arangodb>.*
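
To make Jan's two-collection pattern concrete, here is a rough AQL sketch (not from the thread; the collection names `fileMeta` and `fileBlobs` and the `blobKey` attribute are illustrative assumptions). Metadata queries scan only the small documents, and a blob is loaded by key only when it is actually needed:

```aql
/* Collection "fileMeta" holds small metadata documents;
   "fileBlobs" holds the actual binaries (names illustrative). */

/* Fast query: filters touch only the metadata documents,
   so the large blob documents are never read. */
FOR m IN fileMeta
  FILTER m.type == "application/pdf" AND m.size < 10485760
  RETURN { name: m.name, blobKey: m.blobKey }

/* Separate, explicit lookup of one blob by key, bound at
   request time via @blobKey. */
RETURN DOCUMENT("fileBlobs", @blobKey)
```

The point of the split is the same as Jan's remark about one big JSON document: a filter over a collection has to materialize each document it examines, so keeping the blobs out of the queried collection keeps the metadata scan cheap.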
