Hi Jan, Yup, completely understand. I'll send you the details you asked about over e-mail later today.
Best,
Jonathan

On Wednesday, July 10, 2019 at 9:55:34 AM UTC-7, jan.stuecke wrote:
>
> Hey Jonathan,
>
> ok, sounds very interesting! Super cool pre-work. That helps a lot.
>
> Would be happy to collaborate with you on this, but I have to check back
> with our graph specialists on our side first. I don’t want to promise
> anything and then find our guys fully booked with customer projects &
> product development.
>
> Happy to keep this thread alive and post potential updates here for
> everybody, but for the details we could switch to email. You can reach me
> via [email protected]. It would be great if you could send me the
> analytical queries and the number of documents per collection (e.g.
> persons, tags, etc.) in your 1 TB dataset. Then I can discuss with our
> seniors over here.
>
> Best, Jan
>
> On Wed 10. Jul 2019 at 07:44, Jonathan Ellithorpe <[email protected]> wrote:
>
>> Hi Jan,
>> [image: ldbc_snb_schema.png]
>>
>> Thanks for that explanation, that does help. I'm glad that got resolved
>> (I haven't seen that thread updated with the resolution yet).
>>
>> The LDBC Social Network Benchmark is actually more property-graph
>> focused. I've included an image of the graph schema to illustrate.
>>
>> While the schema is relatively straightforward, the benchmark is fairly
>> comprehensive and challenging, comprising a total of 29 queries: 14
>> complex "analytical" read-only queries, 7 simple read-only queries, and
>> 8 update queries that add people, posts, likes, and so on to the graph.
>>
>> I have a working implementation for Neo4j (as well as for my own graph
>> database, which I've been working on as a research project) in the
>> following repo:
>>
>> https://github.com/PlatformLab/ldbc-snb-impls
>>
>> I just added a skeleton for an ArangoDB implementation. Since I'm not
>> familiar with AQL (I just started playing around with it today), I
>> estimate it would take me considerable time to complete a full
>> implementation.
>> I may be able to flesh out the simpler short read queries and updates in
>> a couple of days, but the 14 "analytical"-style complex queries are where
>> things get... well... complicated. The hard part is doing the target
>> database justice: making sure I've written each query in the most
>> performant manner possible. Even with the gracious help of the (amazing)
>> developers at Apache TinkerPop (many thanks to them), getting a Gremlin
>> implementation just to pass validation was about a man-month of work
>> (including learning Gremlin), and then another week or two on top of that
>> to work out inefficiencies in the query implementations.
>>
>> I would be happy to collaborate on this, as I've already been working
>> with this benchmark for quite a while and have datasets (up to 1 TB in
>> size) available for use, along with various tools and validation data for
>> testing. What I do not have, however, is the ArangoDB/AQL expertise to
>> produce the highest-performance complex query implementations possible
>> for ArangoDB (the simple read and update queries are simple enough that I
>> believe I can work those out fairly easily).
>>
>> Cheers,
>> Jonathan
>>
>> On Tuesday, July 9, 2019 at 9:06:23 PM UTC-7, jan.stuecke wrote:
>>
>>> Hi Jonathan,
>>>
>>> this is Jan from ArangoDB.
>>>
>>> Thanks for the hint about the LDBC Benchmark. We will have a look at
>>> whether it is a suitable setup for ArangoDB. Quite often these
>>> benchmarks are focused on RDF stores, but the graph part of ArangoDB's
>>> multi-model offering follows a property graph model instead.
>>>
>>> I forwarded the reported bulk-load question to our Java specialist. I
>>> hope he will find some time to assist here.
>>>
>>> Please note that the problem with the “very simple query” wasn’t
>>> necessarily on ArangoDB's side and was solved by remodeling the data.
>>> The user was storing huge binaries in ArangoDB, which is possible, but
>>> it's recommended to store them in a way that allows fast queries on the
>>> metadata and only accesses the binary data if necessary. E.g., if you
>>> store pictures, PDFs, or similar blobs, and you want to keep both in
>>> Arango, we recommend storing the metadata in collection A and the
>>> actual blob in collection B. If you store everything in one big JSON
>>> document, a query against it has to access the whole document at
>>> runtime -> a lot of unneeded processing -> query runtime increases.
>>>
>>> For best performance in these cases, the recommended way from our side
>>> is to store the metadata in ArangoDB and use a dedicated filesystem for
>>> your binary data.
>>>
>>> Hope that helped.
>>>
>>> Best, Jan
>>>
>>> On Tue 9. Jul 2019 at 17:06, Jonathan Ellithorpe <[email protected]> wrote:
>>>
>>>> Hello All,
>>>>
>>>> Has anyone worked on an implementation of the LDBC Social Network
>>>> Benchmark for ArangoDB?
>>>>
>>>> I see some folks here evidently struggling with ArangoDB performance
>>>> on even very simple queries (e.g.
>>>> https://groups.google.com/forum/#!topic/arangodb/sIOQ1xzJSpc), as well
>>>> as with how to efficiently bulk load graph data (e.g.
>>>> https://groups.google.com/forum/#!topic/arangodb/4eI3fvUzDYg).
>>>>
>>>> An implementation of the above-mentioned benchmark should serve nicely
>>>> to show how to use ArangoDB and AQL performantly, including bulk
>>>> loading of graph data, besides showing ArangoDB's performance
>>>> capabilities.
>>>>
>>>> Jonathan
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "ArangoDB" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/arangodb/3fa4003d-90c6-4aa9-9e40-d833155c14d0%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
>>>
>>> *Jan Stücke*
>>> Head of Communications
>>>
>>> [email protected] | +49 (0)221 / 2722999-60
>>>
>>> *Help us grow the multi-model vision with your review on Gartner Peer
>>> Reviews
>>> <https://www.gartner.com/reviews/market/operational-dbms/vendor/arangodb>.*
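
To make Jan's two-collection pattern concrete, here is a rough AQL sketch (not from the thread; the collection names `fileMeta` and `fileBlobs` and the `blobKey` attribute are illustrative assumptions). Metadata queries scan only the small documents, and a blob is loaded by key only when it is actually needed:

```aql
/* Collection "fileMeta" holds small metadata documents;
   "fileBlobs" holds the actual binaries (names illustrative). */

/* Fast query: filters touch only the metadata documents,
   so the large blob documents are never read. */
FOR m IN fileMeta
  FILTER m.type == "application/pdf" AND m.size < 10485760
  RETURN { name: m.name, blobKey: m.blobKey }

/* Separate, explicit lookup of one blob by key, bound at
   request time via @blobKey. */
RETURN DOCUMENT("fileBlobs", @blobKey)
```

The point of the split is the same as Jan's remark about one big JSON document: a filter over a collection has to materialize each document it examines, so keeping the blobs out of the queried collection keeps the metadata scan cheap.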
