Right now the dataset and benchmarking setup are really simple. The benchmark does some mergeV() upserts, edge insertions, vertex lookups by id (g.V(<id>).next()), and then some deeper traversals like g.V(<id>).out().out().out().out().
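
To make those query shapes concrete, here is a rough sketch (not the actual benchmark code) of what they look like against a remote gremlin-server using the Java driver's traversal API; the host, port, labels, property keys, and ids are just illustrative placeholders:

import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;

public class QueryShapesSketch {
    public static void main(String[] args) throws Exception {
        // connect to a gremlin-server instance (placeholder host/port/traversal source)
        GraphTraversalSource g = traversal().withRemote(
                DriverRemoteConnection.using("localhost", 8182, "g"));

        // mergeV() upsert: small write, small response
        Map<Object, Object> search = new HashMap<>();
        search.put(T.label, "person");
        search.put("userId", 1);
        g.mergeV(search).iterate();

        // edge insertion between two existing vertices
        g.addE("knows").from(__.V(1)).to(__.V(2)).iterate();

        // quick lookup-and-return: fetch a vertex by id
        Vertex v = g.V(1).next();

        // heavier processing: four-hop traversal out from a vertex
        List<Vertex> fourHops = g.V(1).out().out().out().out().toList();

        g.close();
    }
}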
The idea is to cover queries that require a decent bit of processing, as well as quick lookup-and-return queries that let us test the driver under highly concurrent, high-throughput load. We could also add a query that returns a lot of data without much processing, so we could test the driver in a scenario where a lot of data is coming back.

This would help users identify what is most beneficial for their use case. For example, few connections with many in-process requests per connection may make better use of resources when the data returned is minimal but the number of queries running is very high, while more connections with fewer in-process requests per connection might achieve better results when queries return more data.

Down the road, adding things like identity graph use cases, fraud detection use cases, and others, with datasets included and queries to benchmark, would be a great way for providers to opt into providing benchmarks that are relevant to their target customers, but that is a later thing.

- Lyndon

On Tue, Jul 9, 2024 at 4:14 PM Ken Hu <kenhu...@gmail.com> wrote:

> Hey Lyndon,
>
> This is a very interesting idea. You mentioned throughput testing, but how
> does this compare to other graph testing that uses specific generated
> datasets and specific queries? Asked another way, what kind of queries are
> you using to test in this system?
>
> Regards,
> Ken
>
> On Tue, Jul 9, 2024 at 2:00 PM Lyndon Bauto <lba...@aerospike.com.invalid>
> wrote:
>
> > Hi devs,
> >
> > I've been working on a benchmarking framework for TinkerPop, specifically
> > the Java driver.
> >
> > The idea is to have a benchmarking framework that a TinkerPop user can
> > target at their own instance of gremlin-server (it can be any provider).
> > It will allow them to fix some of the driver configs while having others
> > as variables. The framework will then run through a bunch of different
> > settings, recording latency and throughput.
> >
> > The output of the benchmarking framework would be guidance for the Java
> > driver user on the optimal configuration for both latency and throughput,
> > which they can then use to optimize their workload outside the framework.
> >
> > A provider could also use this to manually adjust
> > gremlinPool/threadPoolWorkers/etc. and run the framework under different
> > settings to optimize throughput and latency there as well.
> >
> > The benchmark is built on JMH and is built into a docker container, so it
> > is very easy to use. The configs are passed at runtime, so a user would
> > just call a docker build then docker run script, with the configs set up
> > in the docker config.
> >
> > We could also add other benchmarks at any scale to the framework to allow
> > benchmark publishing from providers who wish to participate.
> >
> > Anyone have any thoughts on this?
> >
> > Cheers,
> > Lyndon
> >
> > --
> > Lyndon Bauto
> > Senior Software Engineer
> > Aerospike, Inc.
> > www.aerospike.com
> > lba...@aerospike.com

--
Lyndon Bauto
Senior Software Engineer
Aerospike, Inc.
www.aerospike.com
lba...@aerospike.com