Reviving this thread.

I think I have exposed a bottleneck in the Java driver. I'm not sure what
it is, but if I scale the client machine up to 128 cores with a 2:1
thread-to-core ratio, I get no additional performance over, say, a 16-core
machine. However, if I create additional JVMs running the benchmark, I do
get additional performance. It's still unclear whether the bottleneck is in
the driver itself or in the JVM running it.
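
To make the multi-JVM observation concrete, this is roughly how I fan the
benchmark out across processes (the jar name and its arguments are
placeholders, not the real benchmark entry point):

import java.util.ArrayList;
import java.util.List;

// Launch N copies of the benchmark in separate JVMs to work around the
// single-JVM plateau, then wait for them all to finish.
public class MultiJvmRunner {
    public static void main(String[] args) throws Exception {
        int jvmCount = Integer.parseInt(args[0]);
        List<Process> processes = new ArrayList<>();
        for (int i = 0; i < jvmCount; i++) {
            processes.add(new ProcessBuilder("java", "-jar", "benchmark.jar")
                    .inheritIO()
                    .start());
        }
        for (Process p : processes) {
            p.waitFor(); // throughput is then aggregated across the JVMs
        }
    }
}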

Anyway, I would like to move forward with getting this into TinkerPop. The
plan is to make the datasets we have generated for it so far public; anyone
can then load a dataset into any graph provider's graph and run the
benchmark. We will look into additional datasets separately.

Let me know if anyone has any concerns.
-Lyndon

On Thu, Jul 11, 2024 at 8:54 AM Ken Hu <kenhu...@gmail.com> wrote:

> I think this would be very useful for the 3.x line that uses WebSockets.
> It's difficult to recommend the best connection settings to increase
> performance for different workloads, so an automated tool to discover
> them would be helpful to users. On a side note, a goal during the
> transition to HTTP should be to make the connection settings simpler, so
> that it is easier to figure out what the settings should be for a
> specific workload.
>
> I feel like there are definitely some users who would benefit from a
> benchmarking tool like this.
>
> On Wed, Jul 10, 2024 at 12:42 PM Lyndon Bauto
> <lba...@aerospike.com.invalid> wrote:
>
> > Right now the dataset and benchmarking setup is really simple.
> >
> > It does some mergeV() calls, edge insertions, vertex lookups by id
> > (g.V(<id>).next()), and then some g.V(<id>).out().out().out().out()
> > traversals.
> >
> > The idea is to get results both for queries that require a decent bit
> > of processing and for quick lookup-and-return queries, which let us
> > test the driver under highly concurrent, high-throughput load. We could
> > also add a query that returns a lot of data without much processing, so
> > we could test the driver in a scenario where a lot of data is coming
> > back.
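> >
> > For concreteness, the current query mix looks roughly like the
> > following sketch using the Java driver (host, port, labels, and ids are
> > placeholders):
> >
> > import java.util.Map;
> >
> > import org.apache.tinkerpop.gremlin.driver.Cluster;
> > import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
> > import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
> > import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
> > import org.apache.tinkerpop.gremlin.structure.T;
> >
> > import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;
> >
> > public class QueryMixSketch {
> >     public static void main(String[] args) {
> >         Cluster cluster = Cluster.build("localhost").port(8182).create();
> >         GraphTraversalSource g =
> >                 traversal().withRemote(DriverRemoteConnection.using(cluster, "g"));
> >
> >         // upsert two vertices via mergeV()
> >         Map<Object, Object> v1 = Map.of(T.id, 1);
> >         Map<Object, Object> v2 = Map.of(T.id, 2);
> >         g.mergeV(v1).next();
> >         g.mergeV(v2).next();
> >
> >         // edge insertion between the two vertices
> >         g.V(1).addE("knows").to(__.V(2)).iterate();
> >
> >         // quick lookup-and-return query
> >         g.V(1).next();
> >
> >         // heavier traversal that needs a decent bit of processing
> >         g.V(1).out().out().out().out().toList();
> >
> >         cluster.close();
> >     }
> > }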
> >
> > This would help users identify what is most beneficial for their use
> > case. For example, a few connections with many in-process requests per
> > connection might make better use of resources when the data returned is
> > minimal but the query rate is very high, while more connections with
> > fewer in-process requests per connection might achieve better results
> > when queries return more data.
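> >
> > As a rough illustration of those two ends of the spectrum using the
> > Java driver's Cluster builder (the numbers are placeholders, not
> > recommendations):
> >
> > import org.apache.tinkerpop.gremlin.driver.Cluster;
> >
> > public class PoolShapeSketch {
> >     public static void main(String[] args) {
> >         // many small results at a very high query rate: few
> >         // connections, each multiplexing many in-flight requests
> >         Cluster highRate = Cluster.build("localhost")
> >                 .maxConnectionPoolSize(2)
> >                 .maxInProcessPerConnection(32)
> >                 .create();
> >
> >         // larger result payloads: more connections, fewer requests
> >         // sharing each connection
> >         Cluster largeResults = Cluster.build("localhost")
> >                 .maxConnectionPoolSize(16)
> >                 .maxInProcessPerConnection(4)
> >                 .create();
> >
> >         highRate.close();
> >         largeResults.close();
> >     }
> > }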
> >
> > Down the road, adding identity graph use cases, fraud detection use
> > cases, and others, each with datasets included and queries to
> > benchmark, would be a great way for providers to opt into publishing
> > benchmarks relevant to their target customers, but that is a later
> > concern.
> >
> > - Lyndon
> >
> > On Tue, Jul 9, 2024 at 4:14 PM Ken Hu <kenhu...@gmail.com> wrote:
> >
> > > Hey Lyndon,
> > >
> > > This is a very interesting idea. You mentioned throughput testing,
> > > but how does this compare to other graph benchmarks that use specific
> > > generated datasets and specific queries? Asked another way, what
> > > kinds of queries are you using to test in this system?
> > >
> > > Regards,
> > > Ken
> > >
> > > On Tue, Jul 9, 2024 at 2:00 PM Lyndon Bauto
> > > <lba...@aerospike.com.invalid> wrote:
> > >
> > > > Hi devs,
> > > >
> > > > I've been working on a benchmarking framework for TinkerPop,
> > > > specifically the Java driver.
> > > >
> > > > The idea is a benchmarking framework that a TinkerPop user can
> > > > point at their Gremlin Server instance (which can be from any
> > > > provider). It lets them fix some of the driver's configs while
> > > > treating others as variables; the framework then runs through the
> > > > different settings, recording latency and throughput.
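> > > >
> > > > A stripped-down sketch of what that looks like on top of JMH (the
> > > > parameter values and query are placeholders, not the real
> > > > benchmark):
> > > >
> > > > import org.apache.tinkerpop.gremlin.driver.Client;
> > > > import org.apache.tinkerpop.gremlin.driver.Cluster;
> > > > import org.openjdk.jmh.annotations.*;
> > > >
> > > > @State(Scope.Benchmark)
> > > > public class DriverSettingsBenchmark {
> > > >
> > > >     // JMH runs the benchmark once per parameter combination, so
> > > >     // latency and throughput get recorded for each setting
> > > >     @Param({"2", "8", "32"})
> > > >     int maxInProcessPerConnection;
> > > >
> > > >     @Param({"2", "8"})
> > > >     int maxConnectionPoolSize;
> > > >
> > > >     Cluster cluster;
> > > >     Client client;
> > > >
> > > >     @Setup
> > > >     public void setUp() {
> > > >         cluster = Cluster.build("localhost")
> > > >                 .maxConnectionPoolSize(maxConnectionPoolSize)
> > > >                 .maxInProcessPerConnection(maxInProcessPerConnection)
> > > >                 .create();
> > > >         client = cluster.connect();
> > > >     }
> > > >
> > > >     @Benchmark
> > > >     public Object lookupById() {
> > > >         // placeholder query; the real mix is described above
> > > >         return client.submit("g.V(1).next()").all().join();
> > > >     }
> > > >
> > > >     @TearDown
> > > >     public void tearDown() {
> > > >         cluster.close();
> > > >     }
> > > > }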
> > > >
> > > > The output of the framework would be guidance for the Java driver
> > > > user on the optimal configuration for both latency and throughput,
> > > > which they can then apply to their workload outside the framework.
> > > >
> > > > A provider could also use this to manually adjust server-side
> > > > settings like gremlinPool and threadPoolWorker, running the
> > > > framework under different settings to optimize throughput and
> > > > latency there as well.
> > > >
> > > > The benchmark is built on JMH and packaged into a Docker container,
> > > > so it is very easy to use. The configs are passed at runtime, so a
> > > > user would just call a docker build and then a docker run script,
> > > > with the configs set up in the Docker config.
> > > >
> > > > We could also add other benchmarks, at any scale, to the framework,
> > > > allowing benchmark publishing from providers who wish to
> > > > participate.
> > > >
> > > > Anyone have any thoughts on this?
> > > >
> > > > Cheers,
> > > > Lyndon


-- 

*Lyndon Bauto*
*Senior Software Engineer*
*Aerospike, Inc.*
www.aerospike.com
lba...@aerospike.com
