Thanks Lyndon. For the benchmarking framework itself, can it take in any dataset in csv file format, or only the specific dataset you've generated?
Also, I think if we want to standardize on a set of sample data for users, the best way may be to host the data on ASF, and it does look like ASF has a server to host files: https://nightlies.apache.org/. Thoughts?

Cheers,

Yang

On 2024/11/19 17:46:18 Lyndon Bauto wrote:
> I was planning on contributing the entire framework, which is basically a
> framework that's set up in a way where you can add additional benchmarks to
> it similar to adding a test in junit, so if people want to make new
> benchmarks on different things, it's easy. Not sure on the exact place it'd
> live, gremlin-tools seems reasonable though.
>
> The dataset we used was basically an identity graph dataset (csv format)
> that is generated from 350 GB, 3.5 TB, and 15 TB. Aerospike will host this
> data in a public gcp bucket and others are welcome to use it.
>
> We will likely generate some other datasets and add to this over time, this
> is not something that TinkerPop would be on the hook for hosting or
> providing in any way.
>
> -Lyndon
>
> On Tue, Nov 19, 2024 at 8:33 AM Yang Xia <xia...@apache.org> wrote:
>
> > Hi Lyndon,
> >
> > I also think this is something that can benefit users. I just have some
> > quick questions.
> >
> > Could you clarify what you plan to contribute into TinkerPop? Is it the
> > benchmarking framework, the dataset, or both? For the benchmarking
> > framework, are you looking to PR something into the gremlin-tools module?
> > For the dataset, what type of data does it have, how is it generated? Since
> > there are quite a few benchmarking datasets that exist out there already, I
> > feel like we've usually kept datasets external.
> >
> > Cheers,
> >
> > Yang
> >
> > On 2024/11/15 20:25:03 Lyndon Bauto wrote:
> > > Reviving this thread.
> > >
> > > I think I have exposed a bottleneck in the Java driver. Not sure what it
> > > is, but if I scale the client machine up to 128 cores and 2:1 thread:core
> > > ratio, I get no additional performance over, say, a 16 core machine.
> > > However, if I create additional JVMs running the benchmark I get additional
> > > performance. It's still unclear whether this is the driver or maybe the JVM
> > > running the driver.
> > >
> > > Anyway I would like to move forward with getting this into TinkerPop. The
> > > plan I have is to make the datasets we have generated for it so far public,
> > > then anyone can load the dataset into any graph provider's graph and run
> > > the benchmark. Additionally we will look into other datasets separately.
> > >
> > > let me know if anyone has any concerns
> > > -Lyndon
> > >
> > > On Thu, Jul 11, 2024 at 8:54 AM Ken Hu <kenhu...@gmail.com> wrote:
> > >
> > > > I think this would be very useful for the 3.x line that uses WebSockets.
> > > > There's difficulty in recommending what the best connection settings are to
> > > > increase performance for different workloads and an automated tool to
> > > > discover that would be helpful to users. On a side note, a goal during the
> > > > transition to HTTP should be to make the connection settings simpler so
> > > > that it is easier to figure out what the settings should be for a specific
> > > > workload.
> > > >
> > > > I feel like there are definitely some users that will benefit from a
> > > > benchmarking tool like this.
> > > >
> > > > On Wed, Jul 10, 2024 at 12:42 PM Lyndon Bauto <lba...@aerospike.com.invalid>
> > > > wrote:
> > > >
> > > > > Right now the dataset and benchmarking setup is really simple.
> > > > >
> > > > > It does some mergeV's, edge insertion, get vertex by id (g.V(<id>).next()),
> > > > > and then some g.V(<id>).out().out().out().out().
> > > > >
> > > > > The idea being to get results for queries that require a decent bit of
> > > > > processing, as well as quick lookup and return queries that will allow us
> > > > > to test the driver when it's under high throughput load that is highly
> > > > > concurrent.
> > > > > We could also add a query that returns a lot of data without a
> > > > > lot of processing, so we could test the driver under a scenario where a lot
> > > > > of data is coming back.
> > > > >
> > > > > This would help users identify what would be most beneficial for their use
> > > > > case. For example, maybe few connections and many in process per connection
> > > > > gets better use of resources when the data returned is minimal but the
> > > > > number of queries running is very high, meanwhile more connections with
> > > > > less in process per connection might achieve better results when queries
> > > > > are returning more data.
> > > > >
> > > > > Down the road adding things like identity graph use cases, fraud detection
> > > > > use cases, and others with datasets included and queries to benchmark in
> > > > > there would be a great way for providers to opt into providing benchmarks
> > > > > that are relevant to their target customers but that is a later thing.
> > > > >
> > > > > - Lyndon
> > > > >
> > > > > On Tue, Jul 9, 2024 at 4:14 PM Ken Hu <kenhu...@gmail.com> wrote:
> > > > >
> > > > > > Hey Lyndon,
> > > > > >
> > > > > > This is a very interesting idea. You mentioned throughput testing but how
> > > > > > does this compare to other graph testing that uses specific generated
> > > > > > datasets and specific queries? Asked another way, what kind of queries are
> > > > > > you using to test in this system?
> > > > > >
> > > > > > Regards,
> > > > > > Ken
> > > > > >
> > > > > > On Tue, Jul 9, 2024 at 2:00 PM Lyndon Bauto <lba...@aerospike.com.invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi devs,
> > > > > > >
> > > > > > > I've been working on a benchmarking framework for TinkerPop, specifically
> > > > > > > the Java driver.
> > > > > > >
> > > > > > > The idea is to have a benchmarking framework that a TinkerPop user can
> > > > > > > target their instance of gremlin-server with (can be any provider) and what
> > > > > > > this will allow them to do is fix some of their configs of the driver while
> > > > > > > having others as variables. The framework will then run through a bunch of
> > > > > > > different settings, recording latency and throughput.
> > > > > > >
> > > > > > > The output of the benchmarking framework would be guidance for the user of
> > > > > > > the Java driver for optimal configuration for both latency and throughput,
> > > > > > > that they can then use to optimize their workload outside the framework.
> > > > > > >
> > > > > > > A provider could also use this to manually adjust
> > > > > > > gremlinPool/threadPoolWorkers/etc and run the framework under different
> > > > > > > settings to optimize throughput and latency there as well.
> > > > > > >
> > > > > > > The benchmark is built on JMH and is built into a docker container so it is
> > > > > > > very easy to use. The configs are passed at runtime, so a user would just
> > > > > > > call a docker build then docker run script, with the configs set up in the
> > > > > > > docker config.
> > > > > > >
> > > > > > > We could also add other benchmarks at any scale to the framework that allow
> > > > > > > benchmark publishing from providers who wish to participate.
> > > > > > >
> > > > > > > Anyone have any thoughts on this?
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Lyndon
> > > > > > > --
> > > > > > >
> > > > > > > *Lyndon Bauto*
> > > > > > > *Senior Software Engineer*
> > > > > > > *Aerospike, Inc.*
> > > > > > > www.aerospike.com
> > > > > > > lba...@aerospike.com
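
For illustration only, the sweep described in the original proposal in this thread (fix some driver configs, vary others, record latency and throughput for each combination) could be sketched roughly as below. This is not the proposed framework's code and it uses neither JMH nor the gremlin-driver API. The class `DriverConfigSweep`, the setting names `connections` and `inProcessPerConnection`, and the sleeping stand-in query are all assumptions made up for this sketch; a real run would issue something like g.V(<id>).next() against a server instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class DriverConfigSweep {

    /** One candidate configuration; both field names are hypothetical. */
    static final class Setting {
        final int connections;
        final int inProcessPerConnection;
        Setting(int connections, int inProcessPerConnection) {
            this.connections = connections;
            this.inProcessPerConnection = inProcessPerConnection;
        }
    }

    /** Measured outcome for one setting. */
    static final class Result {
        final Setting setting;
        final double requestsPerSecond;
        final double meanLatencyMicros;
        Result(Setting setting, double requestsPerSecond, double meanLatencyMicros) {
            this.setting = setting;
            this.requestsPerSecond = requestsPerSecond;
            this.meanLatencyMicros = meanLatencyMicros;
        }
    }

    /** Cartesian product of the two variable settings. */
    static List<Setting> grid(int[] connections, int[] inProcess) {
        List<Setting> out = new ArrayList<>();
        for (int c : connections) {
            for (int p : inProcess) {
                out.add(new Setting(c, p));
            }
        }
        return out;
    }

    /** Runs the query `requests` times with connections * inProcess concurrent workers. */
    static Result measure(Setting s, int requests, Runnable query) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(s.connections * s.inProcessPerConnection);
        AtomicLong totalLatencyNanos = new AtomicLong();
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                long t0 = System.nanoTime();
                query.run();
                totalLatencyNanos.addAndGet(System.nanoTime() - t0);
            });
        }
        pool.shutdown();
        // A sketch-level timeout; tasks still running after it are simply not counted.
        pool.awaitTermination(1, TimeUnit.MINUTES);
        double elapsedSeconds = Math.max(System.nanoTime() - start, 1) / 1e9;
        return new Result(s, requests / elapsedSeconds, totalLatencyNanos.get() / 1e3 / requests);
    }

    /** Picks the result with the highest throughput. */
    static Result highestThroughput(List<Result> results) {
        return results.stream()
                .max(Comparator.comparingDouble(r -> r.requestsPerSecond))
                .orElseThrow(IllegalArgumentException::new);
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a real query; it only sleeps for about a millisecond.
        Runnable fakeQuery = () -> {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<Result> results = new ArrayList<>();
        for (Setting s : grid(new int[] {1, 2, 4}, new int[] {1, 4})) {
            results.add(measure(s, 200, fakeQuery));
        }
        Result best = highestThroughput(results);
        System.out.printf("best: connections=%d inProcess=%d (%.0f req/s, %.1f us mean latency)%n",
                best.setting.connections, best.setting.inProcessPerConnection,
                best.requestsPerSecond, best.meanLatencyMicros);
    }
}
```

The trade-off mentioned in the thread (few connections with many in-process requests versus more connections with fewer in-process requests) would show up here as different rows of the grid; which row wins depends entirely on the workload substituted for the stand-in query.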