Hi guys,

Don't get me wrong: this is absolutely not another Elasticsearch benchmark post. First, I am pretty new to ES, so please be patient if I ask dumb questions. I am doing a test, for academic use only, to show that ES's distributed design is an improvement over Lucene, the library ES is built on. I want to show that with more than one node, a search query takes less time, i.e. is "faster". In theory, with 2 nodes (2 hard disks) we should get double the disk bandwidth, and the Ethernet link should not be the bottleneck: an ordinary disk peaks at about 50 MB/s, so even two of them together (~100 MB/s) stay below the ~125 MB/s that 1 Gbit/s Ethernet can carry.
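The bandwidth argument above is simple arithmetic; here it is as a tiny script so the numbers can be changed easily. It only restates the rough figures from the post (50 MB/s per disk, 2 nodes, 1 Gbit/s link) and uses 8 bits per byte with decimal megabytes.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope check: can a 1 Gbit/s link carry the combined
# read throughput of both disks? Figures are the rough ones from the post.
disk_mb_per_s=50      # approximate peak read rate of one laptop disk
nodes=2               # one disk per node
link_mbit_per_s=1000  # 1 Gbit/s Ethernet

aggregate_mb_per_s=$(( disk_mb_per_s * nodes ))
link_mb_per_s=$(( link_mbit_per_s / 8 ))   # 8 bits per byte -> 125 MB/s

echo "aggregate disk: ${aggregate_mb_per_s} MB/s"
echo "link capacity:  ${link_mb_per_s} MB/s"
if (( aggregate_mb_per_s < link_mb_per_s )); then
  echo "link is not the bottleneck"
else
  echo "link may be the bottleneck"
fi
```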
I have 2 physical nodes (ordinary laptops) connected directly through their 1 Gb Ethernet ports, with no router in between. My data is 20 GB (+ 20 GB of replicas): 3 million records like this one: http://pastebin.com/FDhfy6C3 (the data comes from http://www.mockaroo.com/67e33320).

My strategy is to send as many search queries as possible and, at the same time, clear the cache. Something like:

    curl -XPOST "http://192.168.57.103:9200/myjson/_cache/clear"
    curl -XPOST "http://192.168.57.103:9200/myjson/_flush?force=true"
    curl -XGET "http://192.168.57.103:9200/myjson/myjson/_search?pretty" -d '
    {
      "query": {
        "bool": {
          "should": [
            { "match": { "first_name": "Clarence" } },
            { "match": { "last_name": "Fernandez" } },
            { "match": { "country": "uk" } },
            { "match": { "amount": "$9001.19" } },
            { "match": { "password_hash": "Th94hnXtaYtZ" } }
          ]
        }
      }
    }'

I am writing a script to generate as many of these match queries as possible, but I would still like to ask: is what I am doing right? Any comments or opinions are really appreciated.

Thanks.

-- 
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e0eb8437-a629-4e10-85da-9b9da0076c45%40googlegroups.com.
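A minimal sketch of the query-generating script described in the post, assuming the host, index and type from the curl commands above. The input file `queries.tsv` (tab-separated first_name, last_name, country, amount, password_hash) and the helper names `build_query`, `timed_search` and `run_benchmark` are hypothetical. It times each search with curl's `%{time_total}`, which includes network time; Elasticsearch also reports server-side time in the `took` field (milliseconds) of each search response, which could be used instead.

```bash
#!/usr/bin/env bash
# Sketch of a query-generation + timing loop. ES_URL, INDEX and DOC_TYPE
# default to the values used in the post; queries.tsv is hypothetical.
ES_URL="${ES_URL:-http://192.168.57.103:9200}"
INDEX="${INDEX:-myjson}"
DOC_TYPE="${DOC_TYPE:-myjson}"

# Build the bool/should query from the post, filled with five field values.
# Assumes the values contain no double quotes or backslashes (no JSON escaping).
build_query() {
  local first="$1" last="$2" country="$3" amount="$4" hash="$5"
  printf '{"query":{"bool":{"should":['
  printf '{"match":{"first_name":"%s"}},' "$first"
  printf '{"match":{"last_name":"%s"}},' "$last"
  printf '{"match":{"country":"%s"}},' "$country"
  printf '{"match":{"amount":"%s"}},' "$amount"
  printf '{"match":{"password_hash":"%s"}}' "$hash"
  printf ']}}}'
}

# Clear caches and flush (as in the post), then run one search and print
# the total time curl measured for it, in seconds.
timed_search() {
  curl -s -o /dev/null -XPOST "$ES_URL/$INDEX/_cache/clear"
  curl -s -o /dev/null -XPOST "$ES_URL/$INDEX/_flush?force=true"
  curl -s -o /dev/null -w '%{time_total}\n' -XGET \
    "$ES_URL/$INDEX/$DOC_TYPE/_search" -d "$1"
}

# Read tab-separated lines (first_name, last_name, country, amount,
# password_hash) from stdin and time one search per line.
run_benchmark() {
  while IFS=$'\t' read -r first last country amount hash; do
    timed_search "$(build_query "$first" "$last" "$country" "$amount" "$hash")"
  done
}

# Only contact the cluster when asked explicitly: ./bench.sh run < queries.tsv
if [[ "${1:-}" == "run" ]]; then
  run_benchmark
fi
```

When calling `build_query` by hand, single-quote values such as `'$9001.19'`; inside double quotes the shell would expand `$9` as a positional parameter.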