Re: Shards allocation in cluster on the same node
In my testing (5 virtual nodes on a single box), I've been observing pretty much the same. I've tried pointing logstash at different nodes in the cluster and get approximately the same performance.

My main objective was to determine whether inserting the data into the cluster through different nodes would cause the data to be distributed differently. Surprisingly, I've found so far that there is little difference at the cluster level (distribution of shards and data across nodes), although the details (which shards end up on which nodes) differ. In other words, the overall degree of "uneven-ness" was almost identical on each try, although the specific uneven layout was typically different.

I also found that inserting data into more than one node at once didn't seem to make a difference.

One possibility is that your two EC2 instances might be running on the same hardware, which could explain our similar results. I remember sitting in a presentation years ago about this, in which the presenter described how he "encouraged" EC2 to deploy nodes on different hardware. I don't remember the details; I only remember that he determined he couldn't make it a certainty but could tilt the odds so much in his favor (6:1?) that his VMs would usually be on different hardware.

Tony

On Tuesday, February 18, 2014 12:45:32 PM UTC-8, Bastien Chong wrote:
> Even with 2 logstash instances writing to the same cluster, it's not
> faster.
>
> On Friday, February 14, 2014 2:45:47 PM UTC-5, Binh Ly wrote:
>> It's hard to diagnose things offline, but is it possible for you to run
>> another logstash somewhere else (maybe on the second box), run both of
>> them in parallel, and see what your combined ES throughput is? Both would
>> be writing to the same single ES cluster.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e85d8071-3424-4c0c-a35f-a87a52bad20a%40googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
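One way to put a number on the "uneven-ness" described in this message is to compare per-node document counts between runs. The sketch below is only an illustration: the node names and counts are invented, and the max/mean ratio is just one possible measure, not something used in the thread.

```python
from statistics import mean


def imbalance(docs_per_node):
    """Return max/mean of the per-node document counts (1.0 = perfectly even)."""
    counts = list(docs_per_node.values())
    avg = mean(counts)
    if avg == 0:
        return 1.0  # empty cluster: treat as even
    return max(counts) / avg


# Hypothetical counts from two runs that fed logstash into different nodes.
run_a = {"node1": 300, "node2": 100, "node3": 200, "node4": 200, "node5": 200}
run_b = {"node1": 200, "node2": 200, "node3": 300, "node4": 100, "node5": 200}

# Same overall imbalance, even though the heavy and light nodes differ.
print(imbalance(run_a), imbalance(run_b))
```

With made-up numbers like these, the two runs score the same even though different nodes carry the extra data, which is the pattern described above.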
Re: Shards allocation in cluster on the same node
Even with 2 logstash instances writing to the same cluster, it's not faster.

On Friday, February 14, 2014 2:45:47 PM UTC-5, Binh Ly wrote:
> It's hard to diagnose things offline, but is it possible for you to run
> another logstash somewhere else (maybe on the second box), run both of
> them in parallel, and see what your combined ES throughput is? Both would
> be writing to the same single ES cluster.
Re: Shards allocation in cluster on the same node
It's hard to diagnose things offline, but is it possible for you to run another logstash somewhere else (maybe on the second box), run both of them in parallel, and see what your combined ES throughput is? Both would be writing to the same single ES cluster.
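For the comparison suggested here, the number that matters is the cluster-wide indexing rate, not what each logstash reports on its own. A minimal sketch, assuming you sample the index's total document count at the start and end of a timed window; the function name and the sample counts are hypothetical, and only the 4500/sec baseline comes from the thread.

```python
def docs_per_second(count_start, count_end, elapsed_seconds):
    """Cluster-wide indexing rate from two document-count samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (count_end - count_start) / elapsed_seconds


# Hypothetical samples taken over a 60-second window in each test.
single = docs_per_second(0, 270_000, 60)    # one logstash writing
parallel = docs_per_second(0, 279_000, 60)  # two logstash instances writing

speedup = parallel / single
print(f"single={single:.0f}/s parallel={parallel:.0f}/s speedup={speedup:.2f}x")
```

A speedup close to 1.0 would suggest the limit is on the ES side (or shared hardware) rather than in a single logstash.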
Re: Shards allocation in cluster on the same node
I provisioned an IO 300 disk; no improvement at all. Logstash is running on the same instance as the master node.

On Friday, February 14, 2014 2:10:47 PM UTC-5, Bastien Chong wrote:
> I managed to split the shards by restarting ES on the master, then
> retested. Throughput is the same.
>
> 4500/sec seems a bit low; each doc is just 8k. Network doesn't seem to be
> the bottleneck. I checked the IO on disk, and it's between 0 (probably when
> it's buffering before flushing) and 50/70. Do you think I should get
> Provisioned IO on my EC2 instance?
>
> On Friday, February 14, 2014 12:53:11 PM UTC-5, Binh Ly wrote:
>> Shards should distribute over the 2 nodes, assuming they are part of a
>> single cluster. Theoretically, yes, more shards *distributed across multiple
>> nodes* will increase indexing speed. But you can still be limited by other
>> resources such as network, CPU, and memory, so it's hard to say exactly
>> what your throughput will be.
Re: Shards allocation in cluster on the same node
I managed to split the shards by restarting ES on the master, then retested. Throughput is the same.

4500/sec seems a bit low; each doc is just 8k. Network doesn't seem to be the bottleneck. I checked the IO on disk, and it's between 0 (probably when it's buffering before flushing) and 50/70. Do you think I should get Provisioned IO on my EC2 instance?

On Friday, February 14, 2014 12:53:11 PM UTC-5, Binh Ly wrote:
> Shards should distribute over the 2 nodes, assuming they are part of a
> single cluster. Theoretically, yes, more shards *distributed across multiple
> nodes* will increase indexing speed. But you can still be limited by other
> resources such as network, CPU, and memory, so it's hard to say exactly
> what your throughput will be.
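As a rough sanity check on whether the network can be ruled out, the quoted figures can be turned into a raw payload rate. This is back-of-the-envelope arithmetic only, and it assumes "8k" means 8 KiB per document; it ignores protocol overhead and any copying of documents between nodes.

```python
DOCS_PER_SECOND = 4500      # figure quoted in this message
DOC_SIZE_BYTES = 8 * 1024   # assumption: "8k" means 8 KiB per document

bytes_per_second = DOCS_PER_SECOND * DOC_SIZE_BYTES
mib_per_second = bytes_per_second / (1024 * 1024)
mbit_per_second = bytes_per_second * 8 / 1_000_000

print(f"{mib_per_second:.1f} MiB/s, {mbit_per_second:.0f} Mbit/s")
# prints "35.2 MiB/s, 295 Mbit/s"
```

Under that assumption the payload alone is roughly 295 Mbit/s, so it may be worth comparing against what the instances can actually sustain between each other before dismissing the network.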
Re: Shards allocation in cluster on the same node
Shards should distribute over the 2 nodes, assuming they are part of a single cluster. Theoretically, yes, more shards *distributed across multiple nodes* will increase indexing speed. But you can still be limited by other resources such as network, CPU, and memory, so it's hard to say exactly what your throughput will be.
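To confirm that the shards really are spread over both nodes, a per-node tally of a shard listing is enough. The sketch below assumes a whitespace-separated listing in the style of the `_cat/shards` output (columns: index, shard, prirep, state, docs, store, ip, node); the column layout is an assumption to check against your version, and the sample rows are made up.

```python
from collections import Counter

# Made-up sample rows: index shard prirep state docs store ip node
SAMPLE = """\
logstash-2014.02.14 0 p STARTED 1200 9.1mb 10.0.0.1 node-a
logstash-2014.02.14 1 p STARTED 1180 9.0mb 10.0.0.1 node-a
logstash-2014.02.14 2 p STARTED 1210 9.2mb 10.0.0.2 node-b
logstash-2014.02.14 3 p STARTED 1195 9.1mb 10.0.0.2 node-b
logstash-2014.02.14 4 p STARTED 1205 9.1mb 10.0.0.1 node-a
"""


def shards_per_node(listing):
    """Count shards per node name in a whitespace-separated shard listing."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # blank line, or a shard row with no node assigned
        node = " ".join(fields[7:])  # node names may contain spaces
        counts[node] += 1
    return counts


for node, count in sorted(shards_per_node(SAMPLE).items()):
    print(node, count)
```

If every shard shows up under one node name, indexing is effectively running on a single machine regardless of how many nodes have joined the cluster.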