Re: Shards allocation in cluster on the same node

2014-02-18 Thread Tony Su
In my testing (5 virtual nodes on a single box), I've been observing pretty 
much the same.
I've tried pointing logstash at different nodes in the cluster and got 
approximately the same performance.
 
My main objective was to determine whether inserting the data into the 
cluster through different nodes would cause the data to be distributed 
differently. Surprisingly, I've found so far that there is little difference 
at the cluster level (distribution of shards and data across nodes), although 
the details (which shards end up on which nodes) do differ. In other words, 
the overall degree of "uneven-ness" was almost identical with each try, even 
though the specific pattern of uneven-ness was typically different.
 
I also found that inserting data into more than one node at once 
didn't seem to make a difference.
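
One likely explanation for the observation above: Elasticsearch picks the 
target shard for a document from a hash of its routing value (the document 
ID by default) modulo the number of primary shards, so the node that 
receives the request is not an input to the placement. Below is a minimal 
Python sketch of that idea. zlib.crc32 is only a stand-in for the hash 
Elasticsearch really uses, and pick_shard and distribution are hypothetical 
helper names, not Elasticsearch APIs.

```python
import zlib
from collections import Counter


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number.

    Assumption: zlib.crc32 stands in for Elasticsearch's real routing hash;
    only the "hash modulo shard count" shape is the point here.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribution(doc_ids, num_primary_shards):
    """Count how many of the given documents land on each shard."""
    return Counter(pick_shard(d, num_primary_shards) for d in doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10000)]
    # The receiving node never appears in the computation, so sending the
    # same documents through any node yields the same per-shard counts.
    print(distribution(ids, 5))
```

With auto-generated IDs the individual documents land on different shards 
from run to run, but the overall spread across shards should look about the 
same each time, which would match what I saw.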
 
One possibility is that your two EC2 instances are running on the same 
hardware, which could explain our similar results.
I remember sitting in on a presentation years ago about this and how the 
presenter "encouraged" EC2 to deploy nodes on different hardware. I don't 
remember the details, only that he determined he couldn't make it a 
certainty but could tilt the odds so much in his favor (6:1?) that his VMs 
would usually be on different hardware.
 
Tony

On Tuesday, February 18, 2014 12:45:32 PM UTC-8, Bastien Chong wrote:

> Even with 2 logstashs instance writing to the same cluster, it's not 
> faster.
>
> On Friday, February 14, 2014 2:45:47 PM UTC-5, Binh Ly wrote:
>>
>> It's hard to diagnose things offline, but is it possible for you to run 
>> another logstash somewhere else (like maybe on the second box) and both of 
>> them in parallel and see what your combined ES throughput is. So they would 
>> be both writing to the same single ES cluster.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e85d8071-3424-4c0c-a35f-a87a52bad20a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Shards allocation in cluster on the same node

2014-02-18 Thread Bastien Chong
Even with 2 logstash instances writing to the same cluster, it's not faster.

On Friday, February 14, 2014 2:45:47 PM UTC-5, Binh Ly wrote:
>
> It's hard to diagnose things offline, but is it possible for you to run 
> another logstash somewhere else (like maybe on the second box) and both of 
> them in parallel and see what your combined ES throughput is. So they would 
> be both writing to the same single ES cluster.
>



Re: Shards allocation in cluster on the same node

2014-02-14 Thread Binh Ly
It's hard to diagnose things offline, but is it possible for you to run 
another logstash somewhere else (like maybe on the second box), run both of 
them in parallel, and see what your combined ES throughput is? They would 
both be writing to the same single ES cluster.
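
One way to get a combined figure is to sample the cluster's total document 
count at two points in time while both logstash instances are running and 
divide the difference by the elapsed time. A rough Python sketch follows. 
Both function names are hypothetical, and get_count is a placeholder for 
whatever you use to read the document count from the cluster.

```python
import time
from typing import Callable, List, Tuple


def docs_per_second(samples: List[Tuple[float, int]]) -> float:
    """Average rate between the first and last (timestamp, doc_count) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (c1 - c0) / (t1 - t0)


def sample_counts(get_count: Callable[[], int], n: int,
                  interval_s: float) -> List[Tuple[float, int]]:
    """Call get_count() n times, interval_s apart, recording when each call ran."""
    samples = []
    for i in range(n):
        samples.append((time.monotonic(), get_count()))
        if i < n - 1:
            time.sleep(interval_s)
    return samples
```

For example, docs_per_second([(0.0, 0), (10.0, 45000)]) gives 4500.0. Measuring 
on the ES side rather than per logstash process means the number already 
reflects both writers together.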



Re: Shards allocation in cluster on the same node

2014-02-14 Thread Bastien Chong
I provisioned a Provisioned IOPS disk (300 IOPS); no improvement at all.

Logstash is running on the same instance as the master node.

On Friday, February 14, 2014 2:10:47 PM UTC-5, Bastien Chong wrote:
>
> I managed to split the shards by restarting ES on the master, then 
> retested. Throughput is the same.
>
> 4500/sec seems a bit low, each doc is just 8k. Network doesn't seems to be 
> the bottleneck. I check the IO on disk, and it's between 0 (probably when 
> it's buffering before flushing, and 50/70). Do you think I should get 
> Provisionned IO on my EC2 instance ? 
>
> On Friday, February 14, 2014 12:53:11 PM UTC-5, Binh Ly wrote:
>>
>> Shards should distribute over the 2 nodes assuming they are part of a 
>> single cluster. Theoretically, yes more shards *distributed across multiple 
>> nodes* will increase indexing speed. But you can still be limited by other 
>> resources such as network, CPU, memory so it's hard to say how much exactly 
>> will your throughput be.
>>
>



Re: Shards allocation in cluster on the same node

2014-02-14 Thread Bastien Chong
I managed to split the shards by restarting ES on the master, then 
retested. Throughput is the same.

4500/sec seems a bit low; each doc is just 8k. The network doesn't seem to be 
the bottleneck. I checked the disk IO, and it's between 0 (probably when 
it's buffering before flushing) and 50/70. Do you think I should get 
Provisioned IOPS on my EC2 instance?
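
For a sense of scale, here is the raw payload rate those numbers imply, as a 
small Python calculation. It assumes "8k" means 8 KiB (8192 bytes) per 
document and ignores HTTP/JSON overhead; payload_rate is just a hypothetical 
helper name.

```python
def payload_rate(docs_per_sec: float, doc_size_bytes: int):
    """Return (megabytes per second, megabits per second) for a document stream."""
    bytes_per_sec = docs_per_sec * doc_size_bytes
    return bytes_per_sec / 1e6, bytes_per_sec * 8 / 1e6


mb_s, mbit_s = payload_rate(4500, 8 * 1024)
print(f"{mb_s:.1f} MB/s, {mbit_s:.0f} Mbit/s")  # prints "36.9 MB/s, 295 Mbit/s"
```

That figure can be compared against what the instance's network and disk are 
actually able to sustain.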

On Friday, February 14, 2014 12:53:11 PM UTC-5, Binh Ly wrote:
>
> Shards should distribute over the 2 nodes assuming they are part of a 
> single cluster. Theoretically, yes more shards *distributed across multiple 
> nodes* will increase indexing speed. But you can still be limited by other 
> resources such as network, CPU, memory so it's hard to say how much exactly 
> will your throughput be.
>



Re: Shards allocation in cluster on the same node

2014-02-14 Thread Binh Ly
Shards should distribute over the 2 nodes, assuming they are part of a 
single cluster. Theoretically, yes, more shards *distributed across multiple 
nodes* will increase indexing speed. But you can still be limited by other 
resources such as network, CPU, and memory, so it's hard to say exactly what 
your throughput will be.
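
To check where the shards actually are, you can count started shard copies 
per node from the cluster state. The Python sketch below assumes the JSON has 
the routing_table -> indices -> shards layout returned by the cluster state 
API; the field names are written from memory, so verify them against your 
version. shards_per_node is a hypothetical helper, and the index and node 
names in the example are made up.

```python
from collections import Counter


def shards_per_node(cluster_state: dict) -> Counter:
    """Count STARTED shard copies per node ID in a parsed cluster state."""
    counts = Counter()
    indices = cluster_state.get("routing_table", {}).get("indices", {})
    for index in indices.values():
        for copies in index.get("shards", {}).values():
            for copy in copies:
                if copy.get("state") == "STARTED":
                    counts[copy.get("node")] += 1
    return counts


# Made-up example: both shards of one index sitting on the same node.
example = {
    "routing_table": {"indices": {"logs": {"shards": {
        "0": [{"state": "STARTED", "primary": True, "node": "node-A"}],
        "1": [{"state": "STARTED", "primary": True, "node": "node-A"}],
    }}}}
}
print(shards_per_node(example))  # prints Counter({'node-A': 2})
```

If one node holds every shard, as in the example, the second node is not 
doing any of the indexing work.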



Shards allocation in cluster on the same node

2014-02-14 Thread Bastien Chong
I have 2 ES nodes configured with 2 shards, 0 replicas. I'm testing how fast 
logstash can push logs from a dummy "access_log" file to this cluster.

From my test, with m3.xlarge on EC2, I can push around 4500 logs/sec. But I 
noticed that my 2 shards were on the same node. I still don't get how the ES 
black magic works: why does it not split the shards? Would splitting them 
allow me to push 9000/sec?

I can't tell whether logstash or ES is the bottleneck here. The two processes 
are started as follows:

/usr/bin/java -Xms7g -Xmx7g -Xss256k -Djava.awt.headless=true 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+HeapDumpOnOutOfMemoryError -Delasticsearch 
-Des.pidfile=/var/run/elasticsearch/elasticsearch.pid 
-Des.path.home=/usr/share/elasticsearch -cp 
:/usr/share/elasticsearch/lib/elasticsearch-0.90.10.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* 
-Des.default.path.home=/usr/share/elasticsearch 
-Des.default.path.logs=/var/log/elasticsearch 
-Des.default.path.data=/var/lib/elasticsearch 
-Des.default.path.work=/tmp/elasticsearch 
-Des.default.path.conf=/etc/elasticsearch 
org.elasticsearch.bootstrap.ElasticSearch

/usr/bin/java -Xmx1G -Xms1G -cp 
/usr/local/bin/logstash/logstash.jar:/usr/local/bin/logstash/cloud-aws/* 
logstash.runner agent --config /etc/logstash/mylogstash.conf --log 
/var/log/logstash/logstash.log


