Re: Performance problems with large data volumes

2014-11-06 Thread John D. Ament
How would index aliases help here?

On Wednesday, November 5, 2014 11:50:34 AM UTC-5, Jörg Prante wrote:

 Use index aliases: one physical index, 4000 aliases.

 Jörg



Re: Performance problems with large data volumes

2014-11-06 Thread joergpra...@gmail.com
See kimchy's explanation

https://groups.google.com/forum/#!msg/elasticsearch/49q-_AgQCp8/MRol0t9asEcJ
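
In short: every physical index is a full set of Lucene shards with its own
fixed heap overhead, while an alias is just a name over one shared index,
optionally with a per-tenant filter and routing. Searching through the
alias then behaves like searching a dedicated tenant index. A rough sketch
with the Java API (the alias name "tenant_0001" is made up for
illustration):

SearchResponse response = client.prepareSearch("tenant_0001") // filtered alias
        .setQuery(QueryBuilders.matchAllQuery())
        .execute().actionGet();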

Jörg



Re: Performance problems with large data volumes

2014-11-05 Thread Georgi Ivanov
Ok .. so it is Java.

1. You are not doing this right.
2. You should use BulkRequest or, better, the BulkProcessor class.
3. Do NOT call setRefresh! That forces ES to refresh the index on every
request, which will load the cluster a LOT.
4. Set the refresh interval of your index to something like 30s or 60s.


Here is a snippet of code using BulkProcessor (it will not compile as-is,
because I removed some parts, but it will give you an idea):



import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashSet;
import java.util.Set;

import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.ParseException;
import org.elasticsearch.action.bulk.BulkItemResponse;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;

public class IndexFoo {

    private Connection connection = null;

    public Client client;
    Integer bulkSize = 1000;
    private CommandLine cmd;
    //BulkRequestBuilder bulkRequest;
    BulkProcessor bulkRequest;
    private String index;
    Set<String> hosts = new HashSet<String>();

    private int threads = 5;

    public IndexFoo(CommandLine cmd) throws SQLException, ParseException {
        this.cmd = cmd;
        this.index = cmd.getOptionValue("index");
        if (cmd.hasOption("b")) {
            this.bulkSize = Integer.valueOf(cmd.getOptionValue("b"));
        }
        if (cmd.hasOption("t")) {
            this.threads = Integer.valueOf(cmd.getOptionValue("t"));
        }
        if (cmd.hasOption("h")) {
            String[] hosts = cmd.getOptionValue("h").split(",");
            for (String host : hosts) {
                this.hosts.add(host);
            }
        }

        this.connectES(); // removed for brevity: builds this.client from this.hosts

        this.bulkRequest = this.getBulkProcessor();
    }

    private void processData(ResultSet rs) throws SQLException {
        while (rs.next()) {
            // index: myIndex, mytype, id and mySource come from the parts
            // removed for brevity (built from the current ResultSet row)
            bulkRequest.add(client.prepareIndex(myIndex, mytype,
                    id.toString()).setSource(mySource).request());
        } // while

        // flushes the remaining actions and waits for in-flight bulks
        this.bulkRequest.close();
        System.out.println("Indexing done");
    }

    private BulkProcessor getBulkProcessor() {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {

            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                //System.out.println("Executing bulk #" + executionId + " "
                //        + request.numberOfActions());
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request,
                    Throwable failure) {
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request,
                    BulkResponse response) {
                System.out.println("Bulk #" + executionId + "/"
                        + request.numberOfActions() + " executed in "
                        + response.getTook().secondsFrac() + " sec.");
                if (response.hasFailures()) {
                    for (BulkItemResponse bulkItemResponse : response.getItems()) {
                        if (bulkItemResponse.isFailed()) {
                            System.err.println("Failure message : "
                                    + bulkItemResponse.getFailureMessage());
                        }
                    }
                    System.exit(-1);
                }
            }
        }).setConcurrentRequests(this.threads)
          .setBulkActions(this.bulkSize).build();
    }
}



Re: Performance problems with large data volumes

2014-11-05 Thread John D. Ament
Hi,

I doubt the issue is that I'm not using bulk requests.  My requests come in 
one at a time, not in bulk.  If you can explain why bulk is required, that 
would help.

I can believe that the refresh is causing the issue.  I would prefer to 
test that one by itself.  How do I configure the refresh interval on the 
index?

John


Re: Performance problems with large data volumes

2014-11-05 Thread Georgi Ivanov
Here is how to set the refresh interval:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-update-settings.html
When you force a refresh after every document, you are putting unnecessary
load on ES.

Indexing one document per call works, but it is also very slow and
inefficient :)
With bulk requests you also make better use of the available indexing
threads in ES. You can read the documentation about the thread pools here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
If you use bulk requests, you can index (tens of) thousands of docs per
second, depending on your hardware.

With the BulkProcessor class you can set how many threads will run, how many
documents will be sent in one bulk, etc.
It is much more efficient than indexing single documents.
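
For example, a minimal sketch with the Java API ("myindex" is a
placeholder; pick an interval that fits your freshness needs):

client.admin().indices().prepareUpdateSettings("myindex")
        .setSettings(ImmutableSettings.settingsBuilder()
                .put("index.refresh_interval", "30s") // "-1" disables refresh
                .build())
        .execute().actionGet();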



Re: Performance problems with large data volumes

2014-11-05 Thread joergpra...@gmail.com
Use index aliases: one physical index, 4000 aliases.
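
With the default of 5 shards per index, a little more than 4000 indices
means on the order of 20,000 Lucene shards for only 2.5 GB of data, and
each shard has fixed heap overhead; that is very likely what kills your
6 GB node. A rough sketch of creating one filtered alias per tenant over a
shared index (index, alias and field names are made up):

client.admin().indices().prepareAliases()
        .addAlias("tenants", "tenant_0001",
                FilterBuilders.termFilter("tenant_id", "0001"))
        .execute().actionGet();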

Jörg



Performance problems with large data volumes

2014-11-04 Thread John D. Ament
Hi,

So I have what you might consider a large set of data.

We have about 25k records in our indices, taking up around 2.5 GB of disk 
space spread across a little more than 4000 indices.  Currently our master 
node is set for 6 GB of RAM.  We're seeing that after loading this data the 
JVM will eventually crash, sometimes in as little as 5 minutes.

Is this not enough horse power for this data set?

What could be tuned to resolve this?

John



Re: Performance problems with large data volumes

2014-11-04 Thread Georgi Ivanov
Hi,
I don't think 25k documents is large data.
What is strange to me is the 4000 indices .. how many indices do you really 
need?

On my cluster I have: Nodes: 8, Indices: 89, Shards: 2070, Data: 4.87 TB.

When are you running OOM? Example query(ies)? How many nodes? Some more 
info please :)

Also, a 6 GB heap is not a lot, but that depends on your use case.

Georgi



Re: Performance problems with large data volumes

2014-11-04 Thread John D. Ament
Georgi,

Thanks for the quick reply!

I have 4k indices.  We're creating an index per tenant.  In this 
environment we've created 4k tenants.

We're running out of memory just letting the loading of records run.

John



Re: Performance problems with large data volumes

2014-11-04 Thread Georgi Ivanov
So you run OOM when you index data?
If so:
How do you index the data?
Are you using BulkRequest?
Which programming language are you using?
Are you using multiple threads to index?

If you are using bulk requests, you should limit the size of each bulk.
You can also tune the bulk thread pool in ES.

In general, you are very brief in describing your problem :)
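
As a sketch of what I mean by limiting the bulk size (Java API; the
listener is omitted and the numbers are only starting points to tune):

BulkProcessor bulk = BulkProcessor.builder(client, listener)
        .setBulkActions(1000)                               // flush every 1000 docs
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // or every 5 MB
        .setConcurrentRequests(2)                           // max 2 bulks in flight
        .build();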

Georgi




Re: Performance problems with large data volumes

2014-11-04 Thread John D. Ament
Georgi,

I'm indexing the data through a regular index request via Java:

final IndexResponse response = esClient.client()
        .prepareIndex(indexName, type)
        .setSource(json)
        .setRefresh(true)
        .execute().actionGet();

json in this case is a byte[] with the JSON data in it.

The requests come in via multiple HTTP requests, but I'm not leveraging any 
specific multithreading within the ES client.  I hope this helps; I'm not 
100% sure what information would help identify the problem.

John



Re: Performance problems with large data volumes

2014-11-04 Thread John D. Ament
And actually, now that I'm looking at it again - I wanted to ask why I need 
to use setRefresh(true)?

In my case, we were not seeing index data updated quickly enough after 
indexing a record, and setting refresh = true fixed that for us.  If there's 
a way to avoid it, that might help me here?
