Re: optimize elasticsearch / JVM

2015-01-29 Thread Oto Iashvili
Why not? Could you tell me how to do that, and also explain why it would be better?

Thanks a lot for your help.

On Thursday, January 29, 2015 at 10:02:00 AM UTC+1, Arie wrote:

 Just an idea.

 You could try running two ES instances as a cluster on one machine if 
 there is no other option.
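If you try that, the usual pattern is two nodes sharing one cluster name but separate data paths and ports. A rough sketch (the cluster name, node names, data paths, and ports below are invented for illustration):

```shell
# Sketch: two ES 1.x nodes on one machine joining the same cluster.
# Cluster name, node names, data paths, and ports are made up for illustration.
bin/elasticsearch -d -Des.cluster.name=mycluster -Des.node.name=node1 \
  -Des.path.data=/var/lib/es/node1 -Des.http.port=9200 -Des.transport.tcp.port=9300

bin/elasticsearch -d -Des.cluster.name=mycluster -Des.node.name=node2 \
  -Des.path.data=/var/lib/es/node2 -Des.http.port=9201 -Des.transport.tcp.port=9301
```

Each node then gets its own (smaller) heap, which can shorten GC pauses; in 1.x the two nodes discover each other on localhost via multicast by default.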

 On Wednesday, January 28, 2015 at 2:09:22 PM UTC+1, Oto Iashvili wrote:



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9db68f64-e79d-4592-9085-0633eec7360f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


optimize elasticsearch / JVM

2015-01-28 Thread Oto Iashvili
Hi,

I have a classifieds website. It runs Elasticsearch, Postgres, and Rails on the same Ubuntu 14.04 dedicated server, with 256 GB of RAM and 20 cores / 40 threads.

I have 10 Elasticsearch indexes, each with the default number of shards (5). Depending on the index, they hold between 1,000 and 400,000 classifieds. The site gets approximately 5,000 requests per minute, about 2/3 of which involve an Elasticsearch request.

According to htop, the JVM is using around 500% CPU. I have tried different options: I reduced the number of shards per index, and I also tried changing JAVA_OPTS as follows:

#JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
#JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"

#JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
#JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"

but it doesn't seem to change anything.

So, two questions:
- When you change any Elasticsearch setting and then restart, should the improvement (if any) be visible immediately, or can it show up a bit later thanks to caching or something else?
- Can anyone help me find a good JVM / Elasticsearch configuration so that it does not consume so many resources?



Re: optimize elasticsearch / JVM

2015-01-28 Thread Oto Iashvili
Hi,

Thanks a lot for the answer.

I've tried several values for the heap, between 26 and 32 GB, but I didn't see any difference.

I removed G1 and put the default parameters back, but the problem is still the same. I said around 500%, but that is just the average; it sometimes goes up to 2000%.

I was also thinking of using several servers, but right now that is not possible.

Just before this, I was using a smaller server with 96 GB of RAM, and it was working better. I tried to set the same parameters as before, but that did not help much.

On Wednesday, January 28, 2015 at 2:36:29 PM UTC+1, Jilles van Gurp wrote:

 How much heap are you giving to ES? With this many requests, if your setup 
 is not falling over, it is probably not garbage-collection related, because 
 GC trouble would cause very noticeable delays or unavailability of ES. 
 32GB should be a good value given how much memory you have. Also, you 
 probably want to use doc_values in your mapping so that you can utilize the 
 OS file cache and move some of the memory pressure off the heap. You seem to 
 have plenty of RAM, so your entire dataset should easily fit in RAM.
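To make the doc_values suggestion concrete, a mapping change along these lines could be used (a sketch only, assuming ES 1.x mapping syntax; the index, type, and field names are invented):

```shell
# Hypothetical mapping enabling doc_values on fields used for sorting/faceting.
# Index ("classifieds"), type ("ad"), and field names are made up for illustration.
curl -XPUT 'localhost:9200/classifieds' -d '{
  "mappings": {
    "ad": {
      "properties": {
        "price":    { "type": "double", "doc_values": true },
        "category": { "type": "string", "index": "not_analyzed", "doc_values": true }
      }
    }
  }
}'
```

In ES 1.x, doc_values only applies to not_analyzed string fields and numeric/date fields, and it is set at mapping time, so existing data has to be reindexed for it to take effect.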

 Also, don't use G1 for elasticsearch. There are known issues with that 
 particular garbage collector in combination with lucene. CMS is the best 
 option for ES.
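For reference, on the Debian/Ubuntu packages the relevant knobs live in the environment file; a sketch of a fixed-heap, stock-CMS setup (the 31g figure is an assumption, chosen to stay below the ~32 GB compressed-object-pointers threshold):

```shell
# /etc/default/elasticsearch -- a sketch, not a drop-in file.
# A fixed heap under ~32 GB keeps compressed object pointers enabled.
ES_HEAP_SIZE=31g

# No JAVA_OPTS override needed: the stock startup script already selects
# CMS (-XX:+UseConcMarkSweepGC with ParNew), so simply drop the -XX:+UseG1GC line.
```

ES_HEAP_SIZE sets both -Xms and -Xmx to the same value, which also avoids heap-resizing pauses at runtime.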

 500% doesn't sound that bad on this machine; with 40 hardware threads, htop 
 would max out at 4000%. Still, it would be nice to know what it is doing. 
 In any case, you might want to try Marvel to find out where your setup is 
 bottlenecked. Also, you might want to consider scaling horizontally instead 
 of vertically: many smaller servers can be nicer than one big one.

 On Wednesday, January 28, 2015 at 2:09:22 PM UTC+1, Oto Iashvili wrote:





Re: optimize elasticsearch / JVM

2015-01-28 Thread Oto Iashvili
,
      "free_in_bytes" : 264467443712,
      "available_in_bytes" : 249068793856,
      "disk_reads" : 351417,
      "disk_writes" : 205904,
      "disk_io_op" : 557321,
      "disk_read_size_in_bytes" : 3433067520,
      "disk_write_size_in_bytes" : 3127025664,
      "disk_io_size_in_bytes" : 6560093184,
      "disk_queue" : 0,
      "disk_service_time" : 0.1
    },
    "data" : [ {
      "path" : "/var/lib/elasticsearch/elasticsearch/nodes/0",
      "mount" : "/",
      "dev" : "/dev/sda2",
      "total_in_bytes" : 302674501632,
      "free_in_bytes" : 264467443712,
      "available_in_bytes" : 249068793856,
      "disk_reads" : 351417,
      "disk_writes" : 205904,
      "disk_io_op" : 557321,
      "disk_read_size_in_bytes" : 3433067520,
      "disk_write_size_in_bytes" : 3127025664,
      "disk_io_size_in_bytes" : 6560093184,
      "disk_queue" : 0,
      "disk_service_time" : 0.1
    } ]
  },
  "transport" : {
    "server_open" : 13,
    "rx_count" : 6,
    "rx_size_in_bytes" : 1380,
    "tx_count" : 6,
    "tx_size_in_bytes" : 1380
  },
  "http" : {
    "current_open" : 11,
    "total_opened" : 2311818
  },
  "breakers" : {
    "request" : {
      "limit_size_in_bytes" : 12357704089,
      "limit_size" : "11.5gb",
      "estimated_size_in_bytes" : 16440,
      "estimated_size" : "16kb",
      "overhead" : 1.0,
      "tripped" : 0
    },
    "fielddata" : {
      "limit_size_in_bytes" : 18536556134,
      "limit_size" : "17.2gb",
      "estimated_size_in_bytes" : 6131132,
      "estimated_size" : "5.8mb",
      "overhead" : 1.03,
      "tripped" : 0
    },
    "parent" : {
      "limit_size_in_bytes" : 21625982156,
      "limit_size" : "20.1gb",
      "estimated_size_in_bytes" : 6147572,
      "estimated_size" : "5.8mb",
      "overhead" : 1.0,
      "tripped" : 0
    }
  }
}
  }
}
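For context, output of this shape typically comes from the node-stats endpoint; a way to capture it again for comparison over time might be:

```shell
# Fetch node stats (ES 1.x); ?pretty just formats the JSON for reading.
curl -XGET 'localhost:9200/_nodes/stats?pretty'
```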



On Wednesday, January 28, 2015 at 11:41:16 PM UTC+1, Oto Iashvili wrote:



snowball and elision

2014-06-05 Thread Oto Iashvili
Hello,

At first, I was using the language analyzer ("french"), and everything seemed to work very well, until I realized that "a" is not part of the French stopword list.

So I decided to test with snowball. It also seemed to work well, but in this case it removes short elided words like "l'", "d'", ...

Hence my question: how can I use snowball, keep the default filters, and add my own list of stopwords and elisions?

Otherwise, how can I change the stopword list of the language analyzer?

And one last question: is there really any benefit to using snowball rather than the language analyzer? Is it faster? More relevant?
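For what it's worth, combining snowball stemming with explicit stopword and elision filters can be sketched as index settings like these (assuming ES 1.x analysis syntax; the analyzer name, stopword list, and article list below are invented for illustration):

```shell
# Hypothetical custom analyzer: elision + custom stopwords + snowball (French).
# All names and word lists below are made up; adjust them to your data.
curl -XPUT 'localhost:9200/myindex' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "my_elision": { "type": "elision", "articles": ["l", "d", "m", "t", "qu", "n", "s", "j"] },
        "my_stop":    { "type": "stop", "stopwords": ["a", "au", "aux", "le", "la", "les"] },
        "my_snow":    { "type": "snowball", "language": "French" }
      },
      "analyzer": {
        "my_french": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_elision", "my_stop", "my_snow"]
        }
      }
    }
  }
}'
```

The filter order matters: elision and stopword removal run before stemming, so the stemmer never sees the articles.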

thank you



custom stemmer with elasticsearch / tire / rails

2014-05-08 Thread Oto Iashvili
Hi,

I'm trying to add a new stemmer to Elasticsearch, to use with Tire / Rails.

I found a Java file 
(https://github.com/emilis/PolicyFeed/blob/master/src/search/java/org/tartarus/snowball/ext/LithuanianStemmer.java).
I created a jar from this file and put it in Elasticsearch's lib folder.

Here is my Rails file:


tire.settings :analysis => {
    :filter => {
      "lt_stemmer" => {
        "type" => "stemmer",
        "name" => "lithuanian",
        "rules_path" => "lt_stemmer.jar"
      }
    },
    :analyzer => {
      "lithuanian" => {
        "type" => "snowball",
        "tokenizer" => "keyword",
        "filter" => ["lowercase", "lt_stemmer"]
      },
    },
  } do
  mapping do
    indexes :titre_lt, :analyzer => "lithuanian"
  end
end



I then managed to create the index and index the data, but when I test, it seems it doesn't use the rules in my jar file.

curl -XGET 'localhost:9200/lituanieindex/_analyze?analyzer=lithuanian' -d 
'smulkių, dalinių, pilnų krovinių pervežimas nuosavais arba partnerių 
vilkikais su standartinėmis 92 m3 puspriekabėmis ir 120 m3 autotraukiniais;'



{"tokens":[{"token":"smulkių","start_offset":0,"end_offset":7,"type":"<ALPHANUM>","position":1},{"token":"dalinių","start_offset":9,"end_offset":16,"type":"<ALPHANUM>","position":2},{"token":"pilnų","start_offset":18,"end_offset":23,"type":"<ALPHANUM>","position":3},{"token":"krovinių","start_offset":24,"end_offset":32,"type":"<ALPHANUM>","position":4},{"token":"pervežima","start_offset":33,"end_offset":43,"type":"<ALPHANUM>","position":5},{"token":"nuosavai","start_offset":44,"end_offset":53,"type":"<ALPHANUM>","position":6},{"token":"arba","start_offset":54,"end_offset":58,"type":"<ALPHANUM>","position":7},{"token":"partnerių","start_offset":59,"end_offset":68,"type":"<ALPHANUM>","position":8},{"token":"vilkikai","start_offset":69,"end_offset":78,"type":"<ALPHANUM>","position":9},{"token":"su","start_offset":79,"end_offset":81,"type":"<ALPHANUM>","position":10},{"token":"standartinėmi","start_offset":82,"end_offset":96,"type":"<ALPHANUM>","position":11},{"token":"92","start_offset":97,"end_offset":99,"type":"<NUM>","position":12},{"token":"m3","start_offset":100,"end_offset":102,"type":"<ALPHANUM>","position":13},{"token":"puspriekabėmi","start_offset":103,"end_offset":117,"type":"<ALPHANUM>","position":14},{"token":"ir","start_offset":118,"end_offset":120,"type":"<ALPHANUM>","position":15},{"token":"120","start_offset":121,"end_offset":124,"type":"<NUM>","position":16},{"token":"m3","start_offset":125,"end_offset":127,"type":"<ALPHANUM>","position":17},{"token":"autotraukiniai","start_offset":128,"end_offset":143,"type":"<ALPHANUM>","position":18}]}

What am I doing wrong?

Thanks for the help.
