Re: Any experience with ES and Data Compressing Filesystems?
Hi, gzip/zlib compression is very bad for performance, so it can be interesting for closed indices, but for live data I would not recommend it. Also, you must know that: compression using LZ4 is already enabled inside indices, and ES/Lucene/Java usually read/write 4k blocks; hence, filesystem compression is achieved on 4k blocks. If your filesystem uses 4k blocks and you add FS compression, you will probably have a very small gain, if any. I've tried on ZFS:

Filesystem     Size  Used  Avail  Capacity  Mounted on
zdata/ES-lz4   1.1T  1.9G  1.1T   0%        /zdata/ES-lz4
zdata/ES       1.1T  1.9G  1.1T   0%        /zdata/ES

If you are using a larger block size, like 128k, a compressed filesystem does show some benefit:

Filesystem     Size  Used  Avail  Capacity  Mounted on
zdata/ES-lz4   1.1T  1.1G  1.1T   0%        /zdata/ES-lz4   (compressratio 1.73x)
zdata/ES-gzip  1.1T  901M  1.1T   0%        /zdata/ES-gzip  (compressratio 2.27x)
zdata/ES       1.1T  1.9G  1.1T   0%        /zdata/ES

But a filesystem block larger than 4k is very suboptimal for IO (ES reads or writes one 4k block, while your FS must read or write a 128k block).

On 21 July 2014, at 07:58, horst knete baduncl...@hotmail.de wrote: Hey guys, we have mounted a btrfs file system with the compression method zlib for testing purposes on our elasticsearch server and copied one of the indices onto the btrfs volume. Unfortunately it had no success and the index still has a size of 50gb :/ I will try other compression methods and report back here.

On Saturday, 19 July 2014 07:21:20 UTC+2, Otis Gospodnetic wrote: Hi Horst, I wouldn't bother with this for the reasons Joerg mentioned, but should you try it anyway, I'd love to hear your findings/observations. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/

On Wednesday, July 16, 2014 6:56:36 AM UTC-4, horst knete wrote: Hey Guys, to save a lot of hard disk space, we are going to use a compressing file system, which gives us transparent compression for the ES indices. (It seems ES indices compress very well; we got up to a 65% compression rate in some tests.) Currently the indices sit on an ext4 Linux filesystem, which unfortunately doesn't offer transparent compression. Does anyone have experience with compressing file systems like BTRFS or ZFS/OpenZFS, and can you tell us whether this led to big performance losses? Thanks for responding

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3DD72EC1-E3EC-493D-94DD-33E63151A579%40patpro.net. For more options, visit https://groups.google.com/d/optout.
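[Editor's note] For anyone wanting to reproduce the comparison above, here is a rough sketch of the ZFS side. The pool and dataset names, and the 128k recordsize, are assumptions taken from the figures quoted above, not a recommendation; the point made in the post is precisely that a recordsize far above ES's ~4k IO pattern hurts:

  # create three datasets that differ only in compression (illustrative names/pool)
  zfs create -o recordsize=128k -o compression=lz4  zdata/ES-lz4
  zfs create -o recordsize=128k -o compression=gzip zdata/ES-gzip
  zfs create -o recordsize=128k -o compression=off  zdata/ES
  # copy the same index directory onto each dataset, then compare what ZFS achieved
  zfs get compressratio,recordsize,used zdata/ES-lz4 zdata/ES-gzip zdata/ES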
Re: LDAP authentication in Kibana
I configured the LDAP properties in httpd.conf:

AuthLDAPBindDN "uid=nabajaj,OU=Employee,OU=Cisco Users,DC=ds,DC=cisco,DC=com"
AuthLDAPBindPassword password
AuthLDAPURL "ldap://domain:389/OU=Employee,OU=Cisco Users,DC=ds,DC=cisco,DC=com?uid?sub?(objectClass=*)"
AuthType Basic
AuthBasicProvider ldap
AuthzLDAPAuthoritative Off
AuthName "some text for login prompt"
require valid-user

But it is giving me an error like: [error] [client x.x.x.x] user nabajaj: authentication failure for /kibana: Password Mismatch. Please help me here.

On Wednesday, 18 June 2014 19:10:47 UTC+5:30, dharmendra pratap singh wrote: Hello Friends, Hope you are doing good. In my application, I want to do the authentication in Kibana using LDAP. If anyone has done it before, please help me work this out. Appreciate your help. Regards Dharmendra

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7be86d2f-9c67-4ccd-92d2-37d154ecc6d8%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
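[Editor's note] A "Password Mismatch" from mod_authnz_ldap typically means the user entry was found but the subsequent bind with the supplied password failed, so it can help to verify the directory side outside Apache first. A minimal check with ldapsearch, reusing the values from the config above (adjust host and base DN as needed; this is only a sketch, not a statement about what is wrong in this particular setup):

  # bind as the service DN and look the user up; -W prompts for the bind password
  ldapsearch -x -H ldap://domain:389 \
    -D "uid=nabajaj,OU=Employee,OU=Cisco Users,DC=ds,DC=cisco,DC=com" -W \
    -b "OU=Employee,OU=Cisco Users,DC=ds,DC=cisco,DC=com" "(uid=nabajaj)"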
Kibana settings for IPFIX/Netflow
Every minute, we take a 1/4096 sample of traffic using IPFIX. I want to graph this data as bits/sec in a histogram. However, my Kibana math skills are failing me. Here is how I think it should be set up, but the value is always too low for Gbit/s:

Chart Value: total
Value Field: bytes (bytes-per-minute field)
Scale: 32768 (4096 * 8 bits in a byte)
Seconds: checked
Interval: 1m
Y Format: bytes

Help? Maybe I'm missing the obvious, but it's 2 a.m. and I'm mystified.

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/73ae7aaffd5d44f290d16a14c679e2f8%40BN1PR07MB039.namprd07.prod.outlook.com. For more options, visit https://groups.google.com/d/optout.
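[Editor's note] As a sanity check of the numbers outside Kibana, the conversion being attempted is: sampled bytes per minute x 4096 (to undo the 1/4096 sampling) x 8 (bits per byte) / 60 (seconds per minute). A quick shell calculation with a purely illustrative sample count, to compare against what the panel shows:

  # illustrative: 3,200,000 sampled bytes collected in one minute
  sampled_bytes_per_min=3200000
  echo "$sampled_bytes_per_min * 4096 * 8 / 60" | bc
  # -> about 1747626666 bits/s, i.e. roughly 1.75 Gbit/s for this example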
Solaris 10 mlockall error code
We have been having issues running ES with the bootstrap.mlockall: true setting. We get the following error: [2014-07-21 09:56:44,436][WARN ][common.jna] Unknown mlockall error 11 I have googled around and looked in the solaris documentation for the description of the error codes and I have been unsuccessful. The solaris docs are here http://docs.oracle.com/cd/E26505_01/html/816-5168/mlockall-3c.html#REFMAN3Amlockall-3c but they only list 3 error codes. Is the error code 11 generated by ES? Our box has a total of 64 gigs of RAM and we give 32gigs to ES. We are running it using Oracle Java 1.7.13 64bit. Any help on the matter would be greatly appreciated! -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bc9ba19f-c798-4e7f-9977-c9707709c47e%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Solaris 10 mlockall error code
What elasticsearch version are you on? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 21 July 2014 19:09, James Pace james.a.p...@gmail.com wrote: We have been having issues running ES with the bootstrap.mlockall: true setting. We get the following error: [2014-07-21 09:56:44,436][WARN ][common.jna] Unknown mlockall error 11 I have googled around and looked in the solaris documentation for the description of the error codes and I have been unsuccessful. The solaris docs are here http://docs.oracle.com/cd/E26505_01/html/816-5168/mlockall-3c.html#REFMAN3Amlockall-3c but they only list 3 error codes. Is the error code 11 generated by ES? Our box has a total of 64 gigs of RAM and we give 32gigs to ES. We are running it using Oracle Java 1.7.13 64bit. Any help on the matter would be greatly appreciated! -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bc9ba19f-c798-4e7f-9977-c9707709c47e%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/bc9ba19f-c798-4e7f-9977-c9707709c47e%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEM624YbdrUjNe-S4UKvZcBQgnPup4QF3nidxLzLmDc82yd4mg%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
RE: Kibana settings for IPFIX/Netflow
I'm tired, I didn't explain that well, we use pmacct to do 1 minute aggregations. From: elasticsearch@googlegroups.com [mailto:elasticsearch@googlegroups.com] On Behalf Of Janet Sullivan Sent: Monday, July 21, 2014 1:50 AM To: elasticsearch@googlegroups.com Subject: Kibana settings for IPFIX/Netflow Every minute, we take a 1/4096 sample of traffic using IPFIX. I want to graph this data as bits/sec in a histogram. However, my math kibana skills are failing me. Here is how I think it should be set up, but it's always too low a value for Gbit/s: Chart Value: total Value Field: bytes (bytes per minute field) Scale: 32768 (4096 * 8 bits in a byte) Seconds, checked Interval 1m Y Format bytes Help? Maybe I'm missing the obvious, but its 2 a.m. and I'm mystified. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f60cf590cb4146698ec2e4eacc8815b8%40BY2PR07MB043.namprd07.prod.outlook.com. For more options, visit https://groups.google.com/d/optout.
Re: Solaris 10 mlockall error code
Ooops good point! We are running version 1.2.1 On Monday, 21 July 2014 10:18:15 UTC+1, Mark Walkom wrote: What elasticsearch version are you on? Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com javascript: web: www.campaignmonitor.com On 21 July 2014 19:09, James Pace james@gmail.com javascript: wrote: We have been having issues running ES with the bootstrap.mlockall: true setting. We get the following error: [2014-07-21 09:56:44,436][WARN ][common.jna] Unknown mlockall error 11 I have googled around and looked in the solaris documentation for the description of the error codes and I have been unsuccessful. The solaris docs are here http://docs.oracle.com/cd/E26505_01/html/816-5168/mlockall-3c.html#REFMAN3Amlockall-3c but they only list 3 error codes. Is the error code 11 generated by ES? Our box has a total of 64 gigs of RAM and we give 32gigs to ES. We are running it using Oracle Java 1.7.13 64bit. Any help on the matter would be greatly appreciated! -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com javascript:. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bc9ba19f-c798-4e7f-9977-c9707709c47e%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/bc9ba19f-c798-4e7f-9977-c9707709c47e%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9b4fa7b1-2561-4f33-8a60-969ff317206b%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [ANN] ElasticUI AngularJS Directives - Easily Build an Interface on top of Elasticsearch
Thanks all for the enthusiastic responses! Very excited to see the first contributions today and many stars on github over the last couple of weeks. Would love to hear your feedback / use cases in case you have any already :) Regards, - Yousef http://www.elasticui.com http://www.tweetbeam.com

On Thursday, July 3, 2014 3:06:11 PM UTC+2, Petar Djekic wrote: wow, this is really cool!

On Wednesday, July 2, 2014 12:56:48 PM UTC+2, Yousef El-Dardiry wrote: Hi all, I just open sourced a set of AngularJS Directives for Elasticsearch. It enables developers to rapidly build a frontend (e.g.: faceted search engine) on top of Elasticsearch. http://www.elasticui.com (or github https://github.com/YousefED/ElasticUI) It makes creating an aggregation and listing the buckets as simple as:

<ul eui-aggregation="ejs.TermsAggregation('text_agg').field('text').size(10)">
  <li ng-repeat="bucket in aggResult.buckets">{{bucket}}</li>
</ul>

I think this was currently missing in the ecosystem, which is why I decided to build and open source it. I'd love any kind of feedback. - Yousef

Another example; add a checkbox facet based on a field using one of the built-in widgets https://github.com/YousefED/ElasticUI/blob/master/docs/widgets.md:

<eui-checklist field='facet_field' size=10></eui-checklist>

Resulting in: [image: checklist screenshot]

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d2b79b01-4460-4f07-ab38-508a22f50d37%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Creating index with _timestamps
Hi All, I am trying to create an index with a _timestamp mapping:

curl -XPOST "http://localhost:9200/test" -d '{
  "settings": { "number_of_shards": 5, "number_of_replicas": 1 },
  "mappings": {
    "stats": {
      "_timestamp": { "enabled": true, "store": true }
    }
  }
}'

When I write data into the index I don't see the timestamp coming up:

curl -XPOST "http://localhost:9200/test/stats/1" -d '{"a":1}'
{"_index":"test","_type":"stats","_id":"1","_version":1,"created":true}

Why am I not getting _timestamp? Can anyone help? Thanks Surajit

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAAxzCObuchG-a-tBfDQa-u3_ahX7hoaPq8g3jgVZXMBiEJH4oA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
marvel dashboard
Hi, The Marvel overview by default shows 20 indices (third panel). I guess there is some way to configure this 20, say to 40? But how do I do it? Your help is appreciated. Regards.

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5ca8a37f-cce9-4d08-b2f1-240957a7f0d5%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Solaris 10 mlockall error code
Error 11 is a POSIX error number (errno) and means EAGAIN, Resource temporarily unavailable, which is documented. On Solaris 10, this means, you must first allow the Elasticsearch user to allocate this amount of virtual memory. Switch to Elasticsearch user and then check the following values prctl -n project.max-shm-memory $$ prctl -n process.max-address-space $$ Then the sys admin could create a project with projadd and projmod and change resource limits for the Elasticsearch user. The error can also mean that Solaris has not enough memory for mlockall because there is already software running using the memory, or it the free memory is too fragmented. Jörg On Mon, Jul 21, 2014 at 11:09 AM, James Pace james.a.p...@gmail.com wrote: We have been having issues running ES with the bootstrap.mlockall: true setting. We get the following error: [2014-07-21 09:56:44,436][WARN ][common.jna] Unknown mlockall error 11 I have googled around and looked in the solaris documentation for the description of the error codes and I have been unsuccessful. The solaris docs are here http://docs.oracle.com/cd/E26505_01/html/816-5168/mlockall-3c.html#REFMAN3Amlockall-3c but they only list 3 error codes. Is the error code 11 generated by ES? Our box has a total of 64 gigs of RAM and we give 32gigs to ES. We are running it using Oracle Java 1.7.13 64bit. Any help on the matter would be greatly appreciated! -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bc9ba19f-c798-4e7f-9977-c9707709c47e%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/bc9ba19f-c798-4e7f-9977-c9707709c47e%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoH%3DJQqs6UY7e2bO7z%3DhGacdMq_AJvoZpYCFfKqCAqMWaA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
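[Editor's note] A rough sketch of the projadd/projmod step Jörg mentions. The project name, user name and the 40G caps are assumptions; check prctl first and size the caps to your own heap and machine:

  # as root: create a project for the elasticsearch user and raise the caps
  projadd -U elasticsearch user.elasticsearch
  projmod -s -K "process.max-address-space=(privileged,40G,deny)" user.elasticsearch
  projmod -s -K "project.max-shm-memory=(privileged,40G,deny)" user.elasticsearch
  # then, as the elasticsearch user, re-check the effective limits
  prctl -n process.max-address-space $$
  prctl -n project.max-shm-memory $$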
Re: Solaris 10 mlockall error code
Thanks for the information, I'll chase it up and let you know how I get on. James On Monday, 21 July 2014 12:36:09 UTC+1, Jörg Prante wrote: Error 11 is a POSIX error number (errno) and means EAGAIN, Resource temporarily unavailable, which is documented. On Solaris 10, this means, you must first allow the Elasticsearch user to allocate this amount of virtual memory. Switch to Elasticsearch user and then check the following values prctl -n project.max-shm-memory $$ prctl -n process.max-address-space $$ Then the sys admin could create a project with projadd and projmod and change resource limits for the Elasticsearch user. The error can also mean that Solaris has not enough memory for mlockall because there is already software running using the memory, or it the free memory is too fragmented. Jörg On Mon, Jul 21, 2014 at 11:09 AM, James Pace james@gmail.com javascript: wrote: We have been having issues running ES with the bootstrap.mlockall: true setting. We get the following error: [2014-07-21 09:56:44,436][WARN ][common.jna] Unknown mlockall error 11 I have googled around and looked in the solaris documentation for the description of the error codes and I have been unsuccessful. The solaris docs are here http://docs.oracle.com/cd/E26505_01/html/816-5168/mlockall-3c.html#REFMAN3Amlockall-3c but they only list 3 error codes. Is the error code 11 generated by ES? Our box has a total of 64 gigs of RAM and we give 32gigs to ES. We are running it using Oracle Java 1.7.13 64bit. Any help on the matter would be greatly appreciated! -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com javascript:. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bc9ba19f-c798-4e7f-9977-c9707709c47e%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/bc9ba19f-c798-4e7f-9977-c9707709c47e%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f503e0b0-2e6b-4b6d-9e0e-07c2858783df%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Creating index with _timestamps
Ask for it using fields: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html#search-request-fields Using timestamp does not modify the original source. -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs Le 21 juil. 2014 à 12:44, Surajit Roy beasurajit...@gmail.com a écrit : Hi All, I am trying to create an index with a _timestamp mapping. curl -XPOST http://localhost:9200/test; -d' { settings : { number_of_shards : 5, number_of_replicas : 1}, mappings : { stats : { _timestamp : { enabled : true, store : true } }}}' When I am writing the data in the index I dont see the time stamp coming up curl -XPOST http://localhost:9200/test/stats/1; -d'{a:1}' {_index:test,_type:stats,_id:1,_version:1,created:true} Why am I not getting _timestamp, can anyone help? Thanks Surajit -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAAxzCObuchG-a-tBfDQa-u3_ahX7hoaPq8g3jgVZXMBiEJH4oA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/873351B2-4C21-406A-B606-D9E8EE4A410D%40pilato.fr. For more options, visit https://groups.google.com/d/optout.
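[Editor's note] A minimal illustration of what David means, against the test index from the question (ES 1.x syntax):

  # GET a single doc and ask for the stored _timestamp explicitly
  curl -XGET "http://localhost:9200/test/stats/1?fields=_timestamp&pretty"
  # or in a search request
  curl -XPOST "http://localhost:9200/test/stats/_search?pretty" -d '{
    "fields": [ "_timestamp", "_source" ],
    "query": { "match_all": {} }
  }'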
Index performace with large arrays
Hi there. I'm new to ES and would appreciate some advice on design concepts around large arrays. I am writing a help tip feature that pops up a message each time a user logs in. The user can flag a checkbox if they do not want to see this particular tip again. After playing with ElasticSearch the solution I came up with involved using a HelpTip document which contains an array of UserIds (identifying the users who have flagged that they do not want to see this tip again). Example1: HelpTip { title: Need help getting started?, text: Watch our overview video, userArray: [id1, id2] } I know ES can cope with large arrays but I wonder if there would be performance issues if this array grew to 4000+ IDs. This record would be regularly re-indexed (each time a new user ID is added to the array). would there be performance issues when indexing a document containing a large array field? Is this a sensible approach or would I be better using a relational model and holding the Help Tip info and the list of users in separate documents, then parsing them using two separate calls from my application? Example 2: HelpTip { title: Need help getting started?, text: Watch our overview video } HelpTipUserFlags { HelpTipId: 1, UserId: ID1 } Hope this makes sense. Thanks in advance for any help. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9cba2c32-6266-4b87-b708-83ee64499dbf%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Notifications for a query
Hello Everyone, Started working with Elasticsearch recently. Just wanted to know if there's any way of being notified when a document matches a query. (essentially create a monitoring system) Can I use percolator to do this ? Thanks -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAO9TxdPOJGbY8nEj4Hj31LMr%3DaVLTwc_%3DwJW%3Drc7Ly2LAYc8nw%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
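[Editor's note] The percolator covers exactly this matching step; the notification itself is up to your application. A minimal sketch, where the index, type, field and query are all illustrative: register the query once, percolate incoming documents, and turn any returned query ids into alerts yourself.

  # register the "watch" as a percolator query
  curl -XPUT "localhost:9200/logs/.percolator/disk-alert" -d '{
    "query": { "match": { "message": "disk full" } }
  }'
  # percolate a candidate document; the response lists the ids of matching queries
  curl -XGET "localhost:9200/logs/event/_percolate?pretty" -d '{
    "doc": { "message": "host01: disk full on /var" }
  }'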
Re: Clustering/Sharding impact on query performance
My working assumption had been that elasticsearch executes queries across all shards in parallel and then merges the results. So maybe shards = cpu cores would help in this case where there is only one concurrent query. But I have never tested this assumption, out of curiosity during the 20 shard test did you still only see 1 cpu being used? Did you try 2 shards and get the same results? On Jul 20, 2014, at 1:01 AM, 'Fin Sekun' via elasticsearch elasticsearch@googlegroups.com wrote: Hi Kireet, thanks for your answer and sorry for the late response. More shards doesn't help. It will slow down the system because each shard takes quite some overhead to maintain a Lucene index and, the smaller the shards, the bigger the overhead. Having more shards enhances the indexing performance and allows to distribute a big index across machines, but I don't have a cluster with a lot of machines. I could observe this negative effects while testing with 20 shards. It would be very cool if somebody could answer/comment to the question summarized at the end of my post. Thanks again. On Friday, July 11, 2014 3:02:50 AM UTC+2, Kireet Reddy wrote: I would test using multiple primary shards on a single machine. Since your dataset seems to fit into RAM, this could help for these longer latency queries. On Thursday, July 10, 2014 12:24:26 AM UTC-7, Fin Sekun wrote: Any hints? On Monday, July 7, 2014 3:51:19 PM UTC+2, Fin Sekun wrote: Hi, SCENARIO Our Elasticsearch database has ~2.5 million entries. Each entry has the three analyzed fields match, sec_match and thi_match (all contains 3-20 words) that will be used in this query: https://gist.github.com/anonymous/a8d1142512e5625e4e91 ES runs on two types of servers: (1) Real servers (system has direct access to real CPUs, no virtualization) of newest generation - Very performant! (2) Cloud servers with virtualized CPUs - Poor CPUs, but this is generic for cloud services. See https://gist.github.com/anonymous/3098b142c2bab51feecc for (1) and (2) CPU details. ES settings: ES version 1.2.0 (jdk1.8.0_05) ES_HEAP_SIZE = 512m (we also tested with 1024m with same results) vm.max_map_count = 262144 ulimit -n 64000 ulimit -l unlimited index.number_of_shards: 1 index.number_of_replicas: 0 index.store.type: mmapfs threadpool.search.type: fixed threadpool.search.size: 75 threadpool.search.queue_size: 5000 Infrastructure: As you can see above, we don't use the cluster feature of ES (1 shard, 0 replicas). The reason is that our hosting infrastructure is based on different providers. Upside: We aren't dependent on a single hosting provider. Downside: Our servers aren't in the same LAN. This means: - We cannot use ES sharding, because synchronisation via WAN (internet) seems not a useful solution. - So, every ES-server has the complete dataset and we configured only one shard and no replicas for higher performance. - We have a distribution process that updates the ES data on every host frequently. This process is fine for us, because updates aren't very often and perfect just-in-time ES synchronisation isn't necessary for our business case. - If a server goes down/crashs, the central loadbalancer removes it (the resulting minimal packet lost is acceptable). PROBLEM For long query terms (6 and more keywords), we have very high CPU loads, even on the high performance server (1), and this leads to high response times: 1-4sec on server (1), 8-20sec on server (2). 
The system parameters while querying: - Very high load (usually 100%) for the thread responsible CPU (the other CPUs are idle in our test scenario) - No I/O load (the harddisks are fine) - No RAM bottlenecks So, we think the file caching is working fine, because we have no I/O problems and the garbage collector seams to be happy (jstat shows very few GCs). The CPU is the problem, and ES hot-threads point to the Scorer module: https://gist.github.com/anonymous/9cecfd512cb533114b7d SUMMARY/ASSUMPTIONS - Our database size isn't very big and the query not very complex. - ES is designed for huge amount of data, but the key is clustering/sharding: Data distribution to many servers means smaller indices, smaller indices leads to fewer CPU load and short response times. - So, our database isn't big, but to big for a single CPU and this means especially low performance (virtual) CPUs can only be used in sharding environments. If we don't want to lost the provider independency, we have only the following two options: 1) Simpler query (I think not possible in our case) 2) Smaller database QUESTIONS Are our assumptions correct? Especially: - Is clustering/sharding (also small indices) the main key to performance, that means the only possibility to prevent overloaded (virtual) CPUs? - Is it right that clustering is only useful/possible in LANs?
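[Editor's note] A cheap way to test Kireet's suggestion is to reindex into an index with several primary shards on the single box and compare latency and CPU spread. A sketch, with the shard count (4) and index name purely illustrative:

  curl -XPUT "localhost:9200/match_index_4shards" -d '{
    "settings": { "number_of_shards": 4, "number_of_replicas": 0 }
  }'
  # reindex the data, rerun the slow queries, and check whether more cores are busy
  curl "localhost:9200/_nodes/hot_threads"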
Re: Index performace with large arrays
If the user can opt out, I assume you have fewer opt outs than opt ins, then you should use opt outs for an andnot filter :) In that case, I would create an opt out index in the form index/type/id users/optouts/userid with docs containing a quite short array of opt outs { optouts: [ id1, id2, ... idn ] } so you can get the doc, read the opt out array, and add it as an and not filter to your help tip query. You could also add this optouts array to the user index, but this depends on your overall design. If you want to remove the opt outs, you could simply drop the optputs mapping type. Regarding the array length, you can add as much values as you like, ES can handle that. If the docs get long (I mean thousands of entries), they will take substantial time just for fetching them, so I think you should prefer a model with data as short as possible. Jörg On Mon, Jul 21, 2014 at 4:58 PM, Steve Mee st...@genialgenetics.com wrote: Hi there. I'm new to ES and would appreciate some advice on design concepts around large arrays. I am writing a help tip feature that pops up a message each time a user logs in. The user can flag a checkbox if they do not want to see this particular tip again. After playing with ElasticSearch the solution I came up with involved using a HelpTip document which contains an array of UserIds (identifying the users who have flagged that they do not want to see this tip again). Example1: HelpTip { title: Need help getting started?, text: Watch our overview video, userArray: [id1, id2] } I know ES can cope with large arrays but I wonder if there would be performance issues if this array grew to 4000+ IDs. This record would be regularly re-indexed (each time a new user ID is added to the array). would there be performance issues when indexing a document containing a large array field? Is this a sensible approach or would I be better using a relational model and holding the Help Tip info and the list of users in separate documents, then parsing them using two separate calls from my application? Example 2: HelpTip { title: Need help getting started?, text: Watch our overview video } HelpTipUserFlags { HelpTipId: 1, UserId: ID1 } Hope this makes sense. Thanks in advance for any help. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9cba2c32-6266-4b87-b708-83ee64499dbf%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/9cba2c32-6266-4b87-b708-83ee64499dbf%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoFw7P8FQu9%2Bc2ajj0Vg2wNvbpz%3D%2Bo9Af8R-5p9Cj8-7FQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
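[Editor's note] A small sketch of the opt-out model Jörg describes, with illustrative index, type, field and id names. In ES 1.x, an ids filter wrapped in a not filter gives the "and not" behaviour:

  # per-user opt-out document
  curl -XPUT "localhost:9200/users/optouts/user42" -d '{ "optouts": [ "tip1", "tip7" ] }'
  # fetch it, then exclude those tip ids when querying the help tips
  curl -XPOST "localhost:9200/helptips/tip/_search?pretty" -d '{
    "query": {
      "filtered": {
        "query":  { "match_all": {} },
        "filter": { "not": { "ids": { "values": [ "tip1", "tip7" ] } } }
      }
    }
  }'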
Re: Index performace with large arrays
Thanks for the response Jörg. That tells me exactly what I need to know... stay away from very large arrays here in my design :-) Cheers - Steve -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f06b544f-8aec-4c44-aa38-ce53e5f0be74%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: puppet-elasticsearch options
Hi Richard, another question: you are creating the elasticsearch user and group somewhere in the module (havent found exactly where yet). My problem is that I have to create a directory for data_dir (on a different device) that is needed by the class (or instance, not sure), but I need the owner and the group to be able to set it otherwise the service won't start. Can I set a requirement in my file declaration to make sure that the user and the group already exist? Something like file { /data/elasticsearch: ensure = directory, owner = elasticsearch, group = elasticsearch, require = ??? } Once again, thanks! Andrej Am Dienstag, 1. Juli 2014 14:37:55 UTC+2 schrieb Richard Pijnenburg: Hi Andrej, Sorry for the late response. Didn't get an update email about it. As long as you don't setup an instance with the 'elasticsearch::instance' define it will only install the package but do nothing afterwards. I recently fixed that the default files from the packages are being removed now. The memory can be set via the init_defaults hash by setting the ES_HEAP option. The issue with 0.90.x versions is that it automatically starts up after package installation. Since i don't stop it, it keeps running. Its advised to run a newer version of ES since 0.90.x will be EOL'd at some point. On Thursday, June 26, 2014 2:24:47 PM UTC+1, Andrej Rosenheinrich wrote: Hi Richard, thanks for your answer, it for sure helped! Still, I am puzzling with a few effects and questions: 1.) I am a bit confused by your class/instance idea. I can do something pretty simple like class { 'elasticsearch' : version = '0.90.7' } and it will install elasticsearch in the correct version using the default settings you defined. Repeating this (I tested every step on a fresh debian instance in a VM, no different puppet installation steps in between) with a config added in class like class { 'elasticsearch' : version = '0.90.7', config = { 'cluster'= { 'name' = 'andrejtest' }, 'http.port' = '9210' } } I still get elasticsearch installed, but it completely ignores everything in the config. (I should be able to curl localhost:9210, but its up and running on the old default port, using the old cluster name). You explained overwriting for instances and classes a bit, so I tried the following thing (again, blank image, no previous installation) : class { 'elasticsearch' : version = '0.90.7', config = { 'cluster'= { 'name' = 'andrejtest' }, 'http.port' = '9210' } } elasticsearch::instance { 'es-01': } What happened is that I have two elasticsearch instances running, one with the default value and another one (es-01) that uses the provided configuration. Even freakier, I install java7 in my script before the snippet posted , the first (default based) elasticsearch version uses the standard openjdk-6 java, the second instance (es-01) uses java7. So, where is my mistake or what am I doing wrong? What would be the way to install and start only one service using provided configuration? And does elasticsearch::instance require an instance name? I would really miss the funny comic node names ;) 2. As you pointed out I can define all values from elasticsearch.yml in the config hash. But what about memory settings (I usually modify the init.d script for that), can I configure Xms and Xmx settings in the puppet module somehow? Logging configuration would be a nice-to-have (no must-have), just in case you were wondering ;) I hope my questions don't sound too confusing, if you could give me a hint on what I am doing wrong I would really appreciate it. 
Thanks in advance! Andrej Am Freitag, 20. Juni 2014 09:44:49 UTC+2 schrieb Richard Pijnenburg: Hi Andrej, Thank you for using the puppet module :-) The 'port' and 'discovery minimum' settings are both configuration settings for the elasticsearch.yml file. You can set those in the 'config' option variable, for example: elasticsearch::instance { 'instancename': config = { 'http.port' = '9210', 'discovery.zen.minimum_master_nodes' = 3 } } For the logging part, management of the logging.yml file is very limited at the moment but i hope to get some feedback on extending that. The thresholds for the slowlogs can be set in the same config option variable. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-slowlog.html#index-slow-log for more information. If you have any further questions, let me know. Cheers On Thursday, June 19, 2014 9:53:10 AM UTC+1, Andrej Rosenheinrich wrote: Hi, i am playing around with puppet-easticsearch 0.4.0, works wells so far (thanks!), but I am missing a few options I havent seen in the documentation. As I couldnt figure it out immediately by
Re: Kibana settings for IPFIX/Netflow
Hi Janets, currently I am also trying pmacct It's processing result. I am storing data to elasticsearch, But currently struggling with dashboard creation, can you share your kibana dashboard file. it's very useful to me. -Dhanasekaran. Did I learn something today? If not, I wasted it. On Mon, Jul 21, 2014 at 5:19 AM, Janet Sullivan jan...@nairial.net wrote: I’m tired, I didn’t explain that well, we use pmacct to do 1 minute aggregations. *From:* elasticsearch@googlegroups.com [mailto: elasticsearch@googlegroups.com] *On Behalf Of *Janet Sullivan *Sent:* Monday, July 21, 2014 1:50 AM *To:* elasticsearch@googlegroups.com *Subject:* Kibana settings for IPFIX/Netflow Every minute, we take a 1/4096 sample of traffic using IPFIX. I want to graph this data as bits/sec in a histogram. However, my math kibana skills are failing me. Here is how I think it should be set up, but it’s always too low a value for Gbit/s: Chart Value: total Value Field: bytes (bytes per minute field) Scale: 32768 (4096 * 8 bits in a byte) Seconds, checked Interval 1m Y Format bytes Help? Maybe I’m missing the obvious, but its 2 a.m. and I’m mystified. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f60cf590cb4146698ec2e4eacc8815b8%40BY2PR07MB043.namprd07.prod.outlook.com https://groups.google.com/d/msgid/elasticsearch/f60cf590cb4146698ec2e4eacc8815b8%40BY2PR07MB043.namprd07.prod.outlook.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAJzooYdPZOny6EqgCWD-QWGBXVhSbXj0HKWxc-arqAu9kbE_7A%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
Re: Sort Order when relevance is equal
Seconding question. Also, is it possible that it changed between versions 1.0 and 1.2? We're trying to upgrade and noticed a seemingly random order of documents with equal relevance in regression testing. On Wednesday, 14 May 2014 00:49:55 UTC, Erich Lin wrote: Ignoring the bouncing results problem with multiple shards , is the order of results deterministic when sorting by relevance score or any other field. What I mean by this is if two documents have the same score, 1) will they always be in the same order if we set the preference parameter to an arbitrary string like the user’s session ID. 2) If so, is there a way to predict this deterministic order? Is it done by ID of the documents as a tiebreaker etc? 3) If not, could we specify that or do we have to do a secondary sort on ID if we wanted to do that? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/284b6bfd-68aa-4717-ab52-73d44f7cc196%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
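[Editor's note] As far as I know, score ties are broken by internal Lucene doc order, which is not stable across merges, reindexing or version upgrades, so the usual way to make the order fully deterministic is an explicit secondary sort on a unique field. A sketch, where the field name "id" is an assumption (any unique not_analyzed or numeric field works):

  curl -XPOST "localhost:9200/myindex/_search?preference=user-session-123&pretty" -d '{
    "query": { "match": { "title": "elasticsearch" } },
    "sort": [ "_score", { "id": "asc" } ]
  }'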
Re: Need help with Java API
To issue 1: you create a single node cluster without index, and a client of it. To issue 2: you see the UnavailableShardsException caused by a timeout while indexing to a replica shard. This means, you may have set up a single node cluster, but with replica level 1 (default) which needs 2 nodes for indexing. Maybe there was once another node joining the cluster and ES wants it back abut it never came (after 60 secs). Then ES returns the timeout error. Maybe replica level 0 helps. You should also check the cluster health. A color of green shows everything works, yellow means there are too few nodes to satisfy the replica condition, and read means the cluster is not in a consistent/usable state. To issue 3: not sure what clusterName() means. I would use settings and add a field cluster.name. Maybe it is equivalent. You must ensure you use the same cluster.name setting throughout all nodes and clients. You also can not reuse data from clusters that had other names (look into the data folder) To issue 4: ES takes ~5 secs for discovery, the zen modules pings and waits for responding master nodes by default. If you just test locally on your developer machine, you should disable zen. Most easily by disabling networking at all, by using NodeBuilder.nodeBuilder().local(true)... Jörg On Mon, Jul 21, 2014 at 6:53 PM, Alain Désilets alaindesile...@gmail.com wrote: I am trying to get started with the Java API, using the excellent tutorial found here: http://www.slideshare.net/dadoonet/hands-on-lab-elasticsearch But I am still having a lot of trouble. Below is a sample of code that I have written: package ca.nrc.ElasticSearch; import org.codehaus.jackson.map.ObjectMapper; import org.elasticsearch.action.get.GetResponse; import org.elasticsearch.action.index.IndexResponse; import org.elasticsearch.client.Client; import org.elasticsearch.node.NodeBuilder; public class ElasticSearchRunner { static ObjectMapper mapper; static Client client; static String indexName = meal5; static String typeName = beer; static long startTimeMSecs; public static void main(String[] args) throws Exception { startTimeMSecs = System.currentTimeMillis(); mapper = new ObjectMapper(); // create once, reuse echo(Creating the ElasticSearch client...); client = NodeBuilder.nodeBuilder().node().client(); // Does this create a brand new cluster? // client = NodeBuilder.nodeBuilder().clusterName(handson).client(true).node().client(); // Joins existing cluster called handson echo(DONE creating the ElasticSearch client... Elapsed time = +elapsedSecs()+ secs.); echo(Creating a beer object...); Beer beer = new Beer(Heineken, Colour.PALE, 0.33, 3); String jsonString = mapper.writeValueAsString(beer); echo(DONE Creating a beer object...); echo(Indexing the beer object...); IndexResponse ir = null; ir = client.prepareIndex(indexName, typeName).setSource(jsonString) .execute().actionGet(); echo(DONE Indexing the beer object...); echo(Retrieving the beer object...); GetResponse gr = null; gr = client.prepareGet(indexName, typeName, ir.getId()).execute() .actionGet(); echo(DONE Retrieving the beer object...); } public static float elapsedSecs() { float elapsed = (System.currentTimeMillis() - startTimeMSecs)/1000; return elapsed; } public static void echo(String mess) { mess = mess + (Elapsed so far: +elapsedSecs()+ seconds); System.out.println(mess); } } It works, sort of... If I use the first method for creating the client: client = NodeBuilder.nodeBuilder().node().client(); Then it works fin the first time I run it. 
However: *** ISSUE 1: If I try to inspect the meal index with Marvel, I don't find it. Also, *** ISSUE 2: If I run the application a second time, I get the following output: Creating the ElasticSearch client... (Elapsed so far: 0.0 seconds) log4j:WARN No appenders could be found for logger (org.elasticsearch.node). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. DONE creating the ElasticSearch client... Elapsed time = 9.0 secs. (Elapsed so far: 9.0 seconds) Creating a beer object... (Elapsed so far: 9.0 seconds) DONE Creating a beer object... (Elapsed so far: 9.0 seconds) Indexing the beer object... (Elapsed so far: 9.0 seconds) Exception in thread main org.elasticsearch.action.UnavailableShardsException: [meal5][0] [2] shardIt, [0] active : Timeout waiting for [1m], request: index {[meal5][beer][B3F5ZEmSTruqdnlxhYviFg], source[{brand:Heineken,colour:PALE,size:0.33,price:3.0}]} at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.raiseTimeoutFailure(TransportShardReplicationOperationAction.java:526) at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$3.onTimeout(TransportShardReplicationOperationAction.java:516) at
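[Editor's note] To follow up on Jörg's points with concrete commands (index name taken from the example above; a sketch for a single-node development setup):

  # is the cluster green/yellow/red, and how many nodes does it see?
  curl "localhost:9200/_cluster/health?pretty"
  # drop the replica requirement so a single node can satisfy the indexing request
  curl -XPUT "localhost:9200/meal5/_settings" -d '{ "index": { "number_of_replicas": 0 } }'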
[ANN] Elasticsearch CouchDB River plugin 2.2.0 released
Heya, We are pleased to announce the release of the Elasticsearch CouchDB River plugin, version 2.2.0. The CouchDB River plugin allows to hook into couchdb _changes feed and automatically index it into elasticsearch.. https://github.com/elasticsearch/elasticsearch-river-couchdb/ Release Notes - elasticsearch-river-couchdb - Version 2.2.0 Fix: * [67] - Race condition: NPE when starting with no database (https://github.com/elasticsearch/elasticsearch-river-couchdb/issues/67) * [66] - Race condition: exception while closing the river (https://github.com/elasticsearch/elasticsearch-river-couchdb/issues/66) Update: * [64] - Default script engine should be mvel (https://github.com/elasticsearch/elasticsearch-river-couchdb/issues/64) * [62] - Use `script_type` instead of `scriptType` (https://github.com/elasticsearch/elasticsearch-river-couchdb/issues/62) * [60] - Default couchdb db name should be river name (https://github.com/elasticsearch/elasticsearch-river-couchdb/issues/60) * [56] - Update to elasticsearch 1.2.0 (https://github.com/elasticsearch/elasticsearch-river-couchdb/issues/56) New: * [47] - Move tests to elasticsearch test framework (https://github.com/elasticsearch/elasticsearch-river-couchdb/issues/47) * [17] - [TEST] Check that you can create a couchdb DB after the river creation (https://github.com/elasticsearch/elasticsearch-river-couchdb/issues/17) Doc: * [55] - Clarify deleting documents in multi-type example (https://github.com/elasticsearch/elasticsearch-river-couchdb/pull/55) * [51] - Add documentation and test about parent/child (https://github.com/elasticsearch/elasticsearch-river-couchdb/issues/51) * [45] - [TEST] Add test and documentation for removing fields using scripts (https://github.com/elasticsearch/elasticsearch-river-couchdb/issues/45) Issues, Pull requests, Feature requests are warmly welcome on elasticsearch-river-couchdb project repository: https://github.com/elasticsearch/elasticsearch-river-couchdb/ For questions or comments around this plugin, feel free to use elasticsearch mailing list: https://groups.google.com/forum/#!forum/elasticsearch Enjoy, -The Elasticsearch team -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/53cd521a.d327b40a.404d.3899SMTPIN_ADDED_MISSING%40gmr-mx.google.com. For more options, visit https://groups.google.com/d/optout.
Synonym filter results in term facet
Hi All, I have a requirement in which I need to find distinct company names. I was using the keyword tokenizer for that field and through a terms facet I was able to get distinct company names. However, the terms facet treated company names like ibm suisse, ibm corporation and ibm as different companies. Online documentation suggested using a synonym filter to solve this.

My settings:

curl -XPUT 'http://localhost:9200/dataindex/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "customAnalyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [ "lowercase", "synonym" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "tokenizer": "keyword",
            "synonyms_path": "analysis/synonym.txt"
          }
        }
      }
    }
  }
}'

My mapping:

curl -XPUT 'http://localhost:9200/dataindex/tweet/_mapping' -d '{
  "tweet": {
    "properties": {
      "company": { "type": "string", "analyzer": "customAnalyzer" }
    }
  }
}'

In the synonym.txt file I have:

ibm suisse, ibm corporation, ibm business, ibm => ibm corp ltd

Indexed data:

curl -XPUT 'http://localhost:9200/dataindex/tweet/1' -d '{ "company": "ibm" }'
curl -XPUT 'http://localhost:9200/dataindex/tweet/2' -d '{ "company": "ibm corporation" }'
curl -XPUT 'http://localhost:9200/dataindex/tweet/3' -d '{ "company": "ibm suisse" }'
curl -XPUT 'http://localhost:9200/dataindex/tweet/4' -d '{ "company": "ibm business" }'

If I run a terms facet:

{ "facets": { "loc_facet": { "terms": { "field": "company" } } } }

I get 3 terms, i.e.

{ "term": "ibm corp ltd", "count": 3 }
{ "term": "suisse", "count": 1 }
{ "term": "corporation", "count": 1 }

I want the facet result to return only one term, ibm corp ltd, with count=3. This way I will get distinct company names and also map synonym names to a single company name. Please correct me if I am using the wrong tokenizer or my approach is not correct.

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1ba32926-7015-4b8a-89ae-bf43a2561b71%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: ES Failures and Recovery
Did you do a failure analysis and what were your findings? Thanks, Venu On Tuesday, June 11, 2013 7:41:57 AM UTC-7, Anand Nalya wrote: Hi, We are using 0.90.1 version ES and are planning for high availability testing. While the entire scheme to enable the cluster to be highly available is clear, I wanted to get some idea about ES Service lifetime in terms of Mean-Time to Failure and Time of Recovery in cases of failure. Any historic evidences will also help, as it will be vital for us to calculate the actual availability of the system across an year. While I understand that ES provides seamless high availability through replication, but any failure, will impact the performance to some extent and this calculation will help in deriving the actual number of nodes that we should consider without compromising on the performance as well, while the system is available. Any ideas/facts would be very helpful . Thanks, Anand -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5f3c823f-0b54-4213-ac45-3dfa2f0b9af3%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Can one do a singular/plural phrase match with the Query String Query?
Can one perform the following query using wildcards (instead of two distinct phrases) when using a Query String Query?

"photographic film" OR "photographic films"

These do not seem to work, and return the same number of results as just "photographic film":

"photographic film?"
"photographic film*"

Can wildcards not be placed inside Exact Phrase queries? Is there a way to mimic this? My goal is to be able to perform queries like this:

"photo* film?"

... capturing: photo film, photo films, photographic films, photography films, etc...

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/66cc151f-a235-40d4-a125-2236aae0f9bf%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
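[Editor's note] As far as I know, the query_string parser does not expand wildcards inside a quoted phrase. One approach worth trying is a span_near of span_multi-wrapped wildcard queries; a sketch only, with the field name "title" assumed:

  curl -XPOST "localhost:9200/myindex/_search?pretty" -d '{
    "query": {
      "span_near": {
        "clauses": [
          { "span_multi": { "match": { "wildcard": { "title": "photo*" } } } },
          { "span_multi": { "match": { "wildcard": { "title": "film*" } } } }
        ],
        "slop": 0,
        "in_order": true
      }
    }
  }'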
Re: [ERROR][bootstrap] {1.2.2}: Initialization Failed ... - NullPointerException[null]
Same issue here, I upgraded to ES 1.2.2, and jdk 1.7.0_60-b19, debug log output - [2014-07-21 18:50:05,054][INFO ][node ] [p-esmon01] version[1.2.2], pid[23697], build[9902f08/2014-07-09T12:02:32Z] [2014-07-21 18:50:05,054][INFO ][node ] [p-esmon01] initializing ... [2014-07-21 18:50:05,056][DEBUG][node ] [p-esmon01] using home [/usr/share/elasticsearch], config [/etc/elasticsearch], data [[/etc/elasticsearch/data]], logs [/var/log/elasticsearch], work [/etc/elasticsearch/work], plugins [/usr/share/elasticsearch/plugins] [2014-07-21 18:50:05,064][DEBUG][plugins ] [p-esmon01] lucene property is not set in plugin es-plugin.properties file. Skipping test. [2014-07-21 18:50:05,074][DEBUG][plugins ] [p-esmon01] skipping [jar:file:/usr/share/elasticsearch/plugins/marvel/marvel-1.2.1.jar!/es-plugin.properties] [2014-07-21 18:50:05,075][DEBUG][plugins ] [p-esmon01] lucene property is not set in plugin es-plugin.properties file. Skipping test. [2014-07-21 18:50:05,075][DEBUG][plugins ] [p-esmon01] [/usr/share/elasticsearch/plugins/http-basic-server-plugin/_site] directory does not exist. [2014-07-21 18:50:05,077][DEBUG][plugins ] [p-esmon01] [/usr/share/elasticsearch/plugins/http-basic/_site] directory does not exist. [2014-07-21 18:50:05,078][INFO ][plugins ] [p-esmon01] loaded [marvel, http-basic-server-plugin], sites [marvel, head] [2014-07-21 18:50:05,104][DEBUG][common.compress.lzf ] using [UnsafeChunkDecoder] decoder [2014-07-21 18:50:05,113][DEBUG][env ] [p-esmon01] using node location [[/etc/elasticsearch/data/p_es_clust_mon01/nodes/0]], local_node_id [0] [2014-07-21 18:50:05,907][ERROR][bootstrap] {1.2.2}: Initialization Failed ... - ExecutionError[java.lang.NoClassDefFoundError: org/elasticsearch/rest/StringRestResponse] NoClassDefFoundError[org/elasticsearch/rest/StringRestResponse] ClassNotFoundException[org.elasticsearch.rest.StringRestResponse] org.elasticsearch.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: org/elasticsearch/rest/StringRestResponse at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2199) at org.elasticsearch.common.cache.LocalCache.get(LocalCache.java:3934) at org.elasticsearch.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) at org.elasticsearch.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) at org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:51) at org.elasticsearch.common.inject.ConstructorInjectorStore.get(ConstructorInjectorStore.java:50) at org.elasticsearch.common.inject.ConstructorBindingImpl.initialize(ConstructorBindingImpl.java:50) at org.elasticsearch.common.inject.InjectorImpl.initializeBinding(InjectorImpl.java:372) at org.elasticsearch.common.inject.BindingProcessor$1$1.run(BindingProcessor.java:148) at org.elasticsearch.common.inject.BindingProcessor.initializeBindings(BindingProcessor.java:204) at org.elasticsearch.common.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:119) at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:102) at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:93) at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70) at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:59) at org.elasticsearch.node.internal.InternalNode.init(InternalNode.java:188) at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159) at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:70) at 
org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:203) at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32) Caused by: java.lang.NoClassDefFoundError: org/elasticsearch/rest/StringRestResponse at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532) at java.lang.Class.getDeclaredConstructors(Class.java:1901) at org.elasticsearch.common.inject.spi.InjectionPoint.forConstructorOf(InjectionPoint.java:177) at org.elasticsearch.common.inject.ConstructorInjectorStore.createConstructor(ConstructorInjectorStore.java:59) at org.elasticsearch.common.inject.ConstructorInjectorStore.access$000(ConstructorInjectorStore.java:29) at org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:37) at org.elasticsearch.common.inject.ConstructorInjectorStore$1.create(ConstructorInjectorStore.java:33) at
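[Editor's note] org.elasticsearch.rest.StringRestResponse belongs to the pre-1.x REST API, so a NoClassDefFoundError for it at startup usually points at a plugin compiled against an older Elasticsearch; given the loaded-plugins line above, the http-basic plugin looks like the more likely suspect than Marvel. A hedged sketch of the usual remedy (this is a guess, not a confirmed diagnosis; the name passed to --remove is the directory name under plugins/ in a default install):

  # remove the plugin built against the older API, then reinstall a build that targets ES 1.2.x
  bin/plugin --remove http-basic-server-plugin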
Setting from Json File
Hi all, I am trying to read settings from a JSON file. Something like: ImmutableSettings.settingsBuilder().loadFromSource( path ); But it seems loadFromSource doesn't work for this purpose. Could you please point me to the right method? Thanks -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/44cf6310-4e35-4c30-86ce-69a3660f1257%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Synonym filter results in term facet
Hello Ravi , Your approach is wrong. When you use synonym filter , it indexes all synonyms of that token hence and synonym will match against that term. So when you do a facet , you will get an aggregation of all synonyms rather than just one. Better approach would be to store the unique name into some other field and take a facet of that field. Thanks Vineeth On Mon, Jul 21, 2014 at 11:21 PM, ravi...@gmail.com wrote: Hi All, I have a requirement in which I need to find distinct company names. I was using Keyword tokenizer for that field and through term facet I was able to get distinct company names. However terms facet treated company names like ibm suisse, ibm corporation, ibm as different companies. Online documentation suggested me to use Synonym filter to solve this. My settings is: curl -XPUT 'http://localhost:9200/dataindex/' -d '{ settings: { index: { analysis: { analyzer: { customAnalyzer: { type: custom, tokenizer: whitespace, filter: [ lowercase,synonym ] } }, filter: { synonym : { type : synonym, tokenizer: keyword, synonyms_path : analysis/synonym.txt } } } } } }' My mapping is: curl -XPUT 'http://localhost:9200/dataindex/tweet/_mapping' -d ' { tweet : { properties : { company: { type: string, analyzer: customAnalyzer } } } }' In the synonym.txt file I have : ibm suisse, ibm corporation, ibm business, ibm = ibm corp ltd Indexed data: curl -XPUT 'http://localhost:9200/dataindex/tweet/1' -d '{ company : ibm }' curl -XPUT 'http://localhost:9200/dataindex/tweet/2' -d '{ company : ibm corporation }' curl -XPUT 'http://localhost:9200/dataindex/tweet/3' -d '{ company : ibm suisse }' curl -XPUT 'http://localhost:9200/dataindex/tweet/4' -d '{ company : ibm business }' If I run a terms facet: { facets: { loc_facet: { terms: { field: company } } } } I get 3 terms ie {term: ibm corp ltd, count: 3} {term: suisse, count: 1} {term: corporation, count: 1} I want the facet result to return only one term: ibm corp ltd with count=3. This way i will get distinct company names and also map synonym names into single company name. Please correct me if I am using wrong tokenizer or my approach is not correct. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1ba32926-7015-4b8a-89ae-bf43a2561b71%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/1ba32926-7015-4b8a-89ae-bf43a2561b71%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGdPd5ny%3Di76CHwpbEoY-4nGaraQfz-Tmmm5MVJbiA%2B0nrgKZQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
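One possible shape of the separate facet field Vineeth suggests, sketched here with a hypothetical facetAnalyzer and a company.canonical multi-field (neither appears in the original thread, and this is untested against Ravi's data): the facet field uses the keyword tokenizer so the whole company name reaches the synonym filter as a single token, and the facet is then taken on company.canonical instead of company.

# assumed: index recreated with an extra analyzer dedicated to faceting
curl -XPUT 'http://localhost:9200/dataindex/' -d '{
  "settings": {
    "analysis": {
      "filter": {
        "synonym": {
          "type": "synonym",
          "tokenizer": "keyword",
          "synonyms_path": "analysis/synonym.txt"
        }
      },
      "analyzer": {
        "customAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "synonym" ]
        },
        "facetAnalyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "lowercase", "synonym" ]
        }
      }
    }
  }
}'

# company stays searchable as before; company.canonical holds the mapped-to name for faceting
curl -XPUT 'http://localhost:9200/dataindex/tweet/_mapping' -d '{
  "tweet": {
    "properties": {
      "company": {
        "type": "string",
        "analyzer": "customAnalyzer",
        "fields": {
          "canonical": { "type": "string", "analyzer": "facetAnalyzer" }
        }
      }
    }
  }
}'

Running the terms facet on company.canonical should then return a single ibm corp ltd term covering all of the documents above.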
Re: Can one do a singular/plural phrase match with the Query String Query?
I think a stemmer analyzer would fit your use case: See http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/choosing-a-stemmer.html#choosing-a-stemmer -- David Pilato | Technical Advocate | Elasticsearch.com @dadoonet | @elasticsearchfr On 21 July 2014 at 20:53:09, Brian Jones (tbrianjo...@gmail.com) wrote: Can one perform the following query using wildcards ( instead of two distinct phrases ) when using a Query String Query? photographic film OR photographic films These do not seem to work, and return the same number of results as just photographic film: photographic film? photographic film* Can wildcards not be placed inside Exact Phrase queries? Is there a way to mimic this? My goal is to be able to perform queries like this: photo* film? ... capturing: photo film photo films photographic films photography films etc... -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/66cc151f-a235-40d4-a125-2236aae0f9bf%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/etPan.53cd6b41.19495cff.e03c%40MacBook-Air-de-David.local. For more options, visit https://groups.google.com/d/optout.
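As a rough sketch of the stemming route David points to (the index name, type and field are made up here, and kstem is just one choice of light stemmer), plural and singular forms are folded together at index and search time, so a plain phrase query for photographic film also matches photographic films without any wildcards:

curl -XPUT 'http://localhost:9200/myindex' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "light_english": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "kstem" ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "body": { "type": "string", "analyzer": "light_english" }
      }
    }
  }
}'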
Re: [ANN] Log4j2 Elasticsearch appender
The Log4j appender is batching events because it is using bulk indexing. If you are worried, you can set the bulk action size to an extreme high value and increase heap to some GB only for buffering log messages, together with a reasonable flush interval. Jörg On Mon, Jul 21, 2014 at 9:14 PM, Ivan Brusic i...@brusic.com wrote: I was indexing events into Elasticsearch via the standard SocketAppender into Logstash, but I stopped doing so since the SocketAppender was not releasing threads. Great to see a direct approach, but I like to use Logstash in the middle as a buffer in order to batch events. -- Ivan On Sat, Jul 19, 2014 at 8:59 AM, Alfredo Serafini ser...@gmail.com wrote: I'll try it as soon as I can! thanks, Alfredo :-) Il giorno venerdì 18 luglio 2014 10:08:14 UTC+2, Jörg Prante ha scritto: Hi, I released a Log4j2 Elasticsearch appender https://github.com/jprante/log4j2-elasticsearch in the hope it is useful. Best, Jörg -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/dce481d9-ac3e-4fd0-aaba-3a4c69d07d34%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/dce481d9-ac3e-4fd0-aaba-3a4c69d07d34%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCppMVB-9_0btCXtPTo67c%2BMFziYDff0qwiptj4kn-hyw%40mail.gmail.com https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCppMVB-9_0btCXtPTo67c%2BMFziYDff0qwiptj4kn-hyw%40mail.gmail.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGim0o0LVVFRoLAM2DOHgS-4wM%2BariaLrF3aBg6Vjs2Nw%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
Re: [ERROR][bootstrap] {1.2.2}: Initialization Failed ... - NullPointerException[null]
You have to update your plugins marvel and http-basic-server-plugin to get them to work with ES 1.2.2 Jörg On Mon, Jul 21, 2014 at 9:02 PM, Phillip Ulberg phillip.ulb...@gmail.com wrote: Same issue here, I upgraded to ES 1.2.2, and jdk 1.7.0_60-b19, debug log output - [2014-07-21 18:50:05,054][INFO ][node ] [p-esmon01] version[1.2.2], pid[23697], build[9902f08/2014-07-09T12:02:32Z] [2014-07-21 18:50:05,054][INFO ][node ] [p-esmon01] initializing ... [2014-07-21 18:50:05,056][DEBUG][node ] [p-esmon01] using home [/usr/share/elasticsearch], config [/etc/elasticsearch], data [[/etc/elasticsearch/data]], logs [/var/log/elasticsearch], work [/etc/elasticsearch/work], plugins [/usr/share/elasticsearch/plugins] [2014-07-21 18:50:05,064][DEBUG][plugins ] [p-esmon01] lucene property is not set in plugin es-plugin.properties file. Skipping test. [2014-07-21 18:50:05,074][DEBUG][plugins ] [p-esmon01] skipping [jar:file:/usr/share/elasticsearch/plugins/marvel/marvel-1.2.1.jar!/es-plugin.properties] [2014-07-21 18:50:05,075][DEBUG][plugins ] [p-esmon01] lucene property is not set in plugin es-plugin.properties file. Skipping test. [2014-07-21 18:50:05,075][DEBUG][plugins ] [p-esmon01] [/usr/share/elasticsearch/plugins/http-basic-server-plugin/_site] directory does not exist. [2014-07-21 18:50:05,077][DEBUG][plugins ] [p-esmon01] [/usr/share/elasticsearch/plugins/http-basic/_site] directory does not exist. [2014-07-21 18:50:05,078][INFO ][plugins ] [p-esmon01] loaded [marvel, http-basic-server-plugin], sites [marvel, head] [2014-07-21 18:50:05,104][DEBUG][common.compress.lzf ] using [UnsafeChunkDecoder] decoder [2014-07-21 18:50:05,113][DEBUG][env ] [p-esmon01] using node location [[/etc/elasticsearch/data/p_es_clust_mon01/nodes/0]], local_node_id [0] [2014-07-21 18:50:05,907][ERROR][bootstrap] {1.2.2}: Initialization Failed ... 
- ExecutionError[java.lang.NoClassDefFoundError: org/elasticsearch/rest/StringRestResponse] NoClassDefFoundError[org/elasticsearch/rest/StringRestResponse] ClassNotFoundException[org.elasticsearch.rest.StringRestResponse] org.elasticsearch.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: org/elasticsearch/rest/StringRestResponse at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2199) at org.elasticsearch.common.cache.LocalCache.get(LocalCache.java:3934) at org.elasticsearch.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) at org.elasticsearch.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) at org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:51) at org.elasticsearch.common.inject.ConstructorInjectorStore.get(ConstructorInjectorStore.java:50) at org.elasticsearch.common.inject.ConstructorBindingImpl.initialize(ConstructorBindingImpl.java:50) at org.elasticsearch.common.inject.InjectorImpl.initializeBinding(InjectorImpl.java:372) at org.elasticsearch.common.inject.BindingProcessor$1$1.run(BindingProcessor.java:148) at org.elasticsearch.common.inject.BindingProcessor.initializeBindings(BindingProcessor.java:204) at org.elasticsearch.common.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:119) at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:102) at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:93) at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70) at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:59) at org.elasticsearch.node.internal.InternalNode.init(InternalNode.java:188) at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159) at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:70) at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:203) at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32) Caused by: java.lang.NoClassDefFoundError: org/elasticsearch/rest/StringRestResponse at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532) at java.lang.Class.getDeclaredConstructors(Class.java:1901) at org.elasticsearch.common.inject.spi.InjectionPoint.forConstructorOf(InjectionPoint.java:177) at org.elasticsearch.common.inject.ConstructorInjectorStore.createConstructor(ConstructorInjectorStore.java:59) at org.elasticsearch.common.inject.ConstructorInjectorStore.access$000(ConstructorInjectorStore.java:29) at
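If it helps, on a 1.2.x install the update Jörg describes is normally just a remove and re-install of each plugin with the bin/plugin script on every node, followed by a restart (the path and the latest version label below are examples, not taken from Phillip's setup):

cd /usr/share/elasticsearch
bin/plugin --remove marvel
bin/plugin --install elasticsearch/marvel/latest
# the http-basic plugin has to be replaced with a build that targets ES 1.2.x;
# grab it from the plugin's own project page, then restart the node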
Re: [ERROR][bootstrap] {1.2.2}: Initialization Failed ... - NullPointerException[null]
Turned out to be an issue with the http-basic auth plugin I was using. On Monday, July 21, 2014 2:02:02 PM UTC-5, Phillip Ulberg wrote: Same issue here, I upgraded to ES 1.2.2, and jdk 1.7.0_60-b19, debug log output - [2014-07-21 18:50:05,054][INFO ][node ] [p-esmon01] version[1.2.2], pid[23697], build[9902f08/2014-07-09T12:02:32Z] [2014-07-21 18:50:05,054][INFO ][node ] [p-esmon01] initializing ... [2014-07-21 18:50:05,056][DEBUG][node ] [p-esmon01] using home [/usr/share/elasticsearch], config [/etc/elasticsearch], data [[/etc/elasticsearch/data]], logs [/var/log/elasticsearch], work [/etc/elasticsearch/work], plugins [/usr/share/elasticsearch/plugins] [2014-07-21 18:50:05,064][DEBUG][plugins ] [p-esmon01] lucene property is not set in plugin es-plugin.properties file. Skipping test. [2014-07-21 18:50:05,074][DEBUG][plugins ] [p-esmon01] skipping [jar:file:/usr/share/elasticsearch/plugins/marvel/marvel-1.2.1.jar!/es-plugin.properties] [2014-07-21 18:50:05,075][DEBUG][plugins ] [p-esmon01] lucene property is not set in plugin es-plugin.properties file. Skipping test. [2014-07-21 18:50:05,075][DEBUG][plugins ] [p-esmon01] [/usr/share/elasticsearch/plugins/http-basic-server-plugin/_site] directory does not exist. [2014-07-21 18:50:05,077][DEBUG][plugins ] [p-esmon01] [/usr/share/elasticsearch/plugins/http-basic/_site] directory does not exist. [2014-07-21 18:50:05,078][INFO ][plugins ] [p-esmon01] loaded [marvel, http-basic-server-plugin], sites [marvel, head] [2014-07-21 18:50:05,104][DEBUG][common.compress.lzf ] using [UnsafeChunkDecoder] decoder [2014-07-21 18:50:05,113][DEBUG][env ] [p-esmon01] using node location [[/etc/elasticsearch/data/p_es_clust_mon01/nodes/0]], local_node_id [0] [2014-07-21 18:50:05,907][ERROR][bootstrap] {1.2.2}: Initialization Failed ... 
- ExecutionError[java.lang.NoClassDefFoundError: org/elasticsearch/rest/StringRestResponse] NoClassDefFoundError[org/elasticsearch/rest/StringRestResponse] ClassNotFoundException[org.elasticsearch.rest.StringRestResponse] org.elasticsearch.common.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: org/elasticsearch/rest/StringRestResponse at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2199) at org.elasticsearch.common.cache.LocalCache.get(LocalCache.java:3934) at org.elasticsearch.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) at org.elasticsearch.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) at org.elasticsearch.common.inject.internal.FailableCache.get(FailableCache.java:51) at org.elasticsearch.common.inject.ConstructorInjectorStore.get(ConstructorInjectorStore.java:50) at org.elasticsearch.common.inject.ConstructorBindingImpl.initialize(ConstructorBindingImpl.java:50) at org.elasticsearch.common.inject.InjectorImpl.initializeBinding(InjectorImpl.java:372) at org.elasticsearch.common.inject.BindingProcessor$1$1.run(BindingProcessor.java:148) at org.elasticsearch.common.inject.BindingProcessor.initializeBindings(BindingProcessor.java:204) at org.elasticsearch.common.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:119) at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:102) at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:93) at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70) at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:59) at org.elasticsearch.node.internal.InternalNode.init(InternalNode.java:188) at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:159) at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:70) at org.elasticsearch.bootstrap.Bootstrap.main(Bootstrap.java:203) at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:32) Caused by: java.lang.NoClassDefFoundError: org/elasticsearch/rest/StringRestResponse at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532) at java.lang.Class.getDeclaredConstructors(Class.java:1901) at org.elasticsearch.common.inject.spi.InjectionPoint.forConstructorOf(InjectionPoint.java:177) at org.elasticsearch.common.inject.ConstructorInjectorStore.createConstructor(ConstructorInjectorStore.java:59) at org.elasticsearch.common.inject.ConstructorInjectorStore.access$000(ConstructorInjectorStore.java:29) at
Getting _id field in elasticsearch to map to a field in HIVE
Hi, I am working on a project to integrate Hive and Elasticsearch, and for one of our use cases we are loading data from Elasticsearch into Hive. During this process I want to store the _id field of each Elasticsearch document in Hive. I am able to get the fields which are part of _source, like messages, @timestamp etc., but I am not able to get the _id associated with that particular document. The following is the sample table I am trying to create: create external table eshivetable (id string,eventdate timestamp, host string, username string, message string) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'logstash-*/syslog', 'es.mapping.names' = 'id:_id,eventdate:@timestamp,host:host,username:username,message:message','es.nodes'='10.10.10.50','es.port'='9200','es.query'='?q=type:syslog'); So when I select id it returns a null value... Can someone help me with this please? -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5d2eb38d-9f0d-4329-ba2b-0d28c06f98e5%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Elasticsearch recovering process took a long time.
Hi, We are using elasticsearch 0.90.1. We had some problems with the network, when the network is down (the convergence time is around 40s). Recovering the elastic after this event took a long time to be available. We have 16 data nodes and 16 shards with 2 replica and the settings: discovery.zen.minimum_master_nodes: 4 gateway.recover_after_nodes: 4 gateway.expected_nodes: 6 Any idea about how to minimize the recovery time? Is it a good idea to update the version? What will happen if we increase discovery.zen.fd.ping_interval and ping_timeout settings? Assuming the network is completely off, is there any way to wait for at least those 40 seconds to start marking servers down? Does the ping_timeout ignore the failure to connect to a node? Thanks. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a32a0c42-122a-45a3-9c56-8869b4028608%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Elasticsearch recovering process took a long time.
Sorry, the setting discovery.zen.minimum_master_nodes is 8 El lunes, 21 de julio de 2014 17:20:03 UTC-3, anivlis escribió: Hi, We are using elasticsearch 0.90.1. We had some problems with the network, when the network is down (the convergence time is around 40s). Recovering the elastic after this event took a long time to be available. We have 16 data nodes and 16 shards with 2 replica and the settings: discovery.zen.minimum_master_nodes: 4 gateway.recover_after_nodes: 4 gateway.expected_nodes: 6 Any idea about how to minimize the recovery time? Is it a good idea to update the version? What will happen if we increase discovery.zen.fd.ping_interval and ping_timeout settings? Assuming the network is completely off, is there any way to wait for at least those 40 seconds to start marking servers down? Does the ping_timeout ignore the failure to connect to a node? Thanks. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7997de22-b421-4f7c-ae7f-e001dbfef618%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Elasticsearch recovering process took a long time.
Recovery is throttled since version 0.90.1 http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/index-modules-store.html#store-throttling Increase indices.store.throttle.max_bytes_per_sec to a level that is suitable for your environment. Since IO should be the main bottleneck, the setting could vary greatly depending on SSD, platter disk or shared storage. -- Ivan On Mon, Jul 21, 2014 at 1:28 PM, anivlis svluc...@gmail.com wrote: Sorry, the setting discovery.zen.minimum_master_nodes is 8 El lunes, 21 de julio de 2014 17:20:03 UTC-3, anivlis escribió: Hi, We are using elasticsearch 0.90.1. We had some problems with the network, when the network is down (the convergence time is around 40s). Recovering the elastic after this event took a long time to be available. We have 16 data nodes and 16 shards with 2 replica and the settings: discovery.zen.minimum_master_nodes: 4 gateway.recover_after_nodes: 4 gateway.expected_nodes: 6 Any idea about how to minimize the recovery time? Is it a good idea to update the version? What will happen if we increase discovery.zen.fd.ping_interval and ping_timeout settings? Assuming the network is completely off, is there any way to wait for at least those 40 seconds to start marking servers down? Does the ping_timeout ignore the failure to connect to a node? Thanks. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7997de22-b421-4f7c-ae7f-e001dbfef618%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/7997de22-b421-4f7c-ae7f-e001dbfef618%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQACJs%3Dzuu4gjYTvcKpF%2BSgv_1UkYZw%3DC0G8FkY1meAg-Q%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
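For example, the throttle Ivan mentions can be raised at runtime through the cluster settings API (the 100mb figure is only an illustration; pick a value that matches your disks):

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.store.throttle.max_bytes_per_sec": "100mb"
  }
}'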
multiple indices per document
I want to use ES to index logs coming from different processes. Assume I have 2 sources: ProcessA and ProcessB Logs from the processes are formatted in json. Example log: {level:DEBUG,logger:REPOSITORY,timestamp:1405982400689,attrs:{profile:ManagementServerA,organization:FOOBAR},thread:main,message:Repository.store() : Stored successfully in /central/zone/cef9cccab964} How can I get ES to update multiple indexes when it sees a new document ? In this case I want indices on the profile and organization values. Do I have to 1. Create indexes using the ES REST api before ES sees any logs. 2. Supply an _index field to each json document 3. Have multiple values in the _index field to indicate what indexes must be updated ? i.e should I have: *_index: {ManagementServerA , FOOBAR}* Please let me know if this is the correct way to do this. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c299f3e4-eebc-43a4-ab23-894605b2a752%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
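For what it's worth, an Elasticsearch document lives in exactly one index, so there is no multi-valued _index field; the usual way to have the same log line in both a per-profile and a per-organization index is simply to index it twice, for example in one bulk request (the index and type names below are illustrative):

curl -XPOST 'http://localhost:9200/_bulk' -d '
{ "index" : { "_index" : "managementservera", "_type" : "logs" } }
{ "level" : "DEBUG", "logger" : "REPOSITORY", "message" : "Repository.store() : Stored successfully" }
{ "index" : { "_index" : "foobar", "_type" : "logs" } }
{ "level" : "DEBUG", "logger" : "REPOSITORY", "message" : "Repository.store() : Stored successfully" }
'

Filtered index aliases over a single index are another option if the goal is only to query by profile or organization rather than to store separate copies.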
explanation of each fields of _stats api data
Hi there, Where can I find documentation explaining the JSON data returned by _stats for an index? I mean the meaning of each field, not just a high-level description. Specifically, for the following data: primaries: {docs: {count: 1789457,deleted: 0},store: { size_in_bytes: 2085533463,throttle_time_in_millis: 582538},indexing: { index_total: 297925,index_time_in_millis: 113345,index_current: 0, delete_total: 0,delete_time_in_millis: 0,delete_current: 0}, I can't find the meaning of the index_total and index_current fields. For this index the document count is 1789457, which is about 3 times the index_total value. Thanks, Jack -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2f5ef5fa-fe47-4907-a7d0-b3f7212e2fbe%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Handling node failure in ES cluster
Max and min memory should be the same, mlockall is probably not working due to these being different as it can't lock a sliding window. Try setting that and see if it helps. Also you didn't mention your java version and release, which would be helpful. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 22 July 2014 02:38, kmoore.cce kmoore@gmail.com wrote: I have had some issues recently as I've expanded my ES cluster, where a single node failure causes basically all other index/search operations to timeout and fail. I am currently running elasticsearch v1.2.1 and primarily interface with the indices using the elasticsearch python module. My cluster is 20 nodes, each an m1.large ec2 instance. I currently have ~18 indices each with 5 shards and 3 replicas. The average size of each index is ~20GB and ~10 million documents (low is ~100K documents (300mb), high ~40 million (35gb)). I run each node with ES_MAX_SIZE=4g and ES_MIN_SIZE=512m. There are no other services running on the elasticsearch nodes, except ssh. I use zen unicast discovery with a set list of nodes. I have tried to enable 'bootstrap.mlockall', but the ulimit settings do not seem to be working and I keep getting 'Unable to lock JVM memory (ENOMEM)' errors when starting a node (note: I didn't see this log message when running 0.90.7). I have a fairly constant series of new or updated documents (I don't actually update, but rather reindex when a new document with the same id is found) that are being ingested all the time, and a number of users who are querying the data on a regular basis - most queries are set queries through the python API. The issue I have now is that while data is being ingested/indexed, I will hit Java heap out of memory errors. I think this is related to garbage collection as that seems to be the last activity in the logs nearly everytime this occurs. I have tried adjusting the heap max to 6g, and that seems to help but I am not sure it solves the issue. In conjunction with that, when the out of memory error occurs it seems to cause the other nodes to stop working effectively, timeout errors in both indexing and searching. My question is: what is the best way to support a node failing for this reason? I would obviously like to solve the underlying problem as well, but I would also like to be able to support a node crashing for some reason (whether it be because of me or because ec2 took it away). Shouldn't the failover in replicas support the missing node? I understand the cluster state would be yellow at this time, but I should be able to index and search data on the remaining nodes, correct? Are there configuration changes I can make to better support the cluster and identify or solve the underyling issue? Any help is appreciated. I understand I have a lot to learn about Elasticsearch, but I am hoping I can add some stability/resiliency to my cluster. Thanks in advance, -Kevin -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/74ac48ec-0c05-4683-9c78-66d8c97687fa%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/74ac48ec-0c05-4683-9c78-66d8c97687fa%40googlegroups.com?utm_medium=emailutm_source=footer . 
For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEM624a1bjybx6b-B-7h%2BkVVy-JPEEvs0_9JaY-wbcLS5hPFhw%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
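A minimal sketch of what Mark describes, assuming the deb/rpm packages and a 4g heap (both assumptions, adjust to your nodes): set a single ES_HEAP_SIZE instead of separate min/max values, raise the memlock limit, and enable mlockall.

# /etc/default/elasticsearch (or /etc/sysconfig/elasticsearch)
ES_HEAP_SIZE=4g
MAX_LOCKED_MEMORY=unlimited

# elasticsearch.yml
bootstrap.mlockall: true

# after restarting the node, check that mlockall is reported as true
curl 'http://localhost:9200/_nodes/process?pretty'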
How to remove a cluster setting?
I made the following setting to my Elasticsearch cluster in order to decommission some old nodes in the cluster. After removing these old nodes, I now need to re-enable the cluster to allocate shards on those '10.0.6.*' nodes. Does anyone know how to remove this setting? PUT /_cluster/settings { transient: { cluster.routing.allocation.exclude._ip: 10.0.6.* } } Thanks in advance for any help! -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/53df40a8-a248-4373-b789-e0490e3dab8a%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Sorting a nested array
Hi Everyone, I am looking for a little guidance on how to setup my indexes to support sorting a nested array. To give a simple example, we have an index of products in our index, below is a dummy example document: /products/product/partnumber { name:Toaster, desc:For making toast, price:29.99, reviews:[ { rating : 5, comment : works great! }, { rating : 1, comment : Ugh!! }, { rating : 3, comment : Meh } ] } I would like to be able to query for product documents, but be able to sort the reviews array based on the rating. A customer could ask to see the first 10 best reviews, or the worst 5 reviews for example. Is this possible? Is this possible if the reviews are in a separate index and I execute a join query? Any information or guidance would be extremely helpful! Thanks! Darin -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/500b16ff-b00f-44a9-a238-9c7a09432bfa%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: Sorting a nested array
Hello Darin, This is the wrong approach for this. You can only sort per document, not per nested document, and you receive the output as documents, not as nested documents. So I would rather go for a parent/child approach. That being said, what you asked is actually possible with a little tweaking on the client side. - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_using_function_score A function score query can have a score_mode of min or max. This means that if we use the rating field as the score value together with score_mode, it will take the highest value per doc as the score. With this, you can get the top 10 documents bearing the top 10 ratings. Once you receive them you need to take out all the ratings, sort again, and then pick the top 10 ratings, since a single document might contain two of the best reviews. With this I believe you might need to turn include_in_parent on. Thanks Vineeth On Tue, Jul 22, 2014 at 8:15 AM, Darin Amos darinamos.it@gmail.com wrote: Hi Everyone, I am looking for a little guidance on how to setup my indexes to support sorting a nested array. To give a simple example, we have an index of products in our index, below is a dummy example document: /products/product/partnumber { name:Toaster, desc:For making toast, price:29.99, reviews:[ { rating : 5, comment : works great! }, { rating : 1, comment : Ugh!! }, { rating : 3, comment : Meh } ] } I would like to be able to query for product documents, but be able to sort the reviews array based on the rating. A customer could ask to see the first 10 best reviews, or the worst 5 reviews for example. Is this possible? Is this possible if the reviews are in a separate index and I execute a join query? Any information or guidance would be extremely helpful! Thanks! Darin -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/500b16ff-b00f-44a9-a238-9c7a09432bfa%40googlegroups.com https://groups.google.com/d/msgid/elasticsearch/500b16ff-b00f-44a9-a238-9c7a09432bfa%40googlegroups.com?utm_medium=emailutm_source=footer . For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGdPd5%3DxMYAQfaZAzx7JyuHvAuBb4%3DZ9XBPyMkp_1G9uWEOdvA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
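A rough sketch of the parent/child direction Vineeth mentions (the mapping is hypothetical, reusing Darin's products index and field names): reviews become child documents of the product, so they can be queried, sorted and paged by rating directly.

# child type; each review is indexed with ?parent=<product id>
curl -XPUT 'http://localhost:9200/products/review/_mapping' -d '{
  "review": {
    "_parent": { "type": "product" },
    "properties": {
      "rating": { "type": "integer" },
      "comment": { "type": "string" }
    }
  }
}'

# the 10 best reviews of matching products; flip the sort order for the worst ones
curl -XGET 'http://localhost:9200/products/review/_search' -d '{
  "query": {
    "has_parent": {
      "parent_type": "product",
      "query": { "match": { "name": "toaster" } }
    }
  },
  "sort": [ { "rating": { "order": "desc" } } ],
  "size": 10
}'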
Re: Common website analytics aggregation formulas
Unique visitors http://www.elasticsearch.org/blog/count-elasticsearch/ On Saturday, May 3, 2014 5:17:13 AM UTC+5:30, Demetrius Nunes wrote: Hi guys, At my company we're building a platform product which has a big analytics component to it. I am intending to use elasticsearch to power that part of the platform. Most of the analytics examples that I see using elasticsearch aggregations are around systems logs monitoring. There quite a few metrics that I have to provide reporting that are very typical of website analytics, such as time spent on site, bounce rate, active users, etc. I've already implemented all the tracking code within the system and I have indexes with timestamps, user-generated events such as page hits, clicks, and so on. So, are there any good references, best practices, plugins or even formulas on how to implement these kinds of website analytics metrics using elasticsearch? Thanks a lot, Demetrius -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8e22b230-9929-43fe-842f-10a548cd25ee%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
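The post behind that link boils down to the cardinality aggregation; a minimal sketch for counting unique visitors (the index, type and visitor_id field are assumptions about Demetrius's event documents) would be:

curl -XGET 'http://localhost:9200/analytics/events/_search' -d '{
  "size": 0,
  "aggs": {
    "unique_visitors": {
      "cardinality": { "field": "visitor_id" }
    }
  }
}'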
Re: Handling node failure in ES cluster
Lots of things could be the source of problems here. Maybe you can tune the JVM params. We don't know what you are using or what your GC activity looks like. Can you share GC metrics graphs? If you don't have any GC monitoring, you can use SPM http://sematext.com/spm/. Why do you have 5 shards for all indices? Some seem small and shouldn't need to be sharded so much. Why do you have 3 replicas and not, say, just 2? (we don't know your query rates). Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Monday, July 21, 2014 12:38:49 PM UTC-4, kmoore.cce wrote: I have had some issues recently as I've expanded my ES cluster, where a single node failure causes basically all other index/search operations to timeout and fail. I am currently running elasticsearch v1.2.1 and primarily interface with the indices using the elasticsearch python module. My cluster is 20 nodes, each an m1.large ec2 instance. I currently have ~18 indices each with 5 shards and 3 replicas. The average size of each index is ~20GB and ~10 million documents (low is ~100K documents (300mb), high ~40 million (35gb)). I run each node with ES_MAX_SIZE=4g and ES_MIN_SIZE=512m. There are no other services running on the elasticsearch nodes, except ssh. I use zen unicast discovery with a set list of nodes. I have tried to enable 'bootstrap.mlockall', but the ulimit settings do not seem to be working and I keep getting 'Unable to lock JVM memory (ENOMEM)' errors when starting a node (note: I didn't see this log message when running 0.90.7). I have a fairly constant series of new or updated documents (I don't actually update, but rather reindex when a new document with the same id is found) that are being ingested all the time, and a number of users who are querying the data on a regular basis - most queries are set queries through the python API. The issue I have now is that while data is being ingested/indexed, I will hit Java heap out of memory errors. I think this is related to garbage collection as that seems to be the last activity in the logs nearly everytime this occurs. I have tried adjusting the heap max to 6g, and that seems to help but I am not sure it solves the issue. In conjunction with that, when the out of memory error occurs it seems to cause the other nodes to stop working effectively, timeout errors in both indexing and searching. My question is: what is the best way to support a node failing for this reason? I would obviously like to solve the underlying problem as well, but I would also like to be able to support a node crashing for some reason (whether it be because of me or because ec2 took it away). Shouldn't the failover in replicas support the missing node? I understand the cluster state would be yellow at this time, but I should be able to index and search data on the remaining nodes, correct? Are there configuration changes I can make to better support the cluster and identify or solve the underyling issue? Any help is appreciated. I understand I have a lot to learn about Elasticsearch, but I am hoping I can add some stability/resiliency to my cluster. Thanks in advance, -Kevin -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. 
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9eb495e8-9ac6-4ef0-95ae-a6cc4516c67a%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
curl indices.memory.index_buffer_size ??
Hello, I am trying to set indices.memory.index_buffer_size to 30% on my running cluster using curl, and I am not able to make it stick. I am doing this: $ curl -XPUT http://foo:9200/_cluster/settings -d '{ persistent : { indices.memory.index_buffer_size : 30% }}' {acknowledged:true,persistent:{},transient:{}} But when I check the settings it is not there. Any idea what I am doing wrong?? It is probably something obvious, but I don't see it... Thanks, -- Chris $ curl -XGET http://foo:9200/_cluster/settings?pretty=true { persistent : { threadpool : { index : { type : cached } } }, transient : { cluster : { routing : { allocation : { enable : all } } } } } -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0d6ca29a-3ecf-4050-85c8-f7672c4a964d%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
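For what it's worth, in the 0.90/1.x line indices.memory.index_buffer_size appears to be a node-level setting read at startup rather than a dynamic cluster setting, which would explain the acknowledged response with an empty persistent block. If that is the case here, the setting belongs in elasticsearch.yml on each node, followed by a rolling restart:

# elasticsearch.yml on every node, then restart the node
indices.memory.index_buffer_size: 30%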
Re: How to remove a cluster setting?
Try: PUT /_cluster/settings { transient: { cluster.routing.allocation.exclude._ip: } } -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 22 July 2014 at 02:50, Jeffrey Zhou jeffreyzhou2...@gmail.com wrote: I made the following setting to my Elasticsearch cluster in order to decommission some old nodes in the cluster. After removing these old nodes, I now need to re-enable the cluster to allocate shards on those '10.0.6.*' nodes. Does anyone know how to remove this setting? PUT /_cluster/settings { transient: { cluster.routing.allocation.exclude._ip: 10.0.6.* } } Thanks in advance for any help! -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/53df40a8-a248-4373-b789-e0490e3dab8a%40googlegroups.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/054923B7-1941-4FA0-B4B7-51A99A85F0B3%40pilato.fr. For more options, visit https://groups.google.com/d/optout.
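As a curl command, David's suggestion (resetting the exclusion to an empty string) would look like this:

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.exclude._ip": ""
  }
}'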