Re: ingest performance degrades sharply as documents have more fields
It's not surprising that the time increases when you have an order of magnitude more fields. Are you using the bulk API?

Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com

On 13 June 2014 15:57, Maco Ma mayaohu...@gmail.com wrote:

I am trying to measure the performance of ingesting documents that have lots of fields, on the latest Elasticsearch, 1.2.1.

Total docs count: 10k (a small set, definitely)
ES_HEAP_SIZE: 48G

settings:
{"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}}}}}

mappings:
{"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}}}}}

All fields in the documents match the templates in the mappings. Since I disabled flush and refresh, I submitted a flush command (followed by an optimize command) from the client program every 10 seconds. (I also tried another interval, 10 minutes, and got similar results.)

Scenario 0 - 10k docs with 1,000 distinct fields: ingestion took 12 secs. Only 1.08G of heap was used (counting only used heap memory).
Scenario 1 - 10k docs with 10k distinct fields (10x the fields of scenario 0): this time ingestion took 29 secs. Only 5.74G of heap was used.

Not sure why the performance degrades so sharply. If I try to ingest docs with 100k distinct fields, it takes 17 mins 44 secs. We only have 10k docs in total, and I am not sure why ES performs so badly. Can anyone give suggestions to improve the performance?

-- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/25ec100b-96d8-434b-b3a0-3a3e8ad90de4%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
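Mark's bulk-API suggestion can be sketched as follows. This is a minimal, hedged illustration (the index and type names are placeholders) of building the NDJSON body the _bulk endpoint expects, so many documents go into one HTTP request instead of one curl call per document:

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Build an NDJSON body for the _bulk API: an action line, then the doc source."""
    lines = []
    for doc in docs:
        # action/metadata line telling ES where to index the next line
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = build_bulk_body("doc", "type", [{"f1_ss": "a"}, {"f2_i": 3}])
# POST this body to http://localhost:9200/_bulk in batches of a few thousand docs
```

Batching this way amortizes the per-request overhead that dominates when indexing one small document per call.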
Re: index template updating problem
The index template will only be applied when a new index is created. -- Ivan

On Thu, Jun 12, 2014 at 5:54 AM, sri 1.fr@gmail.com wrote: Hello all, If I update the mapping in an existing index template, the change is not reflected automatically; I have to manually delete the old mapping and then apply the template again. So my question is: is ES designed to work this way, or should mapping changes be applied automatically? Thanks and Regards, Sri
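Ivan's point can be illustrated with a toy simulation (not ES code; all names are invented): the template is consulted only at index-creation time, so editing the stored template never rewrites the mapping of an index that already exists:

```python
import copy
import fnmatch

# a stored index template keyed by its name pattern
templates = {"logs-*": {"properties": {"msg": {"type": "string"}}}}
indices = {}

def create_index(name):
    """Apply any matching template once, at creation time, like ES does."""
    for pattern, mapping in templates.items():
        if fnmatch.fnmatch(name, pattern):
            # deep-copy so later template edits cannot leak into existing indices
            indices[name] = copy.deepcopy(mapping)
            return
    indices[name] = {}

create_index("logs-old")
templates["logs-*"]["properties"]["msg"]["type"] = "integer"  # update the template
create_index("logs-new")
# logs-old keeps the old mapping; only logs-new picks up the updated template
```

This matches the observed behaviour: to change an existing index you must update its mapping directly, while the template only affects indices created afterwards.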
Re: ingest performance degrades sharply as documents have more fields
I used curl commands to do the ingestion (one command per doc) and the flush. I also tried Solr (with soft/hard commits disabled, issuing commits from the client program) on the same data and commands, and its performance did not degrade. Lucene is used by both, so I am not sure why there is such a big difference in performance.

On Friday, June 13, 2014 2:02:58 PM UTC+8, Mark Walkom wrote: It's not surprising that the time increases when you have an order of magnitude more fields. Are you using the bulk API? ...
Re: Elastic Search and consistency
On Thu, Jun 12, 2014 at 8:52 PM, shikhar shik...@schmizz.net wrote: ES currently does not seem to provide any guarantee that an acknowledged write (from the caller's perspective) succeeded on a quorum of replicas.

I take this back; I understand the ES model better now. Although the write-consistency-level check is only applied before the write is about to be issued, with sync replication the client can only get an ack if the write succeeded on the primary shard as well as all replicas (as per the same cluster state the check is performed on). If it fails on some replica(s), the operation is retried (together with the write-consistency-level check, using a possibly-updated cluster state).

This makes it unsuitable for a primary data store, given you can see data loss despite having replicas! If using ES as a primary store, you should really be running it with *index.gateway.local.sync: 0* to make sure the translog fsyncs on every write operation.

A follow-up question: what if there is a failure on one of the replicas that prevents writes (e.g. disk full), but it does not cause the node to drop out of the cluster because the node is otherwise healthy? Does that not make the node a SPOF? This is something we have run into with SolrCloud: https://issues.apache.org/jira/browse/SOLR-5805.
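For anyone searching later, the setting mentioned above is node-level configuration; a sketch of where it would go (ES 1.x-era syntax, where a sync interval of 0 means fsync the translog after every operation, at a throughput cost):

```yaml
# elasticsearch.yml (ES 1.x): fsync the translog after each write operation
index.gateway.local.sync: 0
```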
Need help, multiple aggregations with filters extremely slow, where to look for optimizations?
Hi, I'm facing a performance issue with some aggregations I perform, and I need your help if possible. I have two document types, the *request* and the *event*; the request is the parent of the event. Below is a (sample) mapping:

"event": {
  "dynamic": "strict",
  "_parent": { "type": "request" },
  "properties": {
    "event_time": { "format": "dateOptionalTime", "type": "date" },
    "count": { "type": "integer" },
    "event": { "index": "not_analyzed", "type": "string" }
  }
}

"request": {
  "dynamic": "strict",
  "_id": { "path": "uniqueId" },
  "properties": {
    "uniqueId": { "index": "not_analyzed", "type": "string" },
    "user": { "index": "not_analyzed", "type": "string" },
    "code": { "type": "integer" },
    "country": { "index": "not_analyzed", "type": "string" },
    "city": { "index": "not_analyzed", "type": "string" }
  }
}

My cluster is becoming really big (almost 2 TB of data with billions of documents). I maintain one index per day and occasionally delete old indices; my daily index is about 20 GB. The version of Elasticsearch that I use is 1.1.1.

My problems start when I want to get some aggregations of events with criteria applied on the parent request document, for example counting the events of type *click* for country = US and code = 12. What I was initially doing was to generate a scriptFilter (in Groovy) for the request document, and I was adding multiple aggregations in one search request. This ended up being very slow, so I removed the scripting logic and implemented that logic in Java code. What seemed solved on my local machine changed nothing when I got back to the cluster: again my app performs really poorly, and I get more than 10 seconds for a search with ~10 sub-aggregations. What seems strange is that the cluster looks fine with regard to load average, CPU, etc. Any hints on where to look to solve this and identify the bottleneck? Ask for any additional information to provide; I didn't want to make this post too long to read. Thank you
Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?
Can you show us what your request looks like (including query and aggs)?

On Fri, Jun 13, 2014 at 9:09 AM, Thomas thomas.bo...@gmail.com wrote: Hi, I'm facing a performance issue with some aggregations I perform, and I need your help if possible...

-- Adrien Grand
How To Disable Recovery Process / Delete Old Shards
During a botched upgrade my data was deleted. As it was a test server, it didn't matter. However, after reinstalling, it constantly tries to recover old shards, even after I deleted every known file on the server that contains Elasticsearch data. Can someone let me know how to disable the recovery process, and where Elasticsearch hides the file it reads to decide which shards to recover? Below is an example from the log file (repeated constantly):

[Quicksand] [blurays][1] recovery from [[Stilt-Man][mxmoAlTaTkClmfpImcpb1A][254020-ipaddress]] failed
org.elasticsearch.transport.RemoteTransportException: [Stilt-Man][inet[/ipaddress]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [blurays][1] Phase[1] Execution failed
    at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:996)
    at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:631)
    at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:122)
    at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:62)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:351)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:337)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: [blurays][1] Failed to transfer [1] files with total size of [71b]
    at org.elasticsearch.indices.recovery.RecoverySource$1.phase1(RecoverySource.java:243)
    at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:993)
    ... 9 more
Caused by: java.nio.file.NoSuchFileException: /home/programs/elasticsearch-1.2.1/data/elasticsearch/nodes/1/indices/blurays/1/index/segments_1
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:176)
    at java.nio.channels.FileChannel.open(FileChannel.java:287)
    at java.nio.channels.FileChannel.open(FileChannel.java:334)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
    at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
    at org.elasticsearch.index.store.Store.openInputRaw(Store.java:319)
    at org.elasticsearch.indices.recovery.RecoverySource$1$1.run(RecoverySource.java:189)
    ... 3 more

-- View this message in context: http://elasticsearch-users.115913.n3.nabble.com/How-To-Disable-Recovery-Process-Delete-Old-Shards-tp4057556.html Sent from the ElasticSearch Users mailing list archive at Nabble.com.
Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?
Below is an example aggregation I perform. Are there any optimizations I can make? Maybe disabling some features I do not need, etc.

curl -XPOST "http://localhost:9200/logs-idx.20140613/event/_search?search_type=count" -d '
{
  "aggs": {
    "f1": {
      "filter": {
        "or": [
          { "and": [
              { "has_parent": {
                  "type": "request",
                  "filter": { "and": { "filters": [
                      { "term": { "country": "US" } },
                      { "term": { "city": "NY" } },
                      { "term": { "code": 12 } }
                  ] } } } },
              { "range": { "event_time": { "gte": "2014-06-13T10:00:00", "lt": "2014-06-13T11:00:00" } } }
          ] },
          { "and": [
              { "has_parent": {
                  "type": "request",
                  "filter": { "and": { "filters": [
                      { "term": { "country": "US" } },
                      { "term": { "city": "NY" } },
                      { "term": { "code": 12 } },
                      { "range": { "request_time": { "gte": "2014-06-13T10:00:00", "lt": "2014-06-13T11:00:00" } } }
                  ] } } } },
              { "range": { "event_time": { "lt": "2014-06-13T10:00:00" } } }
          ] }
        ]
      },
      "aggs": {
        "per_interval": {
          "date_histogram": { "field": "event_time", "interval": "minute" },
          "aggs": {
            "metrics": { "terms": { "field": "event", "size": 10 } }
          }
        }
      }
    }
  }
}'

On Friday, 13 June 2014 10:09:46 UTC+3, Thomas wrote: Hi, I'm facing a performance issue with some aggregations I perform, and I need your help if possible...
does document database mean denormalize
What I am asking is: do different design decisions apply in Elasticsearch compared to relational databases? Is denormalized data better for Elasticsearch?
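Generally yes: without joins, the usual move is to denormalize. A hedged sketch (field names borrowed from the parent/child thread above, and purely illustrative) of flattening a parent "request" into each child "event" so queries need no join:

```python
request = {"uniqueId": "r1", "country": "US", "code": 12}
events = [{"event": "click", "count": 1}, {"event": "view", "count": 3}]

def denormalize(parent, child, fields=("country", "code")):
    """Copy the parent fields you filter on into each child document."""
    flat = dict(child)  # don't mutate the original child
    flat.update({f: parent[f] for f in fields})
    return flat

docs = [denormalize(request, e) for e in events]
# each event can now be filtered on country/code directly, with no has_parent
```

The trade-off is larger indices and the need to reindex children if the copied parent fields ever change; that is usually cheaper in Elasticsearch than join-like queries at search time.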
Re: Accessing Search Templates via Rest
So I guess it's not possible? On Tuesday, 10 June 2014 16:58:31 UTC+2, Sebastian Gräser wrote: Hello, maybe someone can help me. Is there a way to get the available search templates via the REST API? I haven't found a way yet; hope you can help me. Best regards, Sebastian
Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?
Is this request only about getting aggregations? If so, you would probably get better response times by putting the filter in the query part (under a filtered query) and only having the date histogram in the aggregation. The reason is that aggregations are computed on the matches of the query; when no query is specified, that means all documents of your index.

On Fri, Jun 13, 2014 at 9:41 AM, Thomas thomas.bo...@gmail.com wrote: Below is an example aggregation I perform. Are there any optimizations I can make? ...
Re: Java API ES 0.90.9 Array (2 elements) in search result gets only one value in SearchHitField.getValues()
I've tested it with ES 1.1 and the described behaviour is gone, so the Java API now does a correct interpretation of the JSON search result. On Thursday, January 30, 2014 11:23:04 PM UTC+1, Martin Pape wrote: Thanks for the information. I still have some months till production, so I might work around it now and wait for ES 1.0. Anyone know when ES 1.0 is planned to be released? BR - Martin On Thursday, January 30, 2014 6:54:58 PM UTC+1, Binh Ly wrote: Martin, I have verified this behavior and it still persists in 0.90.10. I checked the latest ES master build and it indeed returns 2 values in the List as expected, so I am expecting it to behave as you expect in ES 1.0. For now, what it does is return a single item inside the List, but that item is in turn an ArrayList of 2 String values. If you have only 1 value, it returns a single item in the List, and that item is a String. So you can test accordingly in code to check if the value is a String or an ArrayList and then adjust accordingly. Should be rectified in 1.0 I hope. :)
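The pre-1.0 workaround Binh describes, branching on whether each returned item is a plain value or a nested list, is language-agnostic; a sketch in Python (in the Java API the equivalent check would be instanceof String versus List):

```python
def field_values(raw):
    """Flatten SearchHitField-style values from pre-1.0 clients: a
    multi-valued field may arrive as a one-element list whose single
    item is itself a list of values."""
    out = []
    for v in raw:
        if isinstance(v, list):  # multi-valued field: unwrap the inner list
            out.extend(v)
        else:                    # single value: keep as-is
            out.append(v)
    return out
```

With this normalization in place, calling code sees a flat list of values regardless of which shape the client returned.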
Cassandra with JDBC river plugin
Hi Everyone, I am trying to move data from Cassandra to Elasticsearch. Initially I tried the cassandra-river at https://github.com/eBay/cassandra-river. However, I got a timed-out error which I suspect was originating from the Hector API; I posted a question on this thread: https://groups.google.com/forum/#!searchin/elasticsearch/cassandra/elasticsearch/4oDbkqK3GVA/W9WLK4SS2MEJ. Moving on, I thought of using the JDBC river at https://github.com/jprante/elasticsearch-river-jdbc with a Java driver for Cassandra. I followed the MySQL example and modified it for Cassandra. I created the river as follows:

curl -XPUT '192.168.1.103:9200/_river/my_jdbc_river/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:cassandra://192.168.1.105:9160/transactionlogdb",
    "cql": "select * from logs"
  }
}'

{"_index":"_river","_type":"my_jdbc_river","_id":"_meta","_version":1,"created":true}

However, I don't find any documents being created in the jdbc index. Am I missing something? Any help or tips are very much appreciated. Thanks in advance. Kind Regards, Abhishek Mukherjee
Re: ES 1.2.1 sort by _timestamp
On Thursday, June 12, 2014 6:52:16 PM UTC+2, Itamar Syn-Hershko wrote: This is weird. Are you sure what you are seeing is not overridden documents (which can happen if you specify the ID yourself)? Can you add the _timestamp field to the results and verify the documents are indeed not sorted by _timestamp?

The id is also automatically generated by ES. Do I need to store the _timestamp field to be able to retrieve it using "fields": ["_timestamp"] in my query?
Re: ES 1.2.1 sort by _timestamp
Possibly, because it's not provided in the _source; or just use this: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-timestamp-field.html#_path_2

-- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer Consultant Author of RavenDB in Action http://manning.com/synhershko/

On Fri, Jun 13, 2014 at 11:26 AM, Stefan Eberl cpppw...@gmail.com wrote: The id is also automatically generated by ES. Do I need to store the _timestamp field to be able to retrieve it using "fields": ["_timestamp"] in my query? ...
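If retrieving it via fields is the goal, the mapping would need _timestamp both enabled and stored. A sketch (assuming ES 1.x, where _timestamp is indexed but not stored by default; reindexing is needed for the change to take effect on existing documents):

```json
{
  "type": {
    "_timestamp": { "enabled": true, "store": true }
  }
}
```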
Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?
So I restructured my curl as follows, is this what you mean?, by doing some first hits i do get some slight improvement, but need to check into production data: Thank you will try it and come back with results curl -XPOST http://10.129.2.42:9200/logs-idx.20140613/event/_search?search_type=count; -d' { query: { filtered: { filter: { or: [ { and: [ { has_parent: { type: request, filter: { and: { filters: [ { term: { country: US } }, { term: { city: NY } }, { term: { code: 12 } } ] } } } }, { range: { event_time: { gte: 2014-06-13T10:00:00, lt: 2014-06-13T11:00:00 } } } ] }, { and: [ { has_parent: { type: request, filter: { and: { filters: [ { term: { country: US } }, { term: { city: NY } }, { term: { code: 12 } }, { range: { request_time: { gte: 2014-06-13T10:00:00, lt: 2014-06-13T11:00:00 } } } ] } } } }, { range: { event_time: { lt: 2014-06-13T10:00:00 } } } ] } ] } } }, aggs: { per_interval: { date_histogram: { field: event_time, interval: minute }, aggs: { metrics: { terms: { field: event, size: 12 } } } } } }' On Friday, 13 June 2014 10:09:46 UTC+3, Thomas wrote: Hi, I'm facing a performance issue with some aggregations I perform, and I need your help if possible: I have to documents, the *request* and the *event*. The request is the parent of the event. Below is a (sample) mapping event : { dynamic : strict, _parent : { type : request }, properties : { event_time : { format : dateOptionalTime, type : date }, count : { type : integer }, event : { index : not_analyzed, type : string } } } request : { dynamic : strict, _id : { path : uniqueId }, properties : { uniqueId : { index : not_analyzed, type : string }, user : { index : not_analyzed, type : string }, code : { type : integer }, country : { index : not_analyzed, type : string }, city : { index : not_analyzed, type : string } } } My cluster is becoming really big (almost 2 TB of data with billions of documents) and i maintain one index per day, whereas I occasionally delete old indices. 
My daily index is about 20GB. The version of Elasticsearch that I use is 1.1.1. My problems start when I want to get some aggregations of events with criteria applied to the parent request document: for example, count the events of type *click* for country = US and code = 12. What I was initially doing was generating a scriptFilter for the request document (in Groovy), and I was adding multiple aggregations in one search request. This ended up being very slow, so I removed the scripting logic and supported my logic with Java code. What seemed to be solved on my local machine, when I got back to the cluster, nothing had changed. Again my app performs really, really poorly. It takes more than 10 seconds to perform a search with ~10
Mapping for a hash map
Hi there, I'd like to define a mapping for a hash map but I do not manage to get it right. Here is the kind of documents I'd like to index:

    { "message": "Elasticsearch test 1", "dates": { "create": "2014-01-11", "update": "2014-06-12" } }
    { "message": "Elasticsearch test 2", "dates": { "date_1": "2014-01-11" } }

Note: date_1 is on purpose; I cannot know at mapping-definition time how many dates I will have to deal with. As is, without a mapping, it works automagically (probably thanks to type autodetection), but is there a way to get it done without? My problem is that I might have stuff like this too:

    { "message": "Elasticsearch test 3", "strings": { "string_1": "some text", "string_2": "2014-01-11" } }
    { "message": "Elasticsearch test 4", "strings": { "string_2": "some other text" } }

In this case I need to be able to enforce that string_2 is not a date. What is the right way to do it? Manuel -- You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f45de373-4b57-4550-b9ae-c68d71dcf459%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
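Not an answer from the thread, but one way this kind of per-sub-object typing is commonly handled in ES 1.x is a dynamic template with `path_match`, so everything under `strings` is forced to type string (date detection never kicks in) while everything under `dates` stays a date. A hedged sketch; the index and type names are placeholders:

```json
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "all_dates": {
            "path_match": "dates.*",
            "mapping": { "type": "date", "format": "dateOptionalTime" }
          }
        },
        {
          "all_strings": {
            "path_match": "strings.*",
            "mapping": { "type": "string" }
          }
        }
      ]
    }
  }
}
```

Because the template assigns an explicit type the moment a new `strings.*` field appears, a value like "2014-01-11" under `strings` would be indexed as text rather than autodetected as a date.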
Re: Cassandra with JDBC river plugin
Checking the Elasticsearch log files I found this:

    No suitable driver found for jdbc:cassandra://192.168.1.103:9160/transactionlogdb
        at java.sql.DriverManager.getConnection(DriverManager.java:689)
        at java.sql.DriverManager.getConnection(DriverManager.java:247)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.getConnectionForReading(SimpleRiverSource.java:133)
        at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.execute(SimpleRiverSource.java:271)

However, I have placed all the necessary jar files for the driver in $ES_HOME/plugins/jdbc. Please advise. Kind Regards, Abhishek

On Friday, June 13, 2014 1:43:45 PM UTC+5:30, Abhishek Mukherjee wrote: Hi Everyone, I am trying to move data from Cassandra to Elasticsearch. Initially I tried the cassandra-river at https://github.com/eBay/cassandra-river. However I got a timed-out error which I suspect was originating from the Hector API. I posted a question on this thread: https://groups.google.com/forum/#!searchin/elasticsearch/cassandra/elasticsearch/4oDbkqK3GVA/W9WLK4SS2MEJ. Moving on, I thought of using the JDBC river at https://github.com/jprante/elasticsearch-river-jdbc with a Java driver for Cassandra. I followed the MySQL example and modified it for Cassandra. I created the river as follows:

    curl -XPUT '192.168.1.103:9200/_river/my_jdbc_river/_meta' -d '{
      "type": "jdbc",
      "jdbc": {
        "url": "jdbc:cassandra://192.168.1.105:9160/transactionlogdb",
        "cql": "select * from logs"
      }
    }'
    {"_index":"_river","_type":"my_jdbc_river","_id":"_meta","_version":1,"created":true}

However, I don't find any documents being created in the jdbc index. Am I missing something? Any help or tips are very much appreciated. Thanks in advance. Kind Regards, Abhishek Mukherjee
How ElasticSearch nodes synchronise in a Cluster when nodes have different Index mappings
Hi, kindly help me to understand the behaviour of ES nodes in a cluster when the nodes have different index mappings. I have 2 ES nodes, both currently having the same index versions. Now I want to upgrade both nodes with the new index mapping.

Scenario 1: Without taking the node down, start the mapping changes on Node 1. During the mapping changes, if any request comes, suppose it is handled by Node 1 only. Now, how and when will Node 1 and Node 2 synchronise?

Scenario 2: Without taking the node down, start the mapping changes on Node 1. During the mapping changes, if any request comes, suppose it is handled by Node 2. Now, how and when will Node 1 and Node 2 synchronise?

I want to know whose data will be available after the mapping changes are complete on both nodes.
Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?
Indeed that is what I meant. On Fri, Jun 13, 2014 at 10:33 AM, Thomas thomas.bo...@gmail.com wrote: So I restructured my curl as follows, is this what you mean? By doing some first hits I do get some slight improvement, but I need to check against production data. Thank you, will try it and come back with results.
Re: does a document database mean denormalize?
Yes, definitely think in terms of denormalizing. Joins are hard/expensive in Elasticsearch, so you need to avoid needing to join by pre-joining. But you have other options as well; see http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/

So, say you had a person table and an address table in a database with a 1:1 relation; that's a no-brainer: shove the address into the person index along with the rest of the person data. If you had another table called company with a 1:n relation to person, it gets more tricky. Now you have options.

Option 1: put the company data in the person index. Sure, you are copying data all over the place, but storage is cheap and it is not like you are going to have a trillion companies or persons. Your main worry is not space but consistency: what happens if you need to change the company details?

Option 2: put the person objects in an array in the company objects. Fine as long as you don't need to query for the persons separately.

Option 3: store just the company id in the person index, or the person ids in the company index (array). Now you will end up in situations where you may need to join, and you'll have to fire many queries and manipulate search results to do it, which is slow, tedious to program, and somewhat error-prone. But for simple use cases you might get away with it.

Option 4: use nested documents to put persons in companies. Now you can use nested queries and aggregations, which give you join-like benefits. Don't use this for massive amounts of nested documents on a single parent.

Option 5: use parent/child documents to give persons a company parent. More flexible than nested, and it gives you some performance benefits since parent and child reside on the same shard. So same as option 3 but faster.

Option 6: compromise: denormalize some but not all of the fields and keep things in a separate index as well.

With n:m style relations it gets a bit harder. You probably don't want to index the cartesian product, so you'll need to compromise. Any of the options above could work; it all depends on how many relations you are really managing. We've actually gotten rid of our database entirely. Once you get used to it, thinking in terms of documents is much more natural than thinking in terms of rows, tables, and relations. You have much less of an impedance mismatch that you need to pretend does not exist with some object-relational library. It's more like: here's an object, serialize it, store it, query for it. Jilles

On Friday, June 13, 2014 9:48:37 AM UTC+2, eune...@gmail.com wrote: What I am asking is: do different design decisions apply in Elasticsearch compared to relational? Is denormalized better for Elasticsearch?
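To make option 4 concrete, a nested mapping might look like the following sketch (field and type names are made up for illustration; ES 1.x syntax):

```json
{
  "company": {
    "properties": {
      "name": { "type": "string", "index": "not_analyzed" },
      "persons": {
        "type": "nested",
        "properties": {
          "name": { "type": "string" },
          "age":  { "type": "integer" }
        }
      }
    }
  }
}
```

With `"type": "nested"`, each person is indexed as a hidden sub-document, so a nested query with `"path": "persons"` matches conditions against a single person rather than across the whole array, which is the join-like behaviour mentioned above.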
How ElasticSearch nodes synchronise in a Cluster when nodes have different Index mappings
Mapping is applied at the cluster level, and an existing index won't get the new mapping. You will need to reindex your data, i.e. create a new index after you apply the new mapping.
Re: How ElasticSearch nodes syncrhonise in Cluster when nodes have different Index mappings
Yes, I am creating a new index and then migrating the data from the older index to the new index. So, while this migration is going on, if any request comes, what would the behaviour be? On Friday, June 13, 2014 3:11:52 PM UTC+5:30, Luis García Acosta wrote: Mapping is applied at the cluster level, and an existing index won't get the new mapping. You will need to reindex your data, i.e. create a new index after you apply the new mapping.
Re: ES 1.2.1 sort by _timestamp
On Friday, June 13, 2014 10:31:53 AM UTC+2, Itamar Syn-Hershko wrote: Possibly, because it's not provided in the _source, or just use this: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-timestamp-field.html#_path_2 So your suggestion is to have my app fill an additional field, which then gets mapped to _timestamp, correct?
Re: ES 1.2.1 sort by _timestamp
This is just to debug this, to make sure results are indeed not sorted by _timestamp, as you claim. Probably easier to just set _timestamp to stored. -- Itamar Syn-Hershko http://code972.com | @synhershko https://twitter.com/synhershko Freelance Developer & Consultant, Author of RavenDB in Action http://manning.com/synhershko/ On Fri, Jun 13, 2014 at 12:49 PM, Stefan Eberl cpppw...@gmail.com wrote: So your suggestion is to have my app fill an additional field, which then gets mapped to _timestamp, correct?
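Enabling and storing _timestamp (as suggested above) is a mapping-level setting. A minimal sketch for ES 1.x, assuming a type named `mytype` and a source field `my_date` (both placeholders):

```json
{
  "mytype": {
    "_timestamp": {
      "enabled": true,
      "store": true,
      "path": "my_date",
      "format": "dateOptionalTime"
    }
  }
}
```

With `store` enabled, the timestamp can be requested back via `"fields": ["_timestamp"]` in a search, which makes it easy to verify what value the sort is actually using.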
How can we use different analyzers for one field? (Only one analyzer is used at search time, but the search requirements differ)
Hi, I have the below requirement; please help me. I am using *elasticsearch-1.1.0*. In the index I have n fields and m types. E.g. types: Person, Book. E.g. fields: Person: Name, age, Email, Phone; Book: Name, author, price. How do I set the analyzers for all fields and all types?

*My Search Requirement:* Input documents of type Person:

    Name: john smith       age: 30  Email: j...@gmail.com  Phone: (987) 123-4567
    Name: John Smith       age: 30  Email: j...@gmail.com  Phone: (879) 123-4567
    Name: django$haystack  age: 30  Email: j...@gmail.com  Phone: (987) 123-4567
    Name: django#haystack  age: 30  Email: j...@gmail.com  Phone: (987) 123-4567

*Scenario 1:* Search string: django#haystack. Results: django$haystack, django#haystack. But the expected result is *django#haystack* only.

*Scenario 2:* Search string: John Smith. Results: John Smith, john smith. The expected result is John Smith only.

*Scenario 3:* Search string: John Smith. Results: John Smith, john smith. *This is fine, but we need to support Scenario 2 also. How can we support both Scenario 2 and Scenario 3 using analyzers? Please help me with this.*
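Not from the thread, but the usual way to support both exact and analyzed matching on one field in ES 1.x is a multi-field: the main field goes through an analyzer (case-insensitive matching, Scenario 3) while a `raw` sub-field is indexed `not_analyzed` (exact matching, Scenarios 1 and 2). A hedged sketch; the type and sub-field names are placeholders:

```json
{
  "Person": {
    "properties": {
      "Name": {
        "type": "string",
        "analyzer": "standard",
        "fields": {
          "raw": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}
```

A query against `Name` would then match both "John Smith" and "john smith", while a term query against `Name.raw` matches only the exact original value, so the application can pick the field per search requirement.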
Re: How ElasticSearch nodes syncrhonise in Cluster when nodes have different Index mappings
That depends on how you do the migration; it's not something ES handles automatically, you need to do it yourself. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 13 June 2014 19:47, Bhupali Kalmegh bhupali...@gmail.com wrote: Yes, I am creating a new index and then migrating the data from the older index to the new index. So, while this migration is going on, if any request comes, what would the behaviour be?
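One common way to keep requests working during such a migration (not from the thread, but standard ES practice; index and alias names below are placeholders) is to have clients read through an alias and switch it atomically once reindexing into the new index finishes:

```shell
# after reindexing myindex_v1 -> myindex_v2, swap the alias in one atomic call
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "remove": { "index": "myindex_v1", "alias": "myindex" } },
    { "add":    { "index": "myindex_v2", "alias": "myindex" } }
  ]
}'
```

Because both actions happen in a single _aliases request, there is no window in which requests against the alias see neither index.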
Re: Cassandra with JDBC river plugin
The Cassandra Java Driver is not a JDBC driver. Jörg On Fri, Jun 13, 2014 at 11:11 AM, Abhishek Mukherjee 4271...@gmail.com wrote: Checking the Elasticsearch log files I found this: No suitable driver found for jdbc:cassandra://192.168.1.103:9160/transactionlogdb [...]
Re: Query multiple strings in a field in kibana3?
You can save dashboards with the query, if that is what you want. You will need to save one per query though. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 13 June 2014 18:15, Siddharth Trikha siddharthtrik...@gmail.com wrote: I am using Logstash 1.4.1, Elasticsearch 1.1.1, and Kibana 3.1 for analyzing my logs. I get the parsed fields (from the log) in Kibana 3. Now, I often have to query a particular field for many strings. E.g. auth_message is a field and I may have to query for some 20 different strings (all together or separately). If together: auth_message: "login failed" OR "user XYZ" OR "authentication failure" OR ... If separate queries: auth_message: "login failed"; auth_message: "user XYZ"; auth_message: "authentication failure". A user cannot remember 20 strings for a field to be searched for. Is there a way to store them, or present them to the user so he can select the strings he wants to search for? Can this be done using ELK? Please help
Re: RepositoryMissingException
Good question. That is what is being returned when I make the call, but your question gave me an idea as to what the problem is. Thanks. On Jun 12, 2014 11:32 PM, David Pilato da...@pilato.fr wrote: What is this -d in "statlogs -d"? -- David ;-) Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs On 13 June 2014 at 03:58, Shawn Mullen shawnmull...@gmail.com wrote: I have an Elasticsearch instance running on my local machine. I installed the S3 plugin so I can do backup and restore operations to/from S3. I tried to follow the documentation on how to set this up. I was able to register a snapshot repository and I have a bucket in S3 created just for backups. When I do a /_all I see the current repo settings, so at this point all looks fine. However, when I try to create a snapshot it fails with RepositoryMissingException. This is what I get for a /_all:

    {
      "statlogs -d": {
        "type": "s3",
        "settings": {
          "region": "us-east",
          "bucket": "my-bucket-name",
          "access_key": "my-access-key",
          "secret_key": "my-secret-key"
        }
      }
    }

This is what I am sending when I try to do a snapshot:

    PUT /_snapshot/statlogs/snapshot_1
    {
      "indices": ["statexceptionlog"],
      "ignore_unavailable": true,
      "include_global_state": false
    }

I am using Sense to send the commands. I'm assuming I am getting the error because of something wrong with my S3 settings, but I don't know what it would be. I'm making this assumption because the /_all returns data (but I guess that could be wrong). Any ideas on what the issue might be? What exactly causes RepositoryMissingException? Thanks.
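For readers hitting the same error: the /_all output above shows the repository name is literally "statlogs -d" (a stray -d swallowed into the name at registration time), which would explain why a snapshot against the repository "statlogs" raises RepositoryMissingException. A hedged sketch of re-registering under the intended name (bucket and keys are placeholders, ES 1.x S3 plugin syntax):

```json
PUT /_snapshot/statlogs
{
  "type": "s3",
  "settings": {
    "region": "us-east",
    "bucket": "my-bucket-name",
    "access_key": "my-access-key",
    "secret_key": "my-secret-key"
  }
}
```

After this, `PUT /_snapshot/statlogs/snapshot_1` should resolve the repository by the name the snapshot request actually uses.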
Email alert after threshold crossed in Logstash?
I am using Logstash, Elasticsearch and Kibana to analyse my logs. I am alerting via email when a particular string appears in the log, via the email output in Logstash:

    email {
      match => [ "Session Detected", "logline,*Session closed*" ]
      ...
    }

This works fine. Now I want to alert on the count of a field (when a threshold is crossed). E.g. if user is a field, I want to alert when the number of unique users goes above 5. Can this be done via the email output in Logstash? Please help.
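Not a full answer, but for plain event counts (as opposed to unique users) Logstash's metrics filter can drive a threshold alert: it emits a synthetic event at each flush interval carrying counters that a conditional can test before the email output. A hedged sketch against Logstash 1.4-era syntax; the meter name, threshold, and email settings are placeholders, and note this counts all matching events, not distinct user values (unique counting would need something external, e.g. periodically querying Elasticsearch):

    filter {
      metrics {
        meter   => "events"
        add_tag => "metric"
      }
    }
    output {
      # the metrics filter emits one tagged event per flush window
      if "metric" in [tags] and [events][count] > 5 {
        email {
          # ... your existing smtp settings ...
          subject => "event count threshold crossed"
        }
      }
    }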
Re: Query multiple strings in a field in kibana3?
So no way to store the query itself? I will have to save the entire dashboard? On Fri, Jun 13, 2014 at 4:35 PM, Mark Walkom ma...@campaignmonitor.com wrote: You can save dashboards with the query, if that is what you want. You will need to save one per query though. -- Regards, Siddharth Trikha
Re: Runtime JRE?
Yes, you can use the Java Server JRE. It is a build without the Java desktop graphics libraries (aka a headless JVM). Jörg On Fri, Jun 13, 2014 at 1:53 PM, thatguy1...@gmail.com wrote: I know the guide says the following: "While a JRE can be used for the Elasticsearch service, due to its use of a client VM (as opposed to a server JVM which offers better performance for long-running applications) its usage is discouraged and a warning will be issued." But I noticed something on Oracle's page called the Server JRE. Does anyone know if this is equivalent to the server JVM at runtime? Steve
Configuring YML files Location
Hi, I am trying to set up the configuration of ES (elasticsearch.yml and logging.yml) outside of the Elasticsearch package. I have put the two files in a separate location and pointed CONF_DIR to that location. I launch the ES server by specifying the cluster name and node name. The problem I am seeing is that this configuration is not getting picked up. I verified this by checking the log files: the log files get updated when I have the yml config files in the ES directory, but when I move them out, the logs don't get updated. Any pointers on how to configure the yml files' location outside of the ES package? Thanks, Karthik Jayanthi
Re: compresstion in ES 1.2.1
Hello Jörg, I am sorry - there was a problem in the implementation at my end. Thanks a lot, guys, for the insight and help; appreciate the quick responses. Thanks and Regards, Sri On Sunday, June 8, 2014 5:04:24 PM UTC-4, sri wrote: Hello Jörg, Thanks a lot for the info. I tried applying the template provided by you, but the size is not reducing. On the other hand, I was noticing a decrease in size when I disabled the fields via the Mapping API. Thanks and Regards, Sri On Sunday, June 8, 2014 4:37:58 PM UTC-4, Jörg Prante wrote: Try this index template for new index creations:

curl -XPUT 'localhost:9200/_template/template1' -d '
{
    "template" : "*",
    "mappings" : {
        "_default_" : {
            "_source" : { "enabled" : false },
            "_all" : { "enabled" : false }
        }
    }
}'

See also http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html You cannot disable _all or _source in an existing index. Jörg On Sun, Jun 8, 2014 at 10:22 PM, sri 1.fr...@gmail.com wrote: Thanks a lot for the insight, Patrick. I have a few more queries: - Is it possible to disable the '_source' and '_all' fields by default for all the indices that will be created later (possibly defined in the elasticsearch.yml file)? - What happens if my index is already created and then I disable the '_source' and '_all' fields? Would that affect the file size of the index, i.e., will the fields be removed/disabled only for the documents added after disabling the fields? Thanks and Regards, Sri On Sunday, June 8, 2014 2:48:16 PM UTC-4, Patrick Proniewski wrote: Hello, I don't know how it's compressed, but it appears that data is compressed in chunks of up to 4K, i.e.
it's useless to store data on a compressed (lz4) filesystem if the fs block size is 4K:

Filesystem      Size  Used  Avail  Capacity  Mounted on
zdata/ES-lz4    1.1T  1.9G  1.1T   0%        /zdata/ES-lz4
zdata/ES        1.1T  1.9G  1.1T   0%        /zdata/ES

But if the fs block size is greater (say 128K), filesystem compression is a huge win:

Filesystem      Size  Used  Avail  Capacity  Mounted on
zdata/ES-lz4    1.1T  1.1G  1.1T   0%        /zdata/ES-lz4   (compressratio 1.73x)
zdata/ES-gzip   1.1T  901M  1.1T   0%        /zdata/ES-gzip  (compressratio 2.27x)
zdata/ES        1.1T  1.9G  1.1T   0%        /zdata/ES

Unfortunately, a filesystem block size greater than 4K is not optimal for IO (unless you have a big amount of physical memory you can dedicate to the filesystem data cache, which would be redundant with the ES cache). On 8 June 2014, at 18:41, David Pilato wrote: It's compressed by default now. -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs On 8 June 2014 at 18:01, sri 1.fr...@gmail.com wrote: Hello everyone, I have read posts and blogs on how Elasticsearch compression can be enabled in previous versions (0.17 - 0.19). I am currently using ES 1.2.1 and wasn't able to find out how to enable compression in this version, or whether there is any such option at all. I know that I can reduce the storage amount by disabling the source using the Mapping API, but what I was interested in is the compression of the data storage. Thanks and Regards, Sri
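For a rough sense of those savings, the ratios can be recomputed from the "Used" column in the listing above (a sketch; these are naive byte quotients of the 1.9G / 1.1G / 901M figures, whereas ZFS derives its reported compressratio from logical bytes, which is why the gzip figure lands slightly below the reported 2.27x):

```shell
# Approximate compression ratios from the "Used" sizes above.
# uncompressed: 1.9G, lz4: 1.1G, gzip: 901M
awk 'BEGIN {
  lz4  = 1.9 / 1.1;          # ~1.73x, matching the reported lz4 compressratio
  gzip = (1.9 * 1024) / 901; # ~2.16x by this naive measure
  printf "lz4 ~%.2fx, gzip ~%.2fx\n", lz4, gzip
}'
```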
No Node Available
Hi guys, I googled NoNodeAvailableException, but so far none of the answers address my question. I'm getting this error when the ES connections between server and client have been idle for a long time. I looked at the number of connections on port 9300 and there is a huge number of open sockets (something like 700 connections), but if I count every client connection the number should be 32. Every morning I get this exception once, and then everything works fine without any more exceptions. My client configuration follows:

client.transport.sniff: false
client.transport.ping_timeout: 30s
client.transport.nodes_sampler_interval: 5s

Regards.
Re: Marvel 1.2.0 java.lang.IllegalStateException
Is this released? Or is it still only in GitHub? Experiencing the same thing... Ran the commands from above... http://pastebin.com/WUTTLgsS On Monday, 9 June 2014 14:44:17 UTC-4, Paweł Krzaczkowski wrote: It works... thanks for the quick fix. 2014-06-09 17:48 GMT+02:00 Paweł Krzaczkowski pa...@krzaczkowski.pl: I'm out of the office for today, so I'll test it tomorrow morning and let you know if it works. pawel (at) mobile On 9 June 2014, at 17:40, Boaz Leskes b.le...@gmail.com wrote: Hi Pawel, We just did a quick minor release of Marvel with a fix for this. It would be great if you can give it a try and confirm how it goes. Cheers, Boaz On Friday, June 6, 2014 12:01:52 PM UTC+2, Boaz Leskes wrote: Thanks Pawel. Not huge, but larger than the limit. Working on a fix. On Friday, June 6, 2014 10:10:45 AM UTC+2, Paweł Krzaczkowski wrote: This one is without metadata http://pastebin.com/tmJGA5Kq http://xxx:9200/_cluster/state/version,master_node,nodes,routing_table,blocks/?human&pretty Pawel On Friday, 6 June 2014 at 09:28:30 UTC+2, Boaz Leskes wrote: Hi Pawel, I see - your cluster state (nodes + routing only, not metadata) seems to be larger than 16KB when rendered to SMILE, which is quite big - does this make sense? Above 16KB, an underlying paging system introduced in the ES 1.x branch kicks in, and that breaks something in Marvel, which normally ships very small documents. I'll work on a fix. Can you confirm your cluster state (again, without the metadata) is indeed very large? Cheers, Boaz On Thursday, June 5, 2014 10:56:00 AM UTC+2, Paweł Krzaczkowski wrote: Hi. After upgrading Marvel to 1.2.0 (running on Elasticsearch 1.2.1) I'm getting errors like:
[2014-06-05 10:47:25,346][INFO ][node ] [es-m-3] version[1.2.1], pid[68924], build[6c95b75/2014-06-03T15:02:52Z]
[2014-06-05 10:47:25,347][INFO ][node ] [es-m-3] initializing ...
[2014-06-05 10:47:25,367][INFO ][plugins ] [es-m-3] loaded [marvel, analysis-icu], sites [marvel, head, segmentspy, browser, paramedic]
[2014-06-05 10:47:28,455][INFO ][node ] [es-m-3] initialized
[2014-06-05 10:47:28,456][INFO ][node ] [es-m-3] starting ...
[2014-06-05 10:47:28,597][INFO ][transport ] [es-m-3] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.0.212:9300]}
[2014-06-05 10:47:42,340][INFO ][cluster.service ] [es-m-3] new_master [es-m-3][0H3grrJxTJunU1U6FmkIEg][es-m-3][inet[192.168.0.212/192.168.0.212:9300]]{data=false, master=true}, reason: zen-disco-join (elected_as_master)
[2014-06-05 10:47:42,350][INFO ][discovery ] [es-m-3] freshmind/0H3grrJxTJunU1U6FmkIEg
[2014-06-05 10:47:42,365][INFO ][http ] [es-m-3] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.0.212:9200]}
[2014-06-05 10:47:42,368][INFO ][node ] [es-m-3] started
[2014-06-05 10:47:44,098][INFO ][cluster.service ] [es-m-3] added {[es-m-1][MHl5Ls-cRXCwc7OC-P0J5w][es-m-1][inet[/192.168.0.210:9300]]{data=false, machine=44454c4c-5300-1052-8038-b9c04f5a5a31, master=true},}, reason: zen-disco-receive(join from node[[es-m-1][MHl5Ls-cRXCwc7OC-P0J5w][es-m-1][inet[/192.168.0.210:9300]]{data=false, machine=44454c4c-5300-1052-8038-b9c04f5a5a31, master=true}])
[2014-06-05 10:47:44,401][INFO ][gateway ] [es-m-3] recovered [28] indices into cluster_state
[2014-06-05 10:47:48,683][ERROR][marvel.agent ] [es-m-3] exporter [es_exporter] has thrown an exception:
java.lang.IllegalStateException: array not available
    at org.elasticsearch.common.bytes.PagedBytesReference.array(PagedBytesReference.java:289)
    at org.elasticsearch.marvel.agent.exporter.ESExporter.addXContentRendererToConnection(ESExporter.java:209)
    at org.elasticsearch.marvel.agent.exporter.ESExporter.exportXContent(ESExporter.java:252)
    at org.elasticsearch.marvel.agent.exporter.ESExporter.exportEvents(ESExporter.java:161)
    at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.exportEvents(AgentService.java:305)
    at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:240)
    at java.lang.Thread.run(Thread.java:745)
[2014-06-05 10:47:58,738][ERROR][marvel.agent ] [es-m-3] exporter [es_exporter] has thrown an exception:
java.lang.IllegalStateException: array not available
    at org.elasticsearch.common.bytes.PagedBytesReference.array(PagedBytesReference.java:289)
    at org.elasticsearch.marvel.agent.exporter.ESExporter.addXContentRendererToConnection(ESExporter.java:209)
    at org.elasticsearch.marvel.agent.exporter.ESExporter.
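A quick way to check the size Boaz asks about is to fetch the same filtered cluster state and count the response bytes (command sketch against a hypothetical node on localhost:9200; `wc -c` measures the JSON rendering, which only approximates the SMILE-rendered size that hits the 16KB paging threshold):

```shell
# Fetch the cluster state without metadata and measure its size in bytes.
curl -s 'localhost:9200/_cluster/state/version,master_node,nodes,routing_table,blocks/' | wc -c
```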
Re: Marvel 1.2.0 java.lang.IllegalStateException
Hi, Yes, it's been released as Marvel 1.2.1. 2014-06-13 16:01 GMT+02:00 John Smith java.dev@gmail.com: Is this released? Or is it still only in GitHub? Experiencing the same thing... [rest of quoted thread trimmed]
Re: Marvel 1.2.0 java.lang.IllegalStateException
Ok, works, thanks. On Friday, 13 June 2014 10:02:06 UTC-4, Paweł Krzaczkowski wrote: Hi, Yes, it's been released as Marvel 1.2.1. [rest of quoted thread trimmed]
Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes
I haven't seen it asked yet: what is feeding data into your Elasticsearch? Depending on what you're doing to get it there, a large document size could easily bottleneck some feeding mechanisms. It's also notable that some green spinning disks top out in the realm of 72 MB/s. It might be useful to make sure that your feeding mechanism can handle more than 500 TPS.
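The disk figure above matters because sustained ingest bandwidth is just rate times document size, so large documents hit a spinning disk's ceiling quickly. A sketch of the arithmetic (the 150 KB document size is an assumed example, not a number from the thread):

```shell
# Required sequential write bandwidth at 500 docs/sec.
# 150 KB per document is an assumed illustrative size.
awk 'BEGIN {
  tps = 500; doc_kb = 150;
  mb_s = tps * doc_kb / 1024;
  printf "%.1f MB/s\n", mb_s   # ~73 MB/s, right at a ~72 MB/s green-disk limit
}'
```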
Changing Kibana-int based on context
I am a newbie to computer science in general, and at present I am working on a project involving Elasticsearch, Logstash, and Kibana, which we are using to build a centralized logging system. In Kibana's config.js there is a parameter kibana_index whose default value is set to kibana-int. Is there a way to change the value of kibana_index based on the context? From my research I understand that kibana-int is the index which stores all the dashboards. By context, I mean that if I have multiple projects in an organization, the dropdown on the Kibana dashboard page should show only the dashboards under a particular project when I give that project's name as the context in my URL, so people working on a project see only the dashboards in their project. The only way I could find is to change the kibana_index value per project, to something like kibana-projA, so it shows all the dashboards under that particular index. But I couldn't find a way to do it. Could you please help me out? Any help would be appreciated. Thanks.
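If serving a separate Kibana 3 deployment (or at least a separate config.js) per project is acceptable, the per-project index idea above comes down to one setting. A sketch of the relevant part of config.js - kibana-projA is a made-up index name, and the surrounding structure follows the stock Kibana 3 config:

```javascript
// config.js (Kibana 3) - point this project's deployment at its own
// dashboard index; "kibana-projA" is a hypothetical per-project name.
define(['settings'], function (Settings) {
  return new Settings({
    elasticsearch: "http://" + window.location.hostname + ":9200",
    kibana_index: "kibana-projA",   // default is "kibana-int"
    default_route: '/dashboard/file/default.json'
  });
});
```

Each project's users would then be given the URL of their project's Kibana instance, and the dashboard dropdown would list only what is stored in that project's index.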
Percolation limits
Hi, I wanted to ask those who use percolation: how many queries are you percolating? I need to set up some equivalent of percolation for about 100k queries. With some filtering, probably only up to 10k would actually have to be checked for each new document. Is the idea of using ES percolation for that insane? Thanks, Maciej Dziardziel
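For context on the mechanics: in ES 1.x, percolator queries are just documents indexed into the special .percolator type, and the percolate request can carry a filter, which is how the "check only ~10k of the 100k" pre-filtering would be expressed. A command sketch against a hypothetical local cluster (index, field, and tag names are made up):

```shell
# Register one of the ~100k queries, tagged so it can be pre-filtered later.
curl -XPUT 'localhost:9200/queries/.percolator/q1' -d '{
    "query": { "match": { "body": "elasticsearch" } },
    "topic": "search"
}'

# Percolate a new document, restricting the run to matching-topic queries.
curl -XGET 'localhost:9200/queries/doc/_percolate' -d '{
    "doc":    { "body": "a post about elasticsearch percolation" },
    "filter": { "term": { "topic": "search" } }
}'
```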
Kibana 3 and changing the default field from _all to message
I have this typical document being indexed by Logstash. The following shows the document in rubydebug mode and not as JSON, but when converted to JSON and indexed, the field names and values are the same (in other words, the syntax below isn't one-line JSON, but it's clearer to read):

{
       "message" => "2014-06-13 16:15:18,431 foo=1 bar=3 text=\"quoted strings work\" assist=true",
      "@version" => "1",
    "@timestamp" => "2014-06-13T16:15:18.431Z",
          "host" => "blacksheep",
           "foo" => "1",
           "bar" => "3",
          "text" => "quoted strings work",
        "assist" => "true"
}

In preparation for the best possible performance, I disabled the _all field in all my logstash-* indices. It isn't needed, as the message field contains all of the original message's text anyway, and the _all field wastes time during indexing and space on disk. But all of the answers to the question "How can I configure Kibana to use the message field as the default and not the _all field?" seem to apply to Kibana 1 and 2, the Ruby versions. There is no RubyConfig.rb file in Kibana 3, and I cannot find any reference to the _all field, only to all indices (which I broke nicely when fumbling around; it applied only to indices, as I quickly discovered). Telling people to query for message:work instead of just work does not endear me to them. Is there some way to configure Kibana 3 to change the default field in its Lucene query to message instead of _all? Thank you in advance! Brian
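Since Kibana 3 sends plain Lucene query_string queries, one server-side option (an untested sketch - worth verifying against your ES version rather than a confirmed fix) is to change the index-level default query field in elasticsearch.yml, so that queries naming no field search message instead of the disabled _all:

```yaml
# elasticsearch.yml - make query_string queries that name no explicit field
# search "message" rather than the (disabled) _all field.
index.query.default_field: message
```

This would apply to every index created on the node, which fits the logstash-* case where message is the catch-all field anyway.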
Re: Non-Uniform Drive Space Across Nodes
OP here. My numbers on the disk space were not an actual observation of current sizes. It was more of a hypothetical of what I can expect ES to do if I only had three servers and that was the starting disk space available in each.
Re: Configuring YML files Location
For example, I keep my Elasticsearch configurations for use with the ELK stack within this directory: /opt/config/elk/current So my start-up script calls the elasticsearch command as follows:

$ES_HOME/elasticsearch -d ... -Des.path.conf=/opt/config/elk/current ...

Hope this helps! Brian
Re: Cassandra with JDBC river plugin
Ok. Thanks, it seems I have to make the Cassandra river work. On 13 Jun 2014 16:34, joergpra...@gmail.com wrote: The Cassandra Java Driver is not a JDBC driver. Jörg On Fri, Jun 13, 2014 at 11:11 AM, Abhishek Mukherjee 4271...@gmail.com wrote: Checking the Elasticsearch log files I found this:

No suitable driver found for jdbc:cassandra://192.168.1.103:9160/transactionlogdb
    at java.sql.DriverManager.getConnection(DriverManager.java:689)
    at java.sql.DriverManager.getConnection(DriverManager.java:247)
    at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.getConnectionForReading(SimpleRiverSource.java:133)
    at org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.execute(SimpleRiverSource.java:271)

However, I have placed all the necessary jar files for the driver in $ES_HOME/plugins/jdbc. Please advise. Kind Regards, Abhishek On Friday, June 13, 2014 1:43:45 PM UTC+5:30, Abhishek Mukherjee wrote: Hi everyone, I am trying to move data from Cassandra to Elasticsearch. Initially I tried the cassandra-river at https://github.com/eBay/cassandra-river. However, I got a timed-out error which I suspect was originating from the Hector API. I posted a question on this thread: https://groups.google.com/forum/#!searchin/elasticsearch/cassandra/elasticsearch/4oDbkqK3GVA/W9WLK4SS2MEJ. Moving on, I thought of using the JDBC river at https://github.com/jprante/elasticsearch-river-jdbc with a Java driver for Cassandra. I followed the MySQL example and modified it for Cassandra. I created the river as follows:

curl -XPUT '192.168.1.103:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:cassandra://192.168.1.105:9160/transactionlogdb",
        "cql" : "select * from logs"
    }
}'
{"_index":"_river","_type":"my_jdbc_river","_id":"_meta","_version":1,"created":true}

However, I don't find any documents being created on the jdbc index. Am I missing something? Any help or tips is very much appreciated. Thanks in advance.
Kind Regards, Abhishek Mukherjee
Re: exclude some documents (and category filter combination) for some queries
Currently not possible. Elasticsearch will return all the nested documents as long as one of the nested documents satisfies the query. https://github.com/elasticsearch/elasticsearch/issues/3022 That issue is my personal #1 requested feature. Frustrating, considering there has been a working implementation since version 0.90.5 - 1.0, 1.1, 1.2 and still nothing. -- Ivan On Thu, Jun 12, 2014 at 2:17 PM, Srinivasan Ramaswamy ursva...@gmail.com wrote: Any thoughts, anyone? On Wednesday, June 11, 2014 11:15:18 PM UTC-7, Srinivasan Ramaswamy wrote: I would like to exclude documents belonging to certain categories from the results, but only for certain search queries. I have an ES client layer where I am thinking of implementing this logic as a "not" filter depending on the search query. Let me give an example. Sample index:

{ "designId": 100, "tags": ["dog", "cute"], "caption": "cute dog in the garden",
  "products": [ { "productId": 200, "category": 1 }, { "productId": 201, "category": 2 } ] }

{ "designId": 101, "tags": ["brown", "dog"], "caption": "little brown dog",
  "products": [ { "productId": 202, "category": 3 } ] }

{ "designId": 102, "tags": ["black", "dog"], "caption": "little black dog",
  "products": [ { "productId": 202, "category": 4 }, { "productId": 203, "category": 5 } ] }

products is a nested field inside each design. I would like to write a query to get all matches for "dog" (not for other keywords) but filter out a few categories from the result. As ES returns the whole nested document even if only one nested document matches the query, my expected result is:

{ "designId": 100, "tags": ["dog", "cute"], "caption": "cute dog in the garden",
  "products": [ { "productId": 200, "category": 1 }, { "productId": 201, "category": 2 } ] }

{ "designId": 102, "tags": ["black", "dog"], "caption": "little black dog",
  "products": [ { "productId": 202, "category": 4 }, { "productId": 203, "category": 5 } ] }

Here is the query I tried, but it doesn't work. Can anyone help me point out the mistake?
GET /_search/
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          { "not": { "term": { "category": 1 } } },
          { "not": { "term": { "category": 3 } } }
        ]
      },
      "query": {
        "multi_match": {
          "query": "dog",
          "fields": [ "tags", "caption" ],
          "minimum_should_match": "50%"
        }
      }
    }
  }
}
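For what it's worth, one likely fix for the matching side (separate from the response-trimming limitation Ivan describes): since category lives on the nested products objects, it has to be addressed as products.category inside a nested filter; a bare term filter on "category" matches nothing. A sketch against the 1.x query DSL, untested, keeping designs that have at least one product outside the excluded categories (which matches the expected result above):

```json
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "dog",
          "fields": [ "tags", "caption" ],
          "minimum_should_match": "50%"
        }
      },
      "filter": {
        "nested": {
          "path": "products",
          "filter": {
            "not": { "terms": { "products.category": [ 1, 3 ] } }
          }
        }
      }
    }
  }
}
```

Note this still returns every product of each matching design; per Ivan's point (issue #3022), trimming the products array inside the response was not possible in 1.2.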
Re: issue of elasticsearch-hadoop-2.0.0 with Hive (Cloudera and Hortonworks), help needed
Hi, Sorry for the delayed response; travel and other things got in the way. I have tried replicating the issue on my end and couldn't; see below:

On 6/8/14 8:03 PM, elitem way wrote: I am learning elasticsearch-hadoop. I have a few issues that I do not understand. I am using ES 1.12 on Windows, elasticsearch-hadoop-2.0.0 and the cloudera-quickstart-vm-5.0.0-0-vmware sandbox with Hive.
1. I loaded only 6 rows into the ES index cars/transactions. Why did Hive return 14 rows instead? See below.
2. select count(*) from cars2 failed with code 2. Group by and sum also failed. Did I miss anything? Similar queries succeed against the sample_07 and sample_08 tables that come with Hive.
3. elasticsearch-hadoop-2.0.0 does not seem to work with jetty (the authentication plugin). I got errors when I enabled jetty and set 'es.nodes' = 'superuser:admin@192.168.128.1'.
4. I could not pipe data from Hive to Elasticsearch either.

*--ISSUE 1*: --load data to ES

POST: http://localhost:9200/cars/transactions/_bulk
{ "index": {}}
{ "price" : 3, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 2, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 8, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

CREATE EXTERNAL TABLE cars2 (color STRING, make STRING, price BIGINT, sold TIMESTAMP)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'cars/transactions', 'es.nodes' = '192.168.128.1', 'es.port'='9200');

HIVE: select * from cars2; 14 rows returned.
   color  make    price  sold
0  red    honda   2      2014-11-05 00:00:00.0
1  red    honda   1      2014-10-28 00:00:00.0
2  green  ford    3      2014-05-18 00:00:00.0
3  green  toyota  12000  2014-08-19 00:00:00.0
4  blue   ford    25000  2014-02-12 00:00:00.0
5  blue   toyota  15000  2014-07-02 00:00:00.0
6  red    bmw     8      2014-01-01 00:00:00.0
7  red    honda   1      2014-10-28 00:00:00.0
8  blue   toyota  15000  2014-07-02 00:00:00.0
9  red    honda   2      2014-11-05 00:00:00.0
10 green  ford    3      2014-05-18 00:00:00.0
11 green  toyota  12000  2014-08-19 00:00:00.0
12 red    honda   2      2014-11-05 00:00:00.0
13 red    honda   2      2014-11-05 00:00:00.0
14 red    bmw     8      2014-01-01 00:00:00.0

It looks like you are adding data to localhost:9200 but querying 192.168.128.1:9200 - most likely they are different hosts, hence the different data set. To double-check, do a query/count through curl on ES and then check the data through Hive - that's what we do in our tests.

*ISSUE 2:* HIVE: select count(*) from cars2; Your query has the following error(s): Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Again, since you are querying a different host it's hard to tell what the issue is. count(*) works in our tests, but I've seen cases where count fails when dealing with the newly introduced types (like timestamp). You can use count(1) as an alternative, which should work just fine.
*--ISSUE 4:*

CREATE EXTERNAL TABLE test1 (description STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.host' = '192.168.128.1', 'es.port'='9200', 'es.resource' = 'test1');

INSERT OVERWRITE TABLE test1 select description from sample_07;

Your query has the following error(s): Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

That is because you have an invalid table definition; the resource needs to point to an index/type, not just an index - if you look deep into the Hive exception, you should be able to see the actual validation message. Since Hive executes things lazily and on the server side, there's no other way of reporting the error to the user... Hope this helps, -- Costin
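To spell out that fix: `es.resource` must name an index/type pair. A hypothetical corrected definition (the type name `docs` is made up here, and `es.nodes` is used to match the cars2 table above rather than `es.host`):

```sql
-- 'es.resource' must point at index/type, not just an index.
CREATE EXTERNAL TABLE test1 (description STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.nodes'    = '192.168.128.1',
  'es.port'     = '9200',
  'es.resource' = 'test1/docs'   -- 'docs' is a placeholder type name
);

INSERT OVERWRITE TABLE test1 SELECT description FROM sample_07;
```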
Re: Kibana 3 and changing the default field from _all to message
Ok, it's not a Kibana issue, but my Elasticsearch configuration issue. I could fix it in the elasticsearch.yml file, but I believe it's much safer to fix it in my less-likely-to-be-altered start-up script wrapper. So now when I start ES via the bin/elasticsearch script, but only on behalf of the ELK stack, I add the following option to the command line: -Des.index.query.default_field=message And now, my default field for a Kibana (Lucene) query is message and not _all. And _all is well (pun intended!). Brian
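For anyone searching later, the two equivalent ways to set this on ES 1.x (the yml key is inferred from the system-property name, since -Des.X=Y maps to setting X; verify against your version's docs):

```
# command-line override, as used above:
bin/elasticsearch -Des.index.query.default_field=message

# equivalent elasticsearch.yml entry:
index.query.default_field: message
```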
Re: Securing Data in Elasticsearch
ES nodes would be locked down and accessible only to authorized users on the OS level; it's the ability to delete and update indices/documents remotely that's worrisome in this case. Disabling HTTP REST API completely is not possible since it's required by Kibana (running behind a reverse proxy), although I suppose I could restrict the ES node to only accept traffic from Logstash on port 9300 and from the reverse proxy on port 9200, would this provide sufficient protection? Thanks On Thursday, June 12, 2014 6:44:33 PM UTC+3, Jörg Prante wrote: If you want ES-level security, you should first reduce attack vectors, by closing down all the open ports and resources that are not necessary. One step would be to disable HTTP REST API completely (port 9200) and run Logstash Elasticsearch output only http://logstash.net/docs/1.4.1/outputs/elasticsearch As a consequence, you could only kill the ES process on a node, or send Java API commands. It is not possible to block Java API commands over port 9300, this is how nodes talk to each other. You could imagine a self-written tool for administering your cluster that uses the Java API only (from a J2EE web app for example) On the node on OS level, you would have to protect the OS user of ES node is running under from being accessed by third party users. Jörg On Thu, Jun 12, 2014 at 5:30 PM, Harvii Dent harvi...@gmail.com javascript: wrote: ES settings alone would be great, are there other options that I could have missed? right now the main priority is preventing document updates/deletes (and index deletes) via the ES rest api. 
Thanks On Thursday, June 12, 2014 6:21:36 PM UTC+3, Jörg Prante wrote: There are a lot of methods to tamper with ES files, and physically everything is possible to modify in files, as long as your operating system permits more than something like an append-only mode for ES files (not that I know this would work). So it depends on your requirements for the security level you want to reach, whether ES settings alone can help you or whether you need more (paranoid) configurations. Jörg On Thu, Jun 12, 2014 at 4:48 PM, Harvii Dent harvi...@gmail.com wrote: Hello, I'm planning to use Elasticsearch with Logstash for logs management and search; however, one thing I'm unable to find an answer for is making sure that the data cannot be modified once it reaches Elasticsearch. action.destructive_requires_name prevents deleting all indices at once, but they can still be deleted. Are there any options to prevent deleting indices altogether? And on the document level, is it possible to disable 'delete' *AND* 'update' operations without setting the entire index as read-only (ie. 'index.blocks.read_only')? Lastly, does setting 'index.blocks.read_only' ensure that the index files on disk are not changed (so they can be monitored using a file integrity monitoring solution)? Many regulatory and compliance bodies have requirements for ensuring log integrity. Thanks
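The reverse-proxy restriction described above can be made concrete; a hypothetical nginx fragment (the port, upstream address, and method whitelist are assumptions to adapt):

```
# Hypothetical nginx fragment in front of ES for Kibana traffic.
# Kibana 3 issues only GET and POST (_search) requests, so other
# methods (PUT/DELETE used for index/document mutation) can be rejected.
location / {
    if ($request_method !~ ^(GET|HEAD|POST)$) {
        return 403;
    }
    proxy_pass http://127.0.0.1:9200;
}
```

Note that POST alone can still index documents (POST index/type), so a stricter setup would also whitelist URL paths, e.g. only those ending in _search.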
elasticsearch-php auth error
Hi, I've been using the PHP client successfully with a remote server, but I've set up a new server and run into auth problems using the PHP client library. $clientParams['connectionParams']['auth'] = array( 'user', 'pw', 'Basic' ); My issue is that I now get back a 401 Authentication Required every time I try to hit the endpoints using the PHP client, but I've used the Chrome extension Postman to send some basic-auth requests with the same user/pw, and the server responds correctly. Any ideas what might cause this or how to go about debugging?
Re: issue of elasticsearch-hadoop-2.0.0 with Hive (Cloudera and Hortonworks), help needed
Thank you for the response. The localhost and 192.168.128.1 are actually the same ES host; I installed the ES Cloudera VM on XP. I will try your suggestions and report back, and I will also try the table without the timestamp column. Sent from my iPhone On Jun 13, 2014, at 1:59 PM, Costin Leau costin.l...@gmail.com wrote:
Re: ingest performance degrades sharply as documents have more fields
Hi, Mark: We are doing single-document ingestion. We did a performance comparison between Solr and Elasticsearch (ES). The performance of ES degrades dramatically when we increase the number of metadata fields, whereas Solr performance remains the same. The comparison was done on a very small data set (10k documents; the index size is only 75MB). The machine is a high-spec machine with 48GB memory. You can see ES performance drop 50% even when the machine has plenty of memory. ES consumes all the machine memory when the metadata field count increases to 100k. This behavior seems abnormal since the data is really tiny. We also tried larger data sets (100k and 1M documents); ES threw OOM in scenario 2 of the 1M-doc run. We want to know whether this is a bug in ES and/or whether there is any workaround (config step) we can use to eliminate the performance degradation. Currently ES performance does not meet the customer requirement, so we want to see if there is any way we can bring ES performance to the same level as Solr. Below are the configuration settings and benchmark results for the 10k-document set.

Scenario 0 means there are 1000 different metadata fields in the system.
Scenario 1 means there are 10k different metadata fields in the system.
Scenario 2 means there are 100k different metadata fields in the system.
Scenario 3 means there are 1M different metadata fields in the system.

- disable hard commit/soft commit + use a *client* to issue the commit (ES and Solr) every 10 seconds
- ES: flush and refresh are disabled
- Solr: autoSoftCommit is disabled
- monitor load on the system (cpu, memory, etc.) and the ingestion speed change over time
- monitor the ingestion speed (is there any degradation over time?)
- new ES config: new_ES_config.sh (https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_config.sh); new ingestion: new_ES_ingest_threads.pl (https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_ingest_threads.pl)
- new Solr ingestion: new_Solr_ingest_threads.pl (https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_Solr_ingest_threads.pl)
- flush interval: 10s

Results by number of different metadata fields:

Scenario 0 (1000 fields):
  ES:   12 secs (833 docs/sec), CPU 30.24%, heap 1.08G, secs per 1k docs: 3 1 1 1 1 1 0 1 2 1, index size 36M, iowait 0.02%
  Solr: 13 secs (769 docs/sec), CPU 28.85%, heap 9.39G, secs per 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 1 (10k fields):
  ES:   29 secs (345 docs/sec), CPU 40.83%, heap 5.74G, secs per 1k docs: 14 2 2 2 1 2 2 1 2 1, index size 36M, iowait 0.02%
  Solr: 12 secs (833 docs/sec), CPU 28.62%, heap 9.88G, secs per 1k docs: 1 1 1 1 2 1 1 1 1 2

Scenario 2 (100k fields):
  ES:   17 mins 44 secs (9.4 docs/sec), CPU 54.73%, heap 47.99G, secs per 1k docs: 97 183 196 147 109 89 87 49 66 40, index size 75M, iowait 0.02%
  Solr: 13 secs (769 docs/sec), CPU 29.43%, heap 9.84G, secs per 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 3 (1M fields):
  ES:   183 mins 8 secs (0.9 docs/sec), CPU 40.47%, heap 47.99G, secs per 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594
  Solr: 15 secs (666.7 docs/sec), CPU 45.10%, heap 9.64G, secs per 1k docs: 2 1 1 1 1 2 1 1 3 2

Thanks! Cindy
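Mark's first question upthread was whether the bulk API is used; since these numbers come from single-document ingestion, batching is the first thing worth testing. A minimal, hypothetical sketch of building a _bulk request body whose field names match the *_ss / *_i dynamic templates from the original post (index/type names are assumptions):

```python
import json

def bulk_body(docs, index="doc", doc_type="type"):
    """Build an Elasticsearch _bulk request body: newline-delimited JSON,
    one action line followed by one document line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"   # the bulk API requires a trailing newline

# Field names chosen to match the *_ss / *_i dynamic templates in the thread.
docs = [{"title_%d_ss" % i: "doc %d" % i, "count_%d_i" % i: i} for i in range(1000)]
payload = bulk_body(docs)   # POST this to http://host:9200/_bulk
```

Batching amortizes per-request overhead, but note it would not remove the mapping-growth cost of 100k+ distinct fields, which is the likelier culprit here.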
HIVE-Elasticsearch [mapr-elasticsearch] write to elasticsearch issue
Hi, I am trying to integrate Elasticsearch with a MapR Hadoop cluster, following the hive-elasticsearch integration document. I am able to read data from the Elasticsearch node; however, I am not able to write data into it, which is my primary requirement. Any guidance would be appreciated. I always get the following errors:

2014-06-13 14:15:45,814 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS maprfs:/user/hive/warehouse/dev.db/_tmp.shankar/02_0
2014-06-13 14:15:45,947 FATAL org.apache.hadoop.hive.ql.exec.mr.ExecMapper: java.lang.NoSuchMethodError: org.codehaus.jackson.JsonGenerator.writeUTF8String([BII)V
    at org.elasticsearch.hadoop.serialization.json.JacksonJsonGenerator.writeUTF8String(JacksonJsonGenerator.java:123)
    at org.elasticsearch.hadoop.mr.WritableValueWriter.write(WritableValueWriter.java:47)
    at org.elasticsearch.hadoop.hive.HiveWritableValueWriter.write(HiveWritableValueWriter.java:83)
    at org.elasticsearch.hadoop.hive.HiveWritableValueWriter.write(HiveWritableValueWriter.java:38)
    at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:69)
    at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:111)
    at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:55)
    at org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:41)
    at org.elasticsearch.hadoop.serialization.builder.ContentBuilder.value(ContentBuilder.java:258)
    at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.doWriteObject(TemplatedBulk.java:92)
    at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:79)
    at org.elasticsearch.hadoop.hive.EsSerDe.serialize(EsSerDe.java:128)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
    at org.apache.hadoop.mapred.Child.main(Child.java:271)
2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 finished. closing...
2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing...
2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 finished. closing...
2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 Close done
2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 Close done
2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 Close done
2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.mr.ExecMapper: ExecMapper: processed 0 rows: used memory = 9514320
2014-06-13 14:15:45,992 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-06-13 14:15:46,024 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: java.lang.NoSuchMethodError: org.codehaus.jackson.JsonGenerator.writeUTF8String([BII)V
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415) at
Re: Securing Data in Elasticsearch
You should start HTTP only on localhost then, and run Kibana on a selected number of nodes only. There are some authentication solutions for Kibana. I am not able to find security features like audit trails or prevention of writes in Kibana/ES, so you have to take care. Assessing Kibana for attacks over the web (intrusion detection, executing commands, etc.) would be useful; I don't know if anyone has tried such a thing, but it is a very complex task. Because this variant is tedious and maybe not successful, I would opt for a different approach: keep a checksummed copy of an index at a safe, restricted place on a private ES cluster (or even burn it to optical media) and rsync a copy of it to the unsafe place, the public ES cluster where Kibana runs. Checksum verification can then prove whether the index was modified in the meantime at the public place. Jörg On Fri, Jun 13, 2014 at 8:18 PM, Harvii Dent harviid...@gmail.com wrote:
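Jörg's checksum idea can be sketched in a few lines of Python (a hypothetical helper; the hash choice and directory layout are assumptions, and a real setup would snapshot the index first so files are not changing mid-hash):

```python
import hashlib
import os

def index_checksum(index_dir):
    """Fold every file under an index directory, in a stable order, into a
    single SHA-256 digest; re-running this later and comparing digests
    reveals whether any file was added, removed, or modified."""
    h = hashlib.sha256()
    for root, dirs, files in os.walk(index_dir):
        dirs.sort()                              # deterministic traversal order
        for name in sorted(files):
            path = os.path.join(root, name)
            # Hash the relative path too, so renames are detected and the
            # digest is independent of where the copy is mounted.
            h.update(os.path.relpath(path, index_dir).encode("utf-8"))
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
    return h.hexdigest()
```

Record the digest at the private site, rsync the index to the public cluster, and re-run the function there: differing digests mean the public copy was changed (including legitimately, e.g. if the index was re-opened for writes).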
Linear Scaling with ES
Hi, We have been spending a considerable amount of time trying to figure out whether we can get linear scaling in ES by increasing the number of nodes, shards, or other parameters. We ran many experiments (changing shards, nodes, replicas, and so on), but with every configuration we seemed to hit a limit. I know this is a very broad question, but does anyone know if it is even possible? Is there any formula or magic mantra to achieve this? Thanks a lot in advance if someone can answer this; it could save me some time. Thanks, Pranav.
index.cache.filter.type
I'm toying with the effects of different settings and noticed that setting `index.cache.filter.type: none` works fine, but setting `index.cache.filter.type: soft` or `index.cache.filter.type: weak` gives me stack traces. Am I doing it wrong? The docs mention soft, weak and resident being the types available. I'm running ES v1.1.0.

org.elasticsearch.indices.IndexCreationException: [centrallogging_awseast-2014-06-13] failed to create index
    at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:300)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:307)
    at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:179)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:424)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class setting [index.cache.filter.type] with value [soft]
    at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:448)
    at org.elasticsearch.common.settings.ImmutableSettings.getAsClass(ImmutableSettings.java:436)
    at org.elasticsearch.index.cache.filter.FilterCacheModule.configure(FilterCacheModule.java:44)
    at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
    at org.elasticsearch.index.cache.IndexCacheModule.configure(IndexCacheModule.java:41)
    at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
    at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
    at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
    at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
    at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
    at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
    at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
    at org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:298)
    ... 7 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.index.cache.filter.soft.SoftFilterCache
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:446)
    ... 19 more
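The ClassNotFoundException at the bottom of the trace shows what is going on: 1.x derives a class name from the setting value (`soft` becomes org.elasticsearch.index.cache.filter.soft.SoftFilterCache, per the trace), and that class does not ship in 1.1.0. It appears the per-index soft/weak/resident filter caches were removed in favor of the shared node-level cache, so the docs listing those values are stale. A sketch of settings that should still work (setting names per the 1.x reference; the size value is a hypothetical example):

```yaml
# elasticsearch.yml: disable the filter cache for an index, as in the post...
index.cache.filter.type: none
# ...or cap the shared node-level filter cache instead
indices.cache.filter.size: 20%
```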
Showing stats from delete operation
Would it be possible to add some stats to the response from a DeleteByQuery, giving information on how many objects were deleted?
Re: Linear Scaling with ES
The answer is - it depends. If you can provide a bit more detail on what you've done, your setup etc, maybe someone can provide more assistance. Regards, Mark Walkom Infrastructure Engineer Campaign Monitor email: ma...@campaignmonitor.com web: www.campaignmonitor.com On 14 June 2014 07:48, pranav amin parulpate...@gmail.com wrote: Hi, We have been spending considerable amount of time now just to figure out if we can get linear scaling in ES by increasing number of nodes or shards or some other parameters. ...
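One concrete detail worth checking in scaling experiments like these (a sketch with a hypothetical index name): the number of primary shards is fixed when an index is created, so it caps how far that index can spread as nodes are added, while replicas can be changed live.

```json
PUT /myindex
{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1
  }
}
```

Replicas can be raised later via PUT /myindex/_settings, but number_of_shards cannot be changed without reindexing, so adding nodes beyond the shard count will tend to plateau for a single index.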
Re: Showing stats from delete operation
You will need to raise a github request for this. Regards, Mark Walkom On 14 June 2014 08:41, jb...@locu.com wrote: Would it be possible to add some stats to the response from a DeleteByQuery giving information on how my objects were deleted? ...
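For context, a 1.x delete-by-query request looks roughly like the sketch below (index, type, field, and value are hypothetical). As far as I can tell, its response only acknowledges per-shard success or failure per index, with no count of deleted documents, which is exactly what the request above is asking for.

```json
DELETE /myindex/mytype/_query
{
  "query": {
    "term": { "status": "obsolete" }
  }
}
```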
Re: RepositoryMissingException
Well, that was it. I copied the sample PUT from the elasticsearch web site, which of course uses curl, and did not take out the -d. Definitely helps to have another pair of eyes. I was looking at that all day and didn't see the -d. Thanks for your help. Shawn

On Friday, June 13, 2014 5:35:45 AM UTC-6, Shawn Mullen wrote: Good question. That is what is being returned when I make the call. But your question gave me an idea as to what the problem is. Thanks.

On Jun 12, 2014 11:32 PM, David Pilato wrote: What is this -d in "statlogs -d"? -- David ;-) Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 13 June 2014 at 03:58, Shawn Mullen wrote: I have an ElasticSearch instance running on my local machine. I installed the S3 plugin so I can do backup and restore operations to/from S3. I tried to follow the documentation on how to set this up. I was able to register a snapshot repository, and I have a bucket in S3 created just for backups. When I do a /_all I see the current repo settings. So, at this point all looks fine. However, when I try to create a snapshot it fails with RepositoryMissingException. This is what I get for a /_all:

{ "statlogs -d": { "type": "s3", "settings": { "region": "us-east", "bucket": "my-bucket-name", "access_key": "my-access-key", "secret_key": "my-secret-key" } } }

This is what I am sending when I try to do a snapshot:

PUT /_snapshot/statlogs/snapshot_1 -d { "indices": ["statexceptionlog"], "ignore_unavailable": true, "include_global_state": false }

I am using Sense to send the commands. I'm assuming I am getting the error because of something wrong with my S3 settings, but I don't know what it would be. I'm making this assumption because the /_all returns data (but I guess that could be wrong). Any ideas on what the issue might be? What exactly causes RepositoryMissingException? Thanks.
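Putting the fix from this thread together: the stray curl-style -d was pasted into Sense and became part of the repository name ("statlogs -d"), so the snapshot call against "statlogs" correctly failed with RepositoryMissingException. A cleaned-up sketch of the two calls (repository, bucket, and index names taken from the thread; credentials omitted):

```json
PUT /_snapshot/statlogs
{
  "type": "s3",
  "settings": {
    "region": "us-east",
    "bucket": "my-bucket-name"
  }
}

PUT /_snapshot/statlogs/snapshot_1
{
  "indices": "statexceptionlog",
  "ignore_unavailable": true,
  "include_global_state": false
}
```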
better places to store es.nodes and es.port in ES Hive integration?
Hi, I am playing with the elasticsearch and hive integration. The documentation says to set configuration like es.nodes and es.port in TBLPROPERTIES. It works, but it can cause a lot of redundant code: if I have ten data sets to index into the same ES cluster, I would have to repeat this information ten times in TBLPROPERTIES. Even if I use variable substitution, I still have to rewrite the substitution var for each table definition. What I am looking for is to put this info in, say, one file and pass its location, in some way, to the hive CLI so the hive-elasticsearch integration will get these settings when trying to find the ES server to talk to. I am not looking to put this info into files like hive-site.xml. Thanks, Jack
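One possible approach (a sketch; the file name, variable name, hosts, and table are all hypothetical) is to define the connection settings once in an init script loaded with `hive -i`, so each table definition only references the variable:

```sql
-- es-defaults.hql (hypothetical file): define the ES connection once
SET hivevar:ES_NODES=es-host1:9200,es-host2:9200;

-- each table definition then references the variable instead of repeating literals
CREATE EXTERNAL TABLE logs (ts TIMESTAMP, msg STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'logs/entry',
              'es.nodes'    = '${hivevar:ES_NODES}');
```

Invoked as `hive -i es-defaults.hql -f my-script.hql`. If I recall correctly, elasticsearch-hadoop also falls back to es.* properties found in the Hadoop job configuration, so `hive --hiveconf es.nodes=...` might avoid per-table properties entirely; worth verifying against the es-hadoop docs.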
Re: Changing Kibana-int based on context
I don't think you can do this dynamically within Kibana. The better way would be to run multiple instances of KB and then use a proxy to handle the redirects. Regards, Mark Walkom

On 14 June 2014 00:43, mysterydark diyabi...@gmail.com wrote: I am a newbie to computer science in general, and at present I am working on a project which involves Elasticsearch, Logstash, and Kibana; we are using these to build a centralized logging system. In the Kibana config.js there is a parameter kibana_index whose default value is set to kibana-int. Is there a way to change the kibana_index value based on the context? What I could understand from my research is that kibana-int is the index which stores all the dashboards. When I say context, what I mean is: if I have multiple projects in an organization, the dropdown on the Kibana dashboard page should show only the dashboards under a particular project when I give that project's name as the context in my URL, so people working on a project get to see only the ones in their project. The only way I could find is to change the kibana_index value per project, to something like kibana-projA, so it shows all the dashboards under that particular index. But I couldn't find a way to do it. Could you please help me out? Any help would be appreciated. Thanks.
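Mark's multi-instance suggestion could be sketched roughly like this (Kibana 3 assumed; the project name and ES endpoint are hypothetical): each Kibana install ships its own config.js pointing at a per-project dashboard index, and a front-end proxy routes each project path to its instance.

```javascript
// config.js for the "projA" Kibana instance (hypothetical names)
define(['settings'], function (Settings) {
  return new Settings({
    elasticsearch: "http://es-host:9200",  // assumed ES endpoint
    kibana_index: "kibana-int-projA",      // per-project dashboard index
    default_route: '/dashboard/file/default.json',
    panel_names: ['histogram', 'table', 'query', 'text']
  });
});
```

A proxy (e.g. nginx) would then map /projA/ to the instance carrying this config, /projB/ to another with kibana_index: "kibana-int-projB", and so on, so each team's dropdown only lists its own dashboards.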
elasticsearch curator — version 1.1.0 released
http://www.elasticsearch.org/blog/elasticsearch-curator-version-1-1-0-released/ When Elasticsearch version 1.0.0 was released, it came with a new feature: Snapshot Restore. The Snapshot portion of this feature allows you to create backups by taking a “picture” of your indices at a particular point in time. Soon after this announcement, the feature requests began to accumulate. Things like, “Add snapshots to Curator!” or “When will Curator be able to do snapshots?” If this has been your desire, your wish has finally been granted…and much, much more in addition! There looks to be a whole heap of cool stuff added, snapshots, aliases, allocation routing and more! Regards, Mark Walkom
Re: elasticsearch curator — version 1.1.0 released
It has a prefix setting, but not a suffix. Regards, Mark Walkom

On 14 June 2014 13:35, Ivan Brusic i...@brusic.com wrote: The addition of the snapshot feature is interesting, but I just wish there was a way to specify the index names instead of just specifying the dates. I haven't downloaded it yet, but it does have a prefix setting. I need a suffix setting. -- Ivan

On Fri, Jun 13, 2014 at 5:38 PM, Mark Walkom ma...@campaignmonitor.com wrote: http://www.elasticsearch.org/blog/elasticsearch-curator-version-1-1-0-released/ ...