Re: cluster setup
Hey,

I am not a huge fan of the OOM killer, to be honest. However, something is going against your plans when the OOM killer kicks in. You configured a 30GB heap, but you are running out of memory (and then, most of the time, the process that takes the most memory is killed, which is obviously Elasticsearch). But why are you running out of memory? Do you have any other service running on that machine that eats up system memory? Please check (or disable the OOM killer, but you should find out why it kicks in).

Also, use the nodes info API to find out whether the bootstrap.mlockall setting is really configured correctly on your nodes.

--Alex

On Thu, Jan 16, 2014 at 5:50 PM, Tula tulay.muezzino...@gmail.com wrote:

> Hi,
>
> I have 3 Ubuntu VMs on a private network; each has 64GB RAM. I started ES Beta2 (I need it for the term vector feature) on each node with 30GB heap space and with the following changes in the configuration file:
>
>     discovery.zen.ping.multicast.enabled: false
>     discovery.zen.ping.unicast.hosts: [host1,host2,host3]
>     bootstrap.mlockall: true
>     gateway.type: local
>     gateway.recover_after_nodes: 2
>     gateway.expected_nodes: 3
>     discovery.zen.minimum_master_nodes: 2
>
> Sometimes ES processes get killed by the OOM killer, and when I restart a node I run into situations like these: host1, host2, host3 think all three nodes are connected and form a cluster, and host3 thinks it is all by itself with status red (names are random); host1 and host2 form one cluster and host2 and host3 form another cluster.
>
> Any idea what I am doing wrong, and any suggestions? I will need to index 20 million documents and have 3 separate indexes with 5 shards and 1 replica.
>
> Thanks,
> T
>
> --
> You received this message because you are subscribed to the Google Groups elasticsearch group. To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/62130b83-40aa-468e-9449-0046497d39ff%40googlegroups.com . For more options, visit https://groups.google.com/groups/opt_out.
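The mlockall check Alex suggests could be scripted roughly as follows. This is only a sketch: it assumes the nodes info response has already been fetched and parsed into a dict, and that each node entry carries a process.mlockall boolean. That shape is an assumption about the response for this ES version, and the sample data below is made up for illustration.

```python
def nodes_without_mlockall(nodes_info):
    """Return the names of nodes whose process info does not report mlockall as true.

    Assumes the (hypothetical) shape {"nodes": {<id>: {"name": ..., "process": {"mlockall": bool}}}}.
    """
    missing = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        locked = node.get("process", {}).get("mlockall", False)
        if not locked:
            # Fall back to the node id if the node has no name field
            missing.append(node.get("name", node_id))
    return sorted(missing)


# Made-up sample response for three nodes
sample = {
    "nodes": {
        "a1": {"name": "host1", "process": {"mlockall": True}},
        "b2": {"name": "host2", "process": {"mlockall": True}},
        "c3": {"name": "host3", "process": {"mlockall": False}},
    }
}

unlocked = nodes_without_mlockall(sample)
print(unlocked)
```

Any node listed in the result would be a candidate for swapping or for memory being taken by something other than the locked heap.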
Re: total.store.size_in_bytes measures what?
Hey,

the source is just a field in the index; that's the reason it is included. What is not included is something like the translog, so the figure is not the entire disk space used by an index, IIRC.

--Alex

On Thu, Jan 16, 2014 at 6:52 PM, Ryan Pedela rped...@datalanche.com wrote:

> I did more digging. It turns out that with version 0.90.9 the _source data is included in the calculation. In other words, the stats are the entire disk space used by an index, including source data. And it is broken down by indices, primaries, etc., as Alex said. I did not test whether it takes source data compression into account, but it appears that it does.
Re: Query score based on aggregated values
Hey,

you could execute a query sorted by price and, in the same request, execute a statistical facet on the price field. Then, in your client, check for each returned hit whether it is above the average value returned by the statistical facet.

You could also do this in two roundtrips: get the statistical average from the facet first, then execute a second query that filters only for products with a price higher than the average.

--Alex

On Fri, Jan 17, 2014 at 12:43 AM, Kevin Pearson kevin.pearso...@gmail.com wrote:

> I am wondering if there is a way to use aggregated values inside a query.
>
> Example: say our data contains items and their price:
>
>     {
>       id : string
>       name : string
>       price : float
>     }
>
> I want to do a query that returns the top items that have a price far from the average price of items with the same name.
>
> Example data:
>
>     ID | Name  | Price
>     1  | Chair | 5.99
>     2  | Chair | 5.99
>     3  | Chair | 59.99
>     4  | Desk  | 61.00
>     5  | Desk  | 60.00
>     6  | Desk  | 59.99
>
> The top response would be ID 3, since 59.99 is way higher than the average price for a chair.
>
> I believe I need to write a custom score script, but I am not sure how I can get a reference to the average of items with the same name.
>
> Thank you,
> Kevin
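The client-side check described above could look something like this minimal sketch. It assumes the hits have already been fetched as plain dicts, and it computes the per-name averages locally as a stand-in for values that would come back from a statistical facet. The function name rank_by_deviation is hypothetical.

```python
from collections import defaultdict


def rank_by_deviation(items):
    """Sort items by how far their price is from the average price of items with the same name."""
    # name -> [sum of prices, count]; stands in for per-name facet statistics
    totals = defaultdict(lambda: [0.0, 0])
    for item in items:
        totals[item["name"]][0] += item["price"]
        totals[item["name"]][1] += 1
    averages = {name: total / count for name, (total, count) in totals.items()}
    # Largest absolute distance from the per-name average first
    return sorted(
        items,
        key=lambda item: abs(item["price"] - averages[item["name"]]),
        reverse=True,
    )


# Kevin's example data
items = [
    {"id": 1, "name": "Chair", "price": 5.99},
    {"id": 2, "name": "Chair", "price": 5.99},
    {"id": 3, "name": "Chair", "price": 59.99},
    {"id": 4, "name": "Desk", "price": 61.00},
    {"id": 5, "name": "Desk", "price": 60.00},
    {"id": 6, "name": "Desk", "price": 59.99},
]

ranked = rank_by_deviation(items)
print(ranked[0]["id"])
```

Note that this ranks by absolute distance, so unusually cheap items rank high as well; dropping the abs() would keep only items priced above their average.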
Re: Disable increment of version counter on some update operations possible?
Thanks for your effort, Brian. I'll think about this (I'm working with node.js not with Java anyway), but I already opened an issue for this (https://github.com/elasticsearch/elasticsearch/issues/4791).
Questions about multi_field, configurations, routing control, filtered alias
Hi all,

Recently I have been studying Elasticsearch, and I have several questions about it. I hope someone can answer them.

(1) About multi_field: can it store two types of fields? Such as:

    "tweet": {
      "properties": {
        "name": {
          "type": "multi_field",
          "fields": {
            "name": { "type": "string", "index": "not_analyzed" },
            "value": { "type": "int" }
          }
        }
      }
    }

(2) If it can, what is the request format when posting a new document? Can I explicitly specify the values of these two fields, or is there some type cast operation inside it?

(3) Is there a default configuration file that configures the default schema mappings of the index and type? Or is the REST API the only way to create an index and configure the mappings?

(4) After I have configured the number of shards/replicas and posted many documents, can I reconfigure it? And how? If so, what happens when the shard number increases? Does it cost a lot of performance?

(5) About routing: can I control which documents must be sent to different shards? I know I can use the same routing value to index/search in the same shard. But can I force some documents to be located in different shards from the other documents?

(6) Assume I have only one node and one index. What is the difference between having one shard and ten shards for the same index? Does it cost extra memory if the shard count is ten? What is the suggested rule for deciding this number?

(7) What is the difference between setting the search_type to scroll and using the from/size parameters?

(8) About alias filtering: what is the cost of creating an alias filter? Are there any caching algorithms to accelerate operations that use the alias filter? Or does it just append the extra filter condition of the filtered alias to the query?

Sorry for the newbie questions. Could you give me your opinion on them?

Cheers,
Ivan
jdbc-river error: no sqljdbc_auth in java.library.path for sqlserver dll
Hi all,

I have the following problem with SQL Server authentication. I am using Elasticsearch with SQL Server in mixed authentication mode. I define my indexing in the river plugin with something like:

    curl -XPUT http://localhost:9200/_river/itra_jdbc_river/_meta -d '{
      "type": "jdbc",
      "jdbc": {
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "url": "jdbc:sqlserver://MY_SERVER;integratedSecurity=true;databaseName=MY_DB",
        "user": "INTRANET\\MY_USER",
        "password": "MY_PWD",
        "sql": "SELECT * FROM A_TABLE",
        "versioning": false
      },
      "index": {
        "index": "test",
        "type": "values"
      }
    }'

I started ES with the -Djava.library.path=mssql\auth\x64 option, where the mssql folder is under the jdbc-river plugin folder. However, I still get the "no sqljdbc_auth in java.library.path" error, so the DLL does not seem to be referenced correctly. I also noticed that the SQL Server JAR must instead be in the jdbc-river folder itself.

Any suggestions?

Thanks in advance,
Alfredo
Synonym Filter
Hi,

My synonym file contains the entry below:

    MIT,Massachusetts Institute of Technology

My settings are as follows:

    "settings": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "my_pipe_analyzer",
            "filter": [ "lowercase", "syns_filter" ]
          },
          "my_pipe_analyzer": {
            "tokenizer": "my_pipe_analyzer"
          },
          "autocomplete_search": {
            "type": "custom",
            "tokenizer": "my_pipe_analyzer",
            "filter": [ "lowercase", "syns_filter", "stop" ]
          }
        },
        "tokenizer": {
          "my_pipe_analyzer": {
            "type": "pattern",
            "pattern": "\\|"
          }
        },
        "filter": {
          "syns_filter": {
            "synonyms_path": "synonyms/synonym_collegename.txt",
            "type": "synonym",
            "ignore_case": true
          }
        }
      }
    }

I created a pipe-separated tokenizer so that the synonyms are not split on spaces. Still, they are being split on spaces when I verify with the analyze API. Below is my output from the analyze API:

    {
      "tokens": [
        { "token": "mit", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 1 },
        { "token": "massachusetts", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 1 },
        { "token": "institute", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 2 },
        { "token": "technology", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 4 }
      ]
    }
Re: Disable increment of version counter on some update operations possible?
Hi Jörg,

as I understand it, even with external versioning it is not possible to update a doc *without changing/incrementing* the version number at all.

1. User A loads DOC123 with version 20 and starts editing critical fields locally.
2. User B simply loads DOC123 to view it (read only). A view counter is incremented, so the version number is also set to 21 (or something higher with external versioning, but not to 20 again).
3. User A tries to send the updates from (1) with version number 20, to ensure he has the current version, and the update fails because the version number has changed.

Thanks,
Joa
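The three steps above can be replayed with a toy in-memory store. This is only an illustration of the optimistic-concurrency conflict Joa describes, not Elasticsearch code; ToyStore and VersionConflict are hypothetical names invented for the sketch.

```python
import copy


class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class ToyStore:
    """In-memory stand-in for a versioned document store."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def put(self, doc_id, source, version=1):
        self._docs[doc_id] = (version, copy.deepcopy(source))

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, copy.deepcopy(source)

    def update(self, doc_id, changes, expected_version=None):
        version, source = self._docs[doc_id]
        # Only check the version when the caller asks for it
        if expected_version is not None and expected_version != version:
            raise VersionConflict(f"expected {expected_version}, found {version}")
        source.update(changes)
        # Every successful update bumps the version, including the view counter
        self._docs[doc_id] = (version + 1, source)
        return version + 1


store = ToyStore()
store.put("DOC123", {"title": "draft", "views": 0}, version=20)

# 1. User A loads the doc at version 20 and starts editing locally
version_seen_by_a, _ = store.get("DOC123")

# 2. User B only views the doc, but the view counter update bumps the version
_, doc_b = store.get("DOC123")
store.update("DOC123", {"views": doc_b["views"] + 1})

# 3. User A sends the edit with the version it loaded, which is now stale
try:
    store.update("DOC123", {"title": "final"}, expected_version=version_seen_by_a)
    conflict = False
except VersionConflict:
    conflict = True

print(conflict)
```

One way around this, outside the scope of the sketch, would be to keep frequently changing counters in a separate document so they do not touch the version of the critical fields.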
Re: jdbc-river error: no sqljdbc_auth in java.library.path for sqlserver dll
Hi,

a little update: adding the reference to the absolute path in the Path variable worked, so it seems that ES is currently ignoring the -Djava.library.path parameter passed on the command line. Is that possible?

On Monday, January 20, 2014 at 14:02:36 UTC+1, Alfredo Serafini wrote:

>> Please, use absolute paths in java.library.path
>
> With other Java applications a relative path works without too many problems, so this was my first test. I have also tested it with an absolute path, without luck. I have also tried '\\' instead of '\', and '/', just to avoid problems with Windows paths. Any other suggestion?
>
>> Of course, JDBC jars must be in the jdbc-river folder, otherwise they can not be found by ES plugin manager.
>
> Done. In fact, when I remove the jar, the error changes to "no suitable driver..." etc.
>
> Thanks,
> Alfredo
Re: jdbc-river error: no sqljdbc_auth in java.library.path for sqlserver dll
You have to add parameters for the JVM to the JAVA_OPTS variable, e.g. in $ES_HOME/bin/elasticsearch.in.sh

For Windows I don't know where to set JAVA_OPTS, but maybe there is something like $ES_HOME/bin/elasticsearch.bat

Jörg

On Mon, Jan 20, 2014 at 2:42 PM, Alfredo Serafini ser...@gmail.com wrote:

> Hi, a little update: adding the reference to the absolute path in the Path variable worked, so it seems that ES is currently ignoring the -Djava.library.path parameter passed on the command line. Is that possible?
Re: Is there any way to remove duplicated search result in ES?
Thank you for your rapid reply.

It is true that I can write my own custom search action, but I cannot override the default search action, so it is not what I want. At indexing time there are several listeners that plugins can hook into, but at search time there is hardly any listener for extending the search operation other than the search action. Why not provide an opportunity to install my own plugin to extend the search phase? Judging from the source code, it seems simple.

Following your answer, I will give up on the solution that uses the Lucene duplicate filter. Your proposal is very useful for solving my problem. I will try it. Thank you very much!

On Monday, January 20, 2014 at 6:45:40 PM UTC+8, Jörg Prante wrote:

> It is not true that there is no chance to install your plugin after ES has collected all search results. You can implement a plugin with an alternative search action. The issue you have cited is related to overriding default actions, and there is good reason for not allowing that.
>
> The Lucene DuplicateFilter works at the segment level and is not suitable for the index level or for distributed search.
>
> The basic idea is: if you want the newest one of the documents, you can sort docs by timestamp and pick the first one, ignoring the followers.
>
> You can use aggregations plus filtered queries to issue a series of queries against an ES index and deduplicate on the client side, using your custom ordering rules (e.g. one bucket per author, and pick at most one doc per author from the sorted, timestamped result set of a filtered query). Note that this procedure is very expensive and does not scale.
>
> The best method is indexing deduplicated data. It is the preferred solution because it is cheap: fetch the list of docs per author from the original source and index only the one you want to have in search results.
>
> Jörg
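The "sort by timestamp and pick the first one" idea from Jörg's reply can be sketched on the client side as below. It assumes the hits are already available as dicts with author and timestamp fields; both field names and the function name newest_per_author are assumptions made for the illustration.

```python
def newest_per_author(hits):
    """Keep only the newest hit per author, newest first."""
    seen = set()
    result = []
    # Sort newest first, then keep the first hit seen for each author
    for hit in sorted(hits, key=lambda h: h["timestamp"], reverse=True):
        if hit["author"] in seen:
            continue  # an older document by an author we already have
        seen.add(hit["author"])
        result.append(hit)
    return result


# Made-up search hits
hits = [
    {"id": "a", "author": "kim", "timestamp": 100},
    {"id": "b", "author": "lee", "timestamp": 250},
    {"id": "c", "author": "kim", "timestamp": 300},
    {"id": "d", "author": "lee", "timestamp": 50},
]

deduplicated = newest_per_author(hits)
print([hit["id"] for hit in deduplicated])
```

As the reply points out, doing this over large result sets at query time is expensive; the same function could instead be applied once before indexing, so that only the surviving documents are indexed.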
Re: jdbc-river error: no sqljdbc_auth in java.library.path for sqlserver dll
I would love to help, also with problems specific to the Windows elasticsearch.bat, but I'm afraid I can't.

The fact is, you have to find out where the Java runtime of Elasticsearch is executed (it is in the script elasticsearch or, in the case of Windows, elasticsearch.bat), and in that call you must add JVM arguments like java.library.path. Because the library loading is executed from a daemon process, you have to choose absolute paths so that it cannot fail.

ES provides the JAVA_OPTS variable as a convenience for collecting several JVM arguments defined in scripts.

Example:

    -Djava.library.path=C:\Users\Dummy\Downloads\sqljdbc_4.0\enu\auth\x64

Jörg
Completion suggester for multiple fields
Hello,

one small newbie question. Is it possible to use the completion suggester for more than one field, assuming all of them are of type completion? Something like this:

    {
      "song-suggest": {
        "text": "n",
        "completion": {
          "fields": [ "suggest", "name", "author", "smthElse" ]
        }
      }
    }

Thanks.
1.0.0.0.RC1 breaking changes: multi_field documentation
Looking at the following to prepare for RC1:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/_multi_fields.html

I find the following. But shouldn't the semicolon after the first type string be a comma?

Brian

> you can now write:
>
>     "title": {
>       "type": "string";
>       "fields": {
>         "raw": { "type": "string", "index": "not_analyzed" }
>       }
>     }
1.0.0RC1 breaking changes : GC stats
Well, not sure if it qualifies as a breaking change, unless you cowboy code JavaScript like I do ;-)

The node stats API (/_nodes/{this.nodeId}/stats?all=1) is now returning a different format for JVM GC, splitting old and young generations (which is really helpful). I didn't notice this change in Beta1, so I thought I'd point it out, and maybe someone can update the docs.

    "gc": {
      "collectors": {
        "young": { "collection_count": 3, "collection_time_in_millis": 136 },
        "old": { "collection_count": 0, "collection_time_in_millis": 0 }
      }
    }

I also hadn't noticed before that the JVM stats return pool information (another helpful addition!):

    "pools": {
      "young": { "used_in_bytes": 30180776, "max_in_bytes": 279183360, "peak_used_in_bytes": 71630848, "peak_max_in_bytes": 279183360 },
      "survivor": { "used_in_bytes": 8912888, "max_in_bytes": 34865152, "peak_used_in_bytes": 8912896, "peak_max_in_bytes": 34865152 },
      "old": { "used_in_bytes": 21765456, "max_in_bytes": 724828160, "peak_used_in_bytes": 21765456, "peak_max_in_bytes": 724828160 }
    }
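For clients that previously read a single GC total, the split format above can be folded back into one figure. A minimal sketch, assuming the jvm section of the node stats response has been parsed into a dict shaped like the snippet in this message; the function name summarize_gc is made up.

```python
def summarize_gc(jvm_stats):
    """Sum collection counts and times across all collectors in the split GC format."""
    collectors = jvm_stats["gc"]["collectors"]
    return {
        "collection_count": sum(c["collection_count"] for c in collectors.values()),
        "collection_time_in_millis": sum(
            c["collection_time_in_millis"] for c in collectors.values()
        ),
    }


# The values reported in the message above
jvm_stats = {
    "gc": {
        "collectors": {
            "young": {"collection_count": 3, "collection_time_in_millis": 136},
            "old": {"collection_count": 0, "collection_time_in_millis": 0},
        }
    }
}

gc_totals = summarize_gc(jvm_stats)
print(gc_totals)
```

Iterating over whatever collectors are present, rather than hard-coding young and old, keeps the code working if a node reports differently named collectors.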
Re: Disable increment of version counter on some update operations possible?
Thanks Clinton
how can I access script field in map script of facet script plugin??
I need to use the script field value in my map script, and I am not sure how I can access it. My query looks like this; I need to use the value of state_info in my map_REPORT script. Is it possible to access that?

    {
      "query": { --- },
      "script_fields": {
        "state_info": {
          "script": "lookup",
          "lang": "native",
          "parems": { "field": "state" }
        }
      },
      "facets": {
        "reportFacet": {
          "script": {
            "init_script": "init_REPORT",
            "map_script": "map_REPORT",
            "reduce_script": "reduce_REPORT",
            "params": { -- }
          }
        }
      }
    }
Re: Questions about multi_field, configurations, routing control, filtered alias
Ivan,

1) The multi_field type allows you to define different ways that a *single field value* will be indexed. Your example below will work and will index a single value as string/not_analyzed, and then as an int (use integer for int).

2) The incoming document will contain a field named name with a single value. When it goes into the index, it will be indexed in 2 different ways.

3) A mapping is not required to index data. There is an implied default mapping that will parse your JSON content and dynamically update the schema if you don't specify one up front.

4) You cannot change the shard count after the index is created. You can change the replica count at any time. The update index settings API (PUT _settings) allows you to change the replica count.

5) You can specify a single routing value for all documents that you want to go to a specific shard/location.

6) The number of shards will allow you to scale your content later. If your data volume increases, you can add more nodes later and distribute the shards around. If you only have a single shard and you run out of space, then you cannot scale out unless you increase storage or increase the shard count.

7) Scroll is used to do a snapshot type of search, i.e. the results you get back will not be affected by updates to the index after you start scrolling. From/size are useful if you want to do paging of search results (or infinite scrolling, but one page at a time).

8) Filters execute fast and, yes, can be cached.

On Monday, January 20, 2014 6:21:43 AM UTC-5, Ivan Ji wrote:

> Hi all,
>
> Recently I have been studying Elasticsearch, and I have several questions about it. I hope someone can answer them.
>
> (1) About multi_field: can it store two types of fields? Such as:
>
>     "tweet": {
>       "properties": {
>         "name": {
>           "type": "multi_field",
>           "fields": {
>             "name": { "type": "string", "index": "not_analyzed" },
>             "value": { "type": "int" }
>           }
>         }
>       }
>     }
>
> (2) If it can, what is the request format when posting a new document? Can I explicitly specify the values of these two fields, or is there some type cast operation inside it?
>
> (3) Is there a default configuration file that configures the default schema mappings of the index and type? Or is the REST API the only way to create an index and configure the mappings?
>
> (4) After I have configured the number of shards/replicas and posted many documents, can I reconfigure it? And how? If so, what happens when the shard number increases? Does it cost a lot of performance?
>
> (5) About routing: can I control which documents must be sent to different shards? I know I can use the same routing value to index/search in the same shard. But can I force some documents to be located in different shards from the other documents?
>
> (6) Assume I have only one node and one index. What is the difference between having one shard and ten shards for the same index? Does it cost extra memory if the shard count is ten? What is the suggested rule for deciding this number?
>
> (7) What is the difference between setting the search_type to scroll and using the from/size parameters?
>
> (8) About alias filtering: what is the cost of creating an alias filter? Are there any caching algorithms to accelerate operations that use the alias filter? Or does it just append the extra filter condition of the filtered alias to the query?
>
> Sorry for the newbie questions. Could you give me your opinion on them?
>
> Cheers,
> Ivan
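On point 5, the target shard is derived by hashing the routing value and taking it modulo the number of primary shards. The sketch below illustrates that idea only: it uses zlib.crc32 as a stand-in for Elasticsearch's internal hash function, so the shard numbers it prints will not match a real cluster, and both function names are hypothetical.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Map a routing value to a shard number (crc32 is a stand-in hash, not the one ES uses)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def routing_values_on_distinct_shards(candidates, num_primary_shards):
    """Return {shard: routing value}, keeping the first candidate that lands on each shard."""
    chosen = {}
    for value in candidates:
        chosen.setdefault(shard_for(value, num_primary_shards), value)
    return chosen


num_shards = 5
candidates = [f"group-{i}" for i in range(50)]
chosen = routing_values_on_distinct_shards(candidates, num_shards)

# The same routing value always maps to the same shard
print(shard_for("group-0", num_shards) == shard_for("group-0", num_shards))
print(chosen)
```

Two different routing values can hash to the same shard, so different values alone do not guarantee different shards; probing candidate values like this, against the real hash function, would be one way to pick values that are known to land apart.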
Unexpected behavior from nested - filter - nested aggregation
First, let me say I'm very excited about the new aggregations. Great work!

I've got a type with two layers of nesting:

    script:
      calls: [
        name: string
        params: [
          name: string
          value: string
        ]
      ]

I want to run an aggregation over the parameter values for calls to a specific function. Here's the skeleton of what I tried:

    'aggs': {'b': {
        'nested': {'path': 'calls'},
        'aggs': {'c': {
            'filter': {'term': {'calls.name': 'particular_func'}},
            'aggs': {'d': {
                'nested': {'path': 'calls.params'},
                'aggs': ...

The structure is three aggregations: a nested wrapping a filter wrapping a nested. Checking the doc counts on these, I see that the outer two work as expected: the doc count for the outer nested is the number of nested calls documents, and the doc count for the filter is the number of those nested calls docs that pass the filter. But it appears that the inner nested resets the buckets: it returns the number of inner nested params documents across all calls docs, regardless of the filter.

Is there a way to do what I want?
Re: PagedBytesIndexFieldData cannot be cast to IndexNumericFieldData
Sure thing, here are the details.

We have a type with the following mapping, with dynamic: strict so that data of other types cannot get in. Note that it has a field Id with the long datatype. When we try to get a statistical facet on Id, it gives the error:

    PagedBytesIndexFieldData cannot be cast to org.elasticsearch.index.fielddata.IndexNumericFieldData]

It happens randomly: once I wipe out the index and create it again, it works for some time, and then all of a sudden it starts giving the error.

    {
      "portfoliosearch": {
        "dynamic": "strict",
        "properties": {
          "Id": { "type": "long", "index": "not_analyzed", "index_Name": "Id" },
          "Name": { "type": "string", "index": "not_analyzed", "index_Name": "Name" }
        },
        "_routing": { "required": true },
        "_parent": { "type": "importsetsearch" }
      }
    }

Statistical query:

    {
      "query": { "match_all": {} },
      "facets": {
        "Id": {
          "statistical": { "field": "Id" }
        }
      }
    }
[Ann] JDBC river 1.0.0.RC1.2
Hi,

just a quick note about a new release of JDBC river.

https://github.com/jprante/elasticsearch-river-jdbc/

Changes

- a series of SQL statements can now be executed at each river cycle
- execution of SQL statements with thread pool size (like connection pooling)
- river state saved and loaded at each cycle
- schedule parameter with crontab-like specification
- no more oneshot strategy
- poll parameter removed in favor of schedule or interval
- acksql, acksqlparams removed in favor of SQL statement series
- driver parameter removed
- new parameter bulk_flush_timeout
- experimental CallableStatement support improved
- river cycle can be executed at once by new REST river induce command
- new REST river state inspection command
- many bug fixes and cleanups

I will rework and extend the documentation pages in the GitHub wiki over the next few days.

Best,
Jörg
[ANN] ElasticHQ v1.0.0 (ElasticSearch v1.0.0.RC1 Support)
Hello All,

Pleased to announce the release of ElasticHQ v1.0.0. This release added:

1. *Support for ElasticSearch v1.0.0RC1*, and unbroke the breaking changes. ;-)
2. Support for monitoring multiple file systems
3. Support for G1 GC
4. Allow the user to select which nodes are displayed on the Diagnostics Screen

*Every HQ release is always backwards compatible*, so there's no magic needed on your part.

As always, you can get it here: http://www.elastichq.org/gettingstarted.html

Regards,
Roy Russo
http://www.elastichq.org/blog/
Re: Questions about multi_field, configurations, routing control, filtered alias
Hi Binh,

First, thank you very much for your reply. I have a few follow-up questions, inline below.

On Tuesday, January 21, 2014 at 5:10:10 AM UTC+8, Binh Ly wrote:

> Ivan,
>
> 1) The multi_field type allows you to define different ways that a *single field value* will be indexed. Your example below will work and will index a single value as string/not_analyzed, and then as an int (use integer for int).
>
> 2) The document coming in will contain a field named name with a single value. When it goes into the index, it will be indexed 2 different ways.
>
> 3) A mapping is not required to index data. There is an implied default mapping that will parse your JSON content and dynamically update the schema if you don't specify one up-front.
>
> 4) You cannot change the shard count after the index is created. You can change the replica count anytime. The update settings API allows you to change the replica count.
>
> 5) You can specify a single routing value for all documents that you want to go to a specific shard/location.

Yes, but can I make sure that two sets of documents are stored in *different* shards? If I use different routing values, does that mean they will be stored in different shards? I guess not, right? Although the hash values of the two routing values are different, I don't know which range of hash values maps to a single shard. And I want to store these documents in different shards.

> 6) The number of shards will allow you to scale your content later. So if your data volume increases, you can add more nodes later and distribute the shards around. If you only have a single shard and you run out of space, then you cannot scale out unless you increase storage, or increase the shard count.
>
> 7) Scroll is used to do a snapshot type of search, i.e., results you get back will not be affected by updates to the index after you start scrolling. From/size are useful if you want to do paging of search results (or infinite scrolling but paged at a time).
>
> 8) Filters execute fast and yes can be cached.
About filters, I would like to understand the underlying algorithm. If I create an alias that covers about half of the index, does it increase the index size? In other words, when I create an alias, does it actually write data to storage, or does it only remember the filter condition and act like a predefined adapter that stores nothing itself?

Another question: what do you suggest if I need to modify the mapping of an index, for example changing store from no to yes, or removing a field? From what I have read over the past few days, it seems hard to change an existing mapping, and there are many limitations.

Again, thanks for your replies.

Cheers,
Ivan
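On the routing question above: as far as the documented behaviour goes, the target shard is picked as hash(routing) modulo the number of primary shards, so two different routing values are not guaranteed to land on different shards. The sketch below only illustrates that formula. The class and method names are made up for illustration, and String.hashCode() is a stand-in assumption, not the hash function Elasticsearch uses internally, so the shard numbers it prints will not match a real cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only (hypothetical names). Elasticsearch applies its own
// internal hash function; String.hashCode() is used here purely as a stand-in.
final class RoutingSketch {

    // shard = hash(routing) mod number_of_primary_shards
    static int shardFor(String routing, int primaryShards) {
        // floorMod keeps the result in [0, primaryShards) even for negative hashes.
        return Math.floorMod(routing.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        int primaryShards = 5;
        String[] routingValues = {"groupA", "groupB", "groupC", "groupD", "groupE", "groupF"};

        // Remember which shard each routing value maps to.
        Map<String, Integer> assignment = new LinkedHashMap<>();
        for (String routing : routingValues) {
            assignment.put(routing, shardFor(routing, primaryShards));
        }
        for (Map.Entry<String, Integer> e : assignment.entrySet()) {
            System.out.println(e.getKey() + " -> shard " + e.getValue());
        }
    }
}
```

With six routing values and five shards, at least two values must share a shard, whatever hash is used. If two sets of documents really must be kept apart, separate indexes are the simpler guarantee.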
Return specific field and highlights via Java API
I am having two issues using the Java API:

1. I am not able to return a specific field in my search query. It shows that I have the right number of results, but displays null.
2. Highlights are not returned.

Note: assume indexing is fine, because I get correct results if I comment out the line .addField("uid").

I am using the defaults for everything. I understand that for highlights _source has to be enabled for the field, but I thought that if it is not, it grabs the original source.

json:

    {"uid": "123", "name": "hello"},
    {"uid": "1234", "name": "hello1"}

    node = NodeBuilder.nodeBuilder()
            .local(true)
            .data(true)
            .node();
    client = node.client();
    // ... createIndex

    private void search(String index, String type, String field, String value) {
        SearchResponse response = client.prepareSearch(index)
                .setTypes(type)
                .addHighlightedField("uid")
                .addField("uid")
                .execute().actionGet(); // query setup was not shown in the post
        SearchHit[] results = response.getHits().getHits();
        System.out.println("Current results: " + results.length);
        for (SearchHit hit : results) {
            System.out.println("--");
            Map<String, Object> result = hit.getSource();
            System.out.println(result); // this is where null is printed
        }
    }
Synonym configuration
Hi,

I have read in a lot of places that there are two approaches when working with synonyms:

- expanding them at indexing time,
- expanding them at query time.

Expanding synonyms at query time is not recommended since it raises issues with:

- scoring, since synonyms have different document frequencies,
- multi-token synonyms, since the query parser splits on white spaces.

So what is the configuration for expanding synonyms at index time in Elasticsearch?

Right now my configuration is as below. I am using the synonym filter in both the index analyzer and the query analyzer, which means I am expanding at index time and at query time.

    "name": {
        "type": "string",
        "index_analyzer": "autocomplete_index",
        "search_analyzer": "autocomplete_search"
    },

    {
        "settings": {
            "analysis": {
                "analyzer": {
                    "synonym": {
                        "tokenizer": "whitespace",
                        "filter": ["lowercase", "syns_filter"]
                    },
                    "autocomplete_search": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["lowercase", "syns_filter", "stop"]
                    },
                    "autocomplete_index": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["lowercase", "syns_filter", "stop", "my_edgeNgram"]
                    }
                },
                "filter": {
                    "syns_filter": {
                        "synonyms_path": "synonyms/synonym_collegename.txt",
                        "type": "synonym",
                        "ignore_case": true,
                        "expand": false
                    }
                }
            }
        }
    }

-paul
Re: facets.total and hits.total dont match
I have indexed some records with test_field set to 'analyzed'. If the analyzed field is causing this issue, is there another facet type or a workaround that solves the problem?

On Tuesday, January 21, 2014 12:15:45 PM UTC+5:30, Chetana wrote:

> I have an application where I need both search results and facet information. Each time, a query is built from a filter condition and query words, and it is passed to both the facet and the search request as shown below. The field (test_field) on which the facet is applied is present in all documents.
>
>     BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
>     SearchRequestBuilder srb = client.prepareSearch("Test");
>     srb.setSearchType(SearchType.DFS_QUERY_THEN_FETCH).setQuery(boolQueryBuilder);
>
> and
>
>     TermsFacetBuilder facBuilder = FacetBuilders.termsFacet("test_field");
>     facBuilder.facetFilter(FilterBuilders.queryFilter(boolQueryBuilder));
>     facBuilder.fields("test_field");
>     facBuilder.global(true); // I tried commenting this out too, but I get the same result
>     srb.addFacet(facBuilder);
>
>     "hits" : { "total" : 117, "max_score" : null, "hits" : [ { } ] }
>     "facets" : { "assettype" : { "_type" : "terms", "missing" : 5, "total" : 119, "other" : 0, "terms" : [ { } ] } }
>
> But the hit count is different from the facet count. Can anyone please explain why there is this discrepancy?
>
> Thanks
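One plausible reading of the numbers above, offered as an assumption rather than a confirmed diagnosis: hits.total counts matching documents, while the terms facet's total counts indexed terms and missing counts documents with no value in the field. With an analyzed field, one document can contribute several terms, so 112 documents with a value (117 minus 5 missing) could well yield 119 terms. The sketch below simulates that counting with plain Java collections. The class name is hypothetical and the whitespace/lowercase "analyzer" is a simplified stand-in; no Elasticsearch API is used.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of document counts versus term counts.
final class FacetCountSketch {

    // Simplified stand-in for an analyzer: lowercase, then split on whitespace.
    static List<String> analyze(String value) {
        return Arrays.asList(value.trim().toLowerCase().split("\\s+"));
    }

    // Returns {hits, missing, total}:
    //   hits    = number of documents,
    //   missing = documents without a value for the field,
    //   total   = number of terms produced by all documents that have a value.
    static int[] count(List<String> fieldValues) {
        int hits = 0;
        int missing = 0;
        int total = 0;
        for (String value : fieldValues) {
            hits++;
            if (value == null) {
                missing++;
            } else {
                total += analyze(value).size();
            }
        }
        return new int[] {hits, missing, total};
    }

    public static void main(String[] args) {
        // Four documents; the third one has no value for the field.
        int[] c = count(Arrays.asList("Red Car", "bike", null, "Blue Bike"));
        System.out.println("hits=" + c[0] + " missing=" + c[1] + " total=" + c[2]);
    }
}
```

If that reading is right, faceting on a not_analyzed copy of the field (for example through multi_field) would be the usual way to get one term per document value.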
Open ports between nodes and main configuration
Hello All,

I have configured Elasticsearch with ports 9200 and 9300. Everything works as expected, but there is one setting I haven't found.

Setup: elasticsearch 0.90.10, 3 indexes, each with 5 nodes, without replicas.

When I run 'lsof -i', I get a long list of ESTABLISHED connections on ports 39619 to 39634, and on another server that I use for experiments the ESTABLISHED connections come from ports 43010 to 43025.

Is it possible to configure Elasticsearch so that it uses ports 9301 to 9400 instead? Or how can I know which ports it will always use?

Thanks.
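A likely explanation for the high port numbers above, stated as an assumption since the lsof output itself is not shown: they are the local (source) ports of outgoing connections to other nodes. A setting such as transport.tcp.port only fixes the port a node listens on; the source port of each outgoing TCP connection is normally picked by the operating system from its ephemeral range. The sketch below shows this with plain java.net sockets. The class name is hypothetical and nothing here is Elasticsearch code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo: a listener has one fixed port, while each client
// connection gets an OS-assigned local port.
final class EphemeralPortSketch {

    // Returns {listeningPort, clientLocalPort, clientRemotePort}.
    static int[] connectOnce() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port so the demo cannot clash with a
        // running service; a real node would listen on its configured port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            return new int[] {server.getLocalPort(), client.getLocalPort(), client.getPort()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = connectOnce();
        System.out.println("listening port:     " + ports[0]);
        System.out.println("client local port:  " + ports[1]);
        System.out.println("client remote port: " + ports[2]);
    }
}
```

In lsof terms, the client local port is what shows up on the connecting side of an ESTABLISHED line, and the remote port is the listener's. For firewall rules it is usually enough to allow traffic to the listening ports and leave the source ports unrestricted.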