Changing a non-dynamic setting of an already created index

2015-05-29 Thread Amish Asthana
Hi 
We have an index with settings like this (response truncated):

curl -XGET 'http://localhost:9200//_settings'
{
  ...
  "analysis" : {
    "filter" : {
      "truncate_filter" : {
        "type" : "truncate",
        "length" : "7000"
      }
    }
  }
  ...
}

Now we want to change the length of the truncate filter without recreating the 
index.

When we tried to change the setting, we got this error:

"error": "ElasticsearchIllegalArgumentException[Can't update non dynamic 
settings."

This part is understandable.

Then we went ahead and did the following:
a) Closed the index.
b) Changed the setting again, to something smaller, say 5000. It went through 
fine.
c) Opened the index and checked the setting. It shows the new value of 5000, 
and as far as we can see it works fine.

Question: What exactly happened here? I am assuming it has not changed 
anything for the existing data and is using this setting only for new data. 
Is this a bug, intended behaviour, or just a workaround which we should not 
use?
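
For reference, the exact sequence we ran was something like this (the index 
name was omitted above; "myindex" here is just a placeholder):

curl -XPOST 'http://localhost:9200/myindex/_close'

curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "analysis" : {
    "filter" : {
      "truncate_filter" : {
        "type" : "truncate",
        "length" : "5000"
      }
    }
  }
}'

curl -XPOST 'http://localhost:9200/myindex/_open'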

regards and thanks
amish



Re: pagination with range queries giving duplicate results

2015-01-29 Thread Amish Asthana
Hi David
We are aware of the scroll API and are not using it, as it will not scale.
That is the very reason I stressed that there are no updates, deletes, or 
creates; with multiple queries, all bets are off if any of those happen.
However, in a steady state (no change in data) I would expect the queries to 
work.

To answer your question: it happens across different pagination groups.
regards and thanks
amish

On Wednesday, January 28, 2015 at 11:06:24 PM UTC-8, Amish Asthana wrote:
>
> Hi Folks 
> We are facing an intermittent issue where range queries with pagination 
> miss some records or return duplicate ids.
>
> Let me describe our system.
> Let's say we have a certain number of records, and while queries are 
> being made in ES, we can assume that no record is getting created, updated, 
> or deleted. 
>
> The query we have is something like this:
>
> {"query":{"bool":{"must":[{"range":{"lastname":{"from":"Doe","to":null,"include_lower":false,"include_upper":true}}},{"range":{"firstname":{"from":"joe","to":null,"include_lower":false,"include_upper":true}}}]}}}
>
> Now we also have "from":X,"size":Y, and we issue multiple queries 
> with from increasing each time as X = X + Y.
>
> The idea is that every time we will get unique records.
> Unfortunately, from time to time that does not happen. (As I said, it is a 
> closed system, so assume nobody is updating, deleting, or creating data.)
> We see some records missing and some records duplicated.
>
> Has anybody seen a similar issue, or can anyone shed some light on how we 
> should debug this?
>
>
>



pagination with range queries giving duplicate results

2015-01-28 Thread Amish Asthana
Hi Folks 
We are facing an intermittent issue where range queries with pagination 
miss some records or return duplicate ids.

Let me describe our system.
Let's say we have a certain number of records, and while queries are being 
made in ES, we can assume that no record is getting created, updated, or 
deleted.

The query we have is something like this:
{"query":{"bool":{"must":[{"range":{"lastname":{"from":"Doe","to":null,"include_lower":false,"include_upper":true}}},{"range":{"firstname":{"from":"joe","to":null,"include_lower":false,"include_upper":true}}}]}}}

Now we also have "from":X,"size":Y, and we issue multiple queries with from 
increasing each time as X = X + Y.
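
To be concrete, each page is fetched with a request body like this (the page 
size is illustrative), and only "from" changes between calls:

{
  "from" : 0,
  "size" : 100,
  "query" : { ... the same bool/range query as above ... }
}

The second request uses "from" : 100, the third "from" : 200, and so on, so 
each page should contain records the previous pages did not.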

The idea is that every time we will get unique records.
Unfortunately, from time to time that does not happen. (As I said, it is a 
closed system, so assume nobody is updating, deleting, or creating data.)
We see some records missing and some records duplicated.

Has anybody seen a similar issue, or can anyone shed some light on how we 
should debug this?




Re: encoding is longer than the max length 32766

2015-01-02 Thread Amish Asthana
How does this MAX_LENGTH restriction affect a custom_all field into which we 
may be copying data from different fields using some analyzer?
Is the MAX_LENGTH restriction also applicable to such a custom_all field, 
which in turn would imply that the cumulative length is what matters?
amish

On Thursday, October 30, 2014 3:43:26 AM UTC-7, Rotem wrote:
>
> +1 on this question. 
>
> If the error is generated because of a not_analyzed field, how is it 
> possible to instruct ES to drop these values instead of failing the request?
>
>
> On Tuesday, July 1, 2014 10:22:54 PM UTC+3, Andrew Mehler wrote:
>>
>> For not analyzed fields, Is there a way of capturing the old behavior? 
>>  From what I can tell, you need to specify a tokenizer to have a token 
>> filter.
>>
>> On Tuesday, June 3, 2014 12:18:37 PM UTC-4, Karel Minařík wrote:
>>>
>>> This is actually a change in Lucene -- previously, the long term was 
>>> silently dropped, now it raises an exception, see Lucene ticket 
>>> https://issues.apache.org/jira/browse/LUCENE-5710
>>>
>>> You might want to add a `length` filter to your analyzer (
>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-length-tokenfilter.html#analysis-length-tokenfilter
>>> ).
>>>
>>> All in all, it hints at some strange data, because such an "immense" term 
>>> probably shouldn't be in the index in the first place.
>>>
>>> Karel
>>>
>>> On Thursday, May 29, 2014 10:47:37 PM UTC+2, Jeff Dupont wrote:

 We’re running into a peculiar issue when updating indexes with content 
 for the document.


 "document contains at least one immense term in (whose utf8 encoding is 
 longer than the max length 32766), all of which were skipped. please 
 correct the analyzer to not produce such terms”


 I’m hoping that there’s a simple fix or setting that can resolve this.

>>>
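
For the analyzed case, the length filter Karel links to would be wired in 
roughly like this. This is only a sketch: the index and analyzer names are 
made up, and the max of 8000 is an illustrative value (the Lucene limit is 
32766 bytes of UTF-8, so leave headroom for multi-byte characters):

curl -XPUT 'http://localhost:9200/myindex' -d '{
  "settings" : {
    "analysis" : {
      "filter" : {
        "skip_immense_terms" : {
          "type" : "length",
          "max" : 8000
        }
      },
      "analyzer" : {
        "my_analyzer" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["lowercase", "skip_immense_terms"]
        }
      }
    }
  }
}'

As Andrew notes, this only helps analyzed fields; a not_analyzed field has no 
token filter chain to hook into.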



Re: Decoupling Data and indexing

2014-11-12 Thread Amish Asthana
Thanks Jorg.

On Wednesday, November 12, 2014 12:23:06 AM UTC-8, Jörg Prante wrote:
>
> There is no current method to redirect indexing to a preparer index for 
> delayed indexing, while searching is still enabled.
>
> By using rivers, you can close the _river index; some rivers (not all) may 
> take this as an indicator to stop indexing until the _river index is 
> reopened. I consider this a workaround and not a feature.
>
> From my understanding, the most preferred method to implement delayed 
> indexing currently is to set up a durable message queue (like RabbitMQ and 
> logstash) for external document persistence. By stopping/starting and 
> reconfiguring the message queue, the data can be indexed wherever you like.
>
> If you would like to see delayed indexing as a core feature in ES and not as 
> a plugin, then you should open an issue with the suggestion. To be honest, I 
> assume this will be rejected in favor of a queue in front of ES, as 
> described in this blog post:
>
> http://dopey.io/logstash-rabbitmq-tuning.html
>
> Jörg
>
>
> On Tue, Nov 11, 2014 at 11:40 PM, Amish Asthana  > wrote:
>
>> Thanks Jorg, makes sense.
>> A few minor questions:
>> a) With the current ES architecture, is this the best/recommended way?
>> b) Is there any project on the roadmap to provide more support for it?
>>
>> regards and thanks
>> amish
>>
>> On Tuesday, November 11, 2014 12:08:24 PM UTC-8, Jörg Prante wrote:
>>>
>>> FAST stored the source data in distributed machines, only the control 
>>> API was not distributed (similar to ES HTTP curl requests, which also 
>>> connect to one host only).
>>>
>>> Of course you could index raw JSON to a preparer index with a single 
>>> field, _all disabled, and field set to "not indexed" so there is no Lucene 
>>> activity on it. This preparer index could also hold mappings in special 
>>> documents for the indexing runs.
>>>
>>> The data duplication factor depends on the complexity of the mapping(s), 
>>> and the characteristics of the data (dictionary size, analyzer / tokenizer 
>>> output, norms etc.) 
>>>
>>> A plugin would do no magic at all, it could bundle the calls that 
>>> otherwise a client would have to execute from remote, and adds some 
>>> convenience commands for managing the prepare stage (e.g. suspend/resume) 
>>> and showing the current state of indexing.
>>>
>>> If redundant data is a no-go, then the whole approach is 
>>> counterintuitive.
>>>
>>> Jörg
>>>
>>>
>>> On Tue, Nov 11, 2014 at 7:46 PM, Amish Asthana  
>>> wrote:
>>>
>>>> With existing Elasticsearch I can think of an architecture like this.
>>>>
>>>> Index: indexForDataDump. No mapping (is that possible?) or minimal 
>>>> mapping, used only to dump data from the external system. There is some 
>>>> primary key.
>>>>
>>>> Then there are different search indexes with different mappings: 
>>>> search-index1, search-index2, etc.
>>>> These indexes get populated from indexForDataDump using the technique 
>>>> mentioned here 
>>>> <http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/>
>>>> .
>>>> This way I can drop a search index as desired and create a new one 
>>>> with a new mapping.
>>>> Any pros/cons or issues with this approach? There will be data 
>>>> duplication, but I am hoping it is minimal. (Any way to quantify it?)
>>>>
>>>> regards and thanks
>>>> amish
>>>>
>>>>
>>>> On Tuesday, November 11, 2014 10:02:46 AM UTC-8, Amish Asthana wrote:
>>>>>
>>>>> I am not aware of FAST but the idea looks promising.
>>>>> However it might not be that easy to just have a plugin for ES, as the 
>>>>> data itself is distributed on different machines.
>>>>> So it will not be possible to have just one server with the data, as 
>>>>> it would become a single point of failure.
>>>>> regards and thanks
>>>>> amish
>>>>>
>>>>> On Tuesday, November 11, 2014 1:21:53 AM UTC-8, Jörg Prante wrote:
>>>>>>
>>>>>> I know from the FAST Search engine ten years ago there was a 
>>>>>> two-phase commit for distributed search and indexing. One server could 
>>>>>> listen on the API and keep the (compressed) input stored, and all the 
>>>>>> other indexing servers were supplied by this input in another phase to 
>>>>>> create binary indexes, either automatically, or by manual operation, 
>>>>>> called "suspend/resume indexing API".

Re: Decoupling Data and indexing

2014-11-11 Thread Amish Asthana
Thanks Jorg, makes sense.
A few minor questions:
a) With the current ES architecture, is this the best/recommended way?
b) Is there any project on the roadmap to provide more support for it?

regards and thanks
amish

On Tuesday, November 11, 2014 12:08:24 PM UTC-8, Jörg Prante wrote:
>
> FAST stored the source data in distributed machines, only the control API 
> was not distributed (similar to ES HTTP curl requests, which also connect 
> to one host only).
>
> Of course you could index raw JSON to a preparer index with a single 
> field, _all disabled, and field set to "not indexed" so there is no Lucene 
> activity on it. This preparer index could also hold mappings in special 
> documents for the indexing runs.
>
> The data duplication factor depends on the complexity of the mapping(s), 
> and the characteristics of the data (dictionary size, analyzer / tokenizer 
> output, norms etc.) 
>
> A plugin would do no magic at all, it could bundle the calls that 
> otherwise a client would have to execute from remote, and adds some 
> convenience commands for managing the prepare stage (e.g. suspend/resume) 
> and showing the current state of indexing.
>
> If redundant data is a no-go, then the whole approach is counterintuitive.
>
> Jörg
>
>
> On Tue, Nov 11, 2014 at 7:46 PM, Amish Asthana  > wrote:
>
>> With existing Elasticsearch I can think of an architecture like this.
>>
>> Index: indexForDataDump. No mapping (is that possible?) or minimal 
>> mapping, used only to dump data from the external system. There is some 
>> primary key.
>>
>> Then there are different search indexes with different mappings: 
>> search-index1, search-index2, etc.
>> These indexes get populated from indexForDataDump using the technique 
>> mentioned here 
>> <http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/>.
>> This way I can drop a search index as desired and create a new one 
>> with a new mapping.
>> Any pros/cons or issues with this approach? There will be data duplication, 
>> but I am hoping it is minimal. (Any way to quantify it?)
>>
>> regards and thanks
>> amish
>>
>>
>> On Tuesday, November 11, 2014 10:02:46 AM UTC-8, Amish Asthana wrote:
>>>
>>> I am not aware of FAST but the idea looks promising.
>>> However it might not be that easy to just have a plugin for ES, as the 
>>> data itself is distributed on different machines.
>>> So it will not be possible to have just one server with the data, as it 
>>> would become a single point of failure.
>>> regards and thanks
>>> amish
>>>
>>> On Tuesday, November 11, 2014 1:21:53 AM UTC-8, Jörg Prante wrote:
>>>>
>>>> I know from the FAST Search engine ten years ago there was a two-phase 
>>>> commit for distributed search and indexing. One server could listen on the 
>>>> API and keep the (compressed) input stored, and all the other indexing 
>>>> servers were supplied by this input in another phase to create binary 
>>>> indexes, either automatically, or by manual operation, called 
>>>> "suspend/resume indexing API". 
>>>>
>>>> The advantage was that data could be received permanently via API while 
>>>> FAST indexing could be stopped temporarily in order to balance between 
>>>> indexing and search performance on limited hardware.
>>>>
>>>> Do you think of something like that also for Elasticsearch? This 
>>>> architecture is possible to implement by a plugin.
>>>>
>>>> Jörg
>>>>
>>>> On Mon, Nov 10, 2014 at 10:13 PM, Amish Asthana  
>>>> wrote:
>>>>
>>>>> Hi
>>>>> Is there a way we can decouple data and the associated mapping/indexing 
>>>>> in Elasticsearch itself?
>>>>> Basically, store the raw data as source (JSON or some other format), 
>>>>> and various mappings/indexes can be used on top of that.
>>>>> I understand that one can use an outside database or file system, but 
>>>>> can it be natively achieved in ES itself?
>>>>>
>>>>> Basically we are trying to see how our ES instance will behave when we 
>>>>> have to change the mapping of existing and continuously incoming data 
>>>>> without any downtime for the end user.
>>>>> We have an added wrinkle: our indexing has to be edit-aware for 
>>>>> versioning purposes, unlike ES, where each edit is a new record.
>>>>> regards and thanks
>>>>> amish
>>

Re: Case sensitive/insensitive search combination in phrase/proximity query

2014-11-11 Thread Amish Asthana
Maybe the question to ask is how you determine which part of the phrase will 
be searched case-sensitively and which will not.
If that logic is consistent, it can be applied at indexing time, and the same 
analyzer used for search.
regards and thanks
amish

On Monday, November 10, 2014 9:22:19 AM UTC-8, Zdeněk Šebl wrote:
>
> Hi,
> is there any way to search part of a phrase as case-sensitive and part 
> as case-insensitive?
>
> The only solution I found for case sensitive/insensitive querying is to 
> have multiple analyzers applied to one field (one analyzer with lowercase 
> token filter and one without)
>
> With this solution I can search in the following way:
>
> Field.lowercase: "My Phrase"
>
> or
>
> Field.sensitive: "My Phrase"
>
> *But what to do if I would like to search "My" as case-sensitive and 
> "Phrase" as case-insensitive?*
>
> I found the *span_near* query, but the error message says "*Clauses must 
> have same field*"
>
> Thanks,
> Zdenek
>
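
For reference, the two-analyzer setup Zdenek describes could be mapped along 
these lines (a sketch; every index, type, and analyzer name here is made up):

curl -XPUT 'http://localhost:9200/myindex' -d '{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "case_sensitive" : { "type" : "custom", "tokenizer" : "standard" },
        "case_insensitive" : { "type" : "custom", "tokenizer" : "standard", "filter" : ["lowercase"] }
      }
    }
  },
  "mappings" : {
    "mytype" : {
      "properties" : {
        "Field" : {
          "type" : "string",
          "fields" : {
            "sensitive" : { "type" : "string", "analyzer" : "case_sensitive" },
            "lowercase" : { "type" : "string", "analyzer" : "case_insensitive" }
          }
        }
      }
    }
  }
}'

Queries against Field.sensitive then match case-sensitively and 
Field.lowercase case-insensitively; mixing the two inside a single phrase is 
exactly what span_near rejects, since its clauses must target the same field.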



Re: Decoupling Data and indexing

2014-11-11 Thread Amish Asthana
With existing Elasticsearch I can think of an architecture like this.

Index: indexForDataDump. No mapping (is that possible?) or minimal mapping, 
used only to dump data from the external system. There is some primary key.

Then there are different search indexes with different mappings: 
search-index1, search-index2, etc.
These indexes get populated from indexForDataDump using the technique 
mentioned here 
<http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/>.
This way I can drop a search index as desired and create a new one with a 
new mapping.
Any pros/cons or issues with this approach? There will be data duplication, 
but I am hoping it is minimal. (Any way to quantify it?)
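
A minimal sketch of what the dump index could look like, following Jörg's 
earlier description (all names are placeholders; the single field is not 
indexed and _all is disabled, so there is no Lucene activity on it while 
_source is still stored):

curl -XPUT 'http://localhost:9200/indexfordatadump' -d '{
  "mappings" : {
    "raw" : {
      "_all" : { "enabled" : false },
      "properties" : {
        "payload" : { "type" : "string", "index" : "no" }
      }
    }
  }
}'

The search indexes would then be rebuilt from this index's _source whenever a 
mapping changes.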

regards and thanks
amish

On Tuesday, November 11, 2014 10:02:46 AM UTC-8, Amish Asthana wrote:
>
> I am not aware of FAST but the idea looks promising.
> However it might not be that easy to just have a plugin for ES, as the data 
> itself is distributed on different machines.
> So it will not be possible to have just one server with the data, as it 
> would become a single point of failure.
> regards and thanks
> amish
>
> On Tuesday, November 11, 2014 1:21:53 AM UTC-8, Jörg Prante wrote:
>>
>> I know from the FAST Search engine ten years ago there was a two-phase 
>> commit for distributed search and indexing. One server could listen on the 
>> API and keep the (compressed) input stored, and all the other indexing 
>> servers were supplied by this input in another phase to create binary 
>> indexes, either automatically, or by manual operation, called 
>> "suspend/resume indexing API". 
>>
>> The advantage was that data could be received permanently via API while 
>> FAST indexing could be stopped temporarily in order to balance between 
>> indexing and search performance on limited hardware.
>>
>> Do you think of something like that also for Elasticsearch? This 
>> architecture is possible to implement by a plugin.
>>
>> Jörg
>>
>> On Mon, Nov 10, 2014 at 10:13 PM, Amish Asthana  
>> wrote:
>>
>>> Hi
>>> Is there a way we can decouple data and the associated mapping/indexing in 
>>> Elasticsearch itself?
>>> Basically, store the raw data as source (JSON or some other format), and 
>>> various mappings/indexes can be used on top of that.
>>> I understand that one can use an outside database or file system, but 
>>> can it be natively achieved in ES itself?
>>>
>>> Basically we are trying to see how our ES instance will behave when we 
>>> have to change the mapping of existing and continuously incoming data 
>>> without any downtime for the end user.
>>> We have an added wrinkle: our indexing has to be edit-aware for 
>>> versioning purposes, unlike ES, where each edit is a new record.
>>> regards and thanks
>>> amish
>>>



Re: Decoupling Data and indexing

2014-11-11 Thread Amish Asthana
I am not aware of FAST but the idea looks promising.
However it might not be that easy to just have a plugin for ES, as the data 
itself is distributed on different machines.
So it will not be possible to have just one server with the data, as it 
would become a single point of failure.
regards and thanks
amish

On Tuesday, November 11, 2014 1:21:53 AM UTC-8, Jörg Prante wrote:
>
> I know from the FAST Search engine ten years ago there was a two-phase 
> commit for distributed search and indexing. One server could listen on the 
> API and keep the (compressed) input stored, and all the other indexing 
> servers were supplied by this input in another phase to create binary 
> indexes, either automatically, or by manual operation, called 
> "suspend/resume indexing API". 
>
> The advantage was that data could be received permanently via API while 
> FAST indexing could be stopped temporarily in order to balance between 
> indexing and search performance on limited hardware.
>
> Do you think of something like that also for Elasticsearch? This 
> architecture is possible to implement by a plugin.
>
> Jörg
>
> On Mon, Nov 10, 2014 at 10:13 PM, Amish Asthana  > wrote:
>
>> Hi
>> Is there a way we can decouple data and the associated mapping/indexing in 
>> Elasticsearch itself?
>> Basically, store the raw data as source (JSON or some other format), and 
>> various mappings/indexes can be used on top of that.
>> I understand that one can use an outside database or file system, but can 
>> it be natively achieved in ES itself?
>>
>> Basically we are trying to see how our ES instance will behave when we have 
>> to change the mapping of existing and continuously incoming data without 
>> any downtime for the end user.
>> We have an added wrinkle: our indexing has to be edit-aware for 
>> versioning purposes, unlike ES, where each edit is a new record.
>> regards and thanks
>> amish
>>



Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana
No, I am not saying that. I am saying this:
GET  my_index_v1/mytype/_search
{
  "query": {
"query_string": {
  "default_field": "name",
  "query": "welcome-doesnotmatchanything",
  "default_operator": "AND"
}
  }
}

Here I will not get a match, as expected. If I do not specify one, then OR is 
the default operator and it will match.
amish


On Monday, November 10, 2014 4:01:14 PM UTC-8, Dave Reed wrote:
>
> My default operator doesn't matter if I understand it correctly, because 
> I'm specifying the operator explicitly. Also, I can reproduce this behavior 
> using a single search term, so there's no operator to speak of. Unless 
> you're saying that the default operator applies to a single-term query if 
> it is broken into tokens?
>  
>
>> Note that welcome-doesnotmatchanything will be analyzed into two tokens 
>> joined with OR, and your document will match unless you use AND.
>
>
> This concerns me... my search looks like:
>
> message:welcome-doesnotmatchanything
>
> I cannot break that into an AND. The entire thing is a value provided by 
> the end user. You're saying I should on the app side break the string they 
> entered into tokens and join them with ANDs? That doesn't seem viable...
>
> Let me back up and say what I'm expecting the user to be able to do. 
> There's a single text box where they can enter a search query, with the 
> following rules:
> 1. The user may use a trailing wildcard, e.g. foo*
> 2. The user may enter multiple terms separated by a space. Only documents 
> containing all of the terms will match.
> 3. The user might enter special characters, such as in "battle-axe", 
> simply because that is what they think they should search for, which should 
> match documents containing "battle" and "axe" (the same as a search for 
> "battle axe").
>
> To that end, I am taking their search string and forming a search like 
> this:
>
> message:term1 AND message:term2 AND ...
>
> Where the string is split on spaces and joined with the AND clauses. For 
> each individual part of the search phrase, I take care of escaping special 
> characters (except "*" since I am allowing them to use wildcards). For 
> example, if they entered "foo bar!", I would generate this query:
>
> message:foo AND message:bar\!
>
> The problem is they are entering "battle-axe", causing me to generate this:
>
> message:battle\-axe
>
> But that ends up being the same as:
>
> (message:battle OR message:axe)
>
> I guess that is what I was not expecting. Because of this behavior, I have 
> to know from my app point of view what tokens I should be splitting the 
> original string on, so that I can join them back together with ANDs. But 
> that means basically reimplementing the tokenizer on my end, does it not? 
> There must be a better way? Like specifying I want those terms to be joined 
> with ANDs instead?
>



Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana
I created a test index using your pattern and I am seeing the appropriate 
behaviour.
I am assuming you are using the same analyzer at search/query time, as well 
as ensuring that your DEFAULT OPERATOR is AND.
Note that welcome-doesnotmatchanything will be analyzed into two tokens 
joined with OR, and your document will match unless you use AND.
amish

On Monday, November 10, 2014 2:48:06 PM UTC-8, Dave Reed wrote:
>
> Also interesting... if I run the query with explain=true, I see 
> information in the details about the "welcome" token, but there's no 
> mention at all about the "doesnotmatch" token. I guess it wouldn't mention 
> it though, since if it did, the document shouldn't match in the first place.
>
> On Monday, November 10, 2014 2:45:05 PM UTC-8, Dave Reed wrote:
>>
>> Yes of course :) Here we go:
>>
>> {
>>   "valid": true,
>>   "_shards": {
>>     "total": 1,
>>     "successful": 1,
>>     "failed": 0
>>   },
>>   "explanations": [
>>     {
>>       "index": "index_v1",
>>       "valid": true,
>>       "explanation": "message:welcome message:doesnotmatch"
>>     }
>>   ]
>> }
>>
>> It pasted a little weird but that's it.
>>
>>
>>
>> On Monday, November 10, 2014 2:25:33 PM UTC-8, Amish Asthana wrote:
>>>
>>> Can you run a validate query and post the output? That will be helpful.
>>> amish
>>>
>>>>
>>>>



Re: Query_string search containing a dash has unexpected results

2014-11-10 Thread Amish Asthana
Can you run a validate query and post the output? That will be helpful.
amish
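
Something along these lines (the index name is a placeholder):

curl -XGET 'http://localhost:9200/my_index_v1/_validate/query?explain' -d '{
  "query" : {
    "query_string" : {
      "query" : "id:3955974 AND message:welcome\\-doesnotmatchanything"
    }
  }
}'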

On Thursday, November 6, 2014 4:47:12 PM UTC-8, Dave Reed wrote:
>
> I have a document with a field "message", that contains the following text 
> (truncated):
>
> Welcome to test.com!
>
> The message field is mapped with an analyzer that breaks that string 
> into the following tokens:
>
> welcome
> to
> test
> com
>
> But, when I search with a query like this:
>
> {
>   "query": {
>
> "query_string": {
>   "query": "id:3955974 AND message:welcome-doesnotmatchanything"
> }
>   }
> }
>
>
>
> To my surprise, it finds the document (3955974 is the document id). The 
> dash and everything after it seems to be ignored, because it does not 
> matter what I put there, it will still match the document.
>
> I've tried escaping it:
>
> {
>   "query": {
> "query_string": {
>   "query": "id:3955974 AND message:welcome\\-doesnotmatchanything"
> }
>   }
> }
> (note the double escape since it has to be escaped for the JSON too)
>
> But that makes no difference. I still get 1 matching document. If I put it 
> in quotes it works:
>
> {
>   "query": {
> "query_string": {
>   "query": "id:3955974 AND message:\"welcome-doesnotmatchanything\""
> }
>   }
> }
>
> It works, meaning it matches 0 documents, since that document does not 
> contain the "doesnotmatchanything" token. That's great, but I don't 
> understand why the unquoted version does not work. This query is being 
> generated so I can't easily just decide to start quoting it, and I can't 
> always do that anyway since the user is sometimes going to use wildcards, 
> which can't be quoted if I want them to function. I was under the 
> assumption that an EscapedUnquotedString is the same as a quoted unescaped 
> string (in other words, foo:a\b\c === foo:"abc", assuming all special 
> characters are escaped in the unquoted version).
>
> I'm only on ES 1.0.1, but I don't see anything new or any changes that would 
> have impacted this behavior in later versions.
>
> Any insights would be helpful! :)
>
>
>
>
>



Decoupling Data and indexing

2014-11-10 Thread Amish Asthana
Hi
Is there a way we can decouple data and the associated mapping/indexing in 
Elasticsearch itself?
Basically, store the raw data as source (JSON or some other format), and 
various mappings/indexes can be used on top of that.
I understand that one can use an outside database or file system, but can 
it be natively achieved in ES itself?

Basically we are trying to see how our ES instance will behave when we have 
to change the mapping of existing and continuously incoming data without any 
downtime for the end user.
We have an added wrinkle: our indexing has to be edit-aware for versioning 
purposes, unlike ES, where each edit is a new record.
regards and thanks
amish



Re: Boosting a list of field for queries

2014-09-22 Thread Amish Asthana
I have not seen any reply. Can anyone guide us?
Let me ask in another way:
a) Let's say I have a document with 10 fields.
Assume that no boosting etc. is required for any field. Now let's say we have 
an "_all" field encompassing all these fields.
Question: Is a query on just the "_all" field faster than a field-by-field 
query?
If so, by what factor? (Logically, the number of tokens in the "_all" field 
should be less than or equal to the sum over all the individual fields, so 
the query time should be less than or equal to the field-by-field case, 
approaching it when all tokens are distinct.)

b) The second part of the question is how to boost a field if it is getting 
copied to a copy_to field.
regards and thanks
amish

On Tuesday, September 9, 2014 10:50:21 AM UTC-7, Amish Asthana wrote:
>
> Hi Folks
> We have a bunch of fields for a document.
> Let's call them "field1", "field2", ..., "FIELD1", "FIELD2", ...
>
> When we search for a particular text, we want to see first those records 
> which have that text in "FIELD1"..."FIELDn", before records matching in 
> "field1", "field2", etc.
>
> Now the complications:
> a) We have lots of fields. In internal testing we found that if we copy 
> the fields to a common field (using "copy_to", something akin to the _all 
> field) and search only that field, then the search is much faster than 
> doing a multi match query on the list of fields. We have not found any 
> documentation on this, but it has been the case in all our internal testing.
> b) We have our own analyzer for each copy_to field. So let's say I 
> have "my_own_analyzed_all_field1", on which I am searching. All values from 
> "field1"..."FIELDn" are copied there.
> c) The "boost" parameter, as far as I can see, works per field type (it 
> is only deprecated at the document level). However, it only works if 
> the result is going to the "_all" field. It does not seem to boost the 
> field getting copied to "my_own_analyzed_all_field1".
>
> So the question is: can it be done here? Or do we have to do a multi match 
> query with each field boosted by a certain factor?
>
>
>
>
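
For what it is worth, the field-by-field alternative we are weighing is a 
multi match with per-field boosts, something like this (the index name and 
boost factors are illustrative):

GET myindex/_search
{
  "query" : {
    "multi_match" : {
      "query" : "some text",
      "fields" : ["FIELD1^2", "FIELD2^2", "field1", "field2"]
    }
  }
}

The ^2 boosts are applied at query time, so they do not depend on anything 
surviving the copy_to.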



Re: Boosting a list of field for queries

2014-09-09 Thread Amish Asthana
Hi Folks
I have created test data like this in Sense.
PUT /myccindexallanalyzed/testobject/_bulk
{"index":{"_id":1}}
{"name":"do not","description":"single test word"}
{"index":{"_id":2}}
{"name":"single test word","description":"do not"}

I am trying to search for "sin* test" and want the document which has this 
phrase in the name field to be first.
I do get that document, but the one with the phrase in the description field 
has a higher score. What am I doing wrong here?

Here is the query:
GET myccindexallanalyzed/testobject/_search?explain
{
  "query": {
    "bool": {
      "should": [
        {
          "span_near": {
            "clauses": [
              { "span_multi": { "match": { "wildcard": { "description": { "wildcard": "sin*", "boost": 1.0 } } } } },
              { "span_multi": { "match": { "wildcard": { "description": { "wildcard": "test", "boost": 1.0 } } } } }
            ],
            "slop": 0,
            "in_order": true,
            "collect_payloads": false
          }
        },
        {
          "span_near": {
            "clauses": [
              { "span_multi": { "match": { "wildcard": { "name": { "wildcard": "sin*", "boost": 2.0 } } } } },
              { "span_multi": { "match": { "wildcard": { "name": { "wildcard": "test", "boost": 2.0 } } } } }
            ],
            "slop": 0,
            "in_order": true,
            "collect_payloads": false
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

On Tuesday, September 9, 2014 10:50:21 AM UTC-7, Amish Asthana wrote:
>
> Hi Folks
> We have a bunch of fields for a document. 
> Lets call them "field1",'field2"., "FIELD1","FIELD2"..
>
> When we search for a particular text we want to see those records first 
> which has that text in "FIELD1"..."FIELDn" before records from "field1", 
> "field2" etc.
>
> Now the complications :
> a) We have lots of field. Doing internal testing we found that if we copy 
> the fields to a common field(using "copy_to" something akin to _all field) 
> and do search only that field then the search is much faster then doing a 
> multi match query on list of fields. Not found any document on this, but 
> this seem to be the case in all our internal testing.
> b) We have our own analyzer for different copy to field. So lets say I 
> have "my_own_analyzed_all_field1" on which I am searching. All values from 
> "field1". "FIELDn" is copied here.
> c) The "boost" parameter for field as far I see work for field type( As 
> far as I see its only deprecated for document ). However this only works if 
> the result is going to "_all' field. It does not seem to be boosting the 
> field getting copied to "my_own_analyzed_all_field1".
>
> So the question is can it be done here? Or do we have to do multi match 
> query with each field boosted by certain factor.
>
>
>
>



Re: Phrase wildcard search

2014-09-09 Thread Amish Asthana
My query is something like this:

"query":
  {
"span_near":
{"clauses":
   [
 
{"span_multi":{"match":{"wildcard":{"name":{"wildcard":"comp*"} 

,{"span_multi":{"match":{"wildcard":{"name":{"wildcard":"engaged"}
   ] 
   ,"slop":0   ,"in_order":true,"collect_payloads":false
}
  }

Here I am searching for "comp* engaged" as a wildcard phrase in the name 
field. If I have multiple fields, how do I do it? And if I have to boost some 
field, is there a way?
 

On Tuesday, September 9, 2014 11:02:27 AM UTC-7, Amish Asthana wrote:
>
> We have a requirement for phrase wildcard search in Elasticsearch.
> The requirement is to search for, let's say, "Barce* Me*i", and it 
> should return any document which has "Barcelona Messi" as a phrase in ANY 
> field.
> We have been able to do it using span_near with slop 0, and it works fine.
> The issue is that it works only on a particular field of the document.
> The question is: how can we make it work for any field in the document?
>
> We tried to create a span_near query for each field and then wrap the 
> whole thing in a boolean query, but it does not work.
>
>



Phrase wildcard search

2014-09-09 Thread Amish Asthana
We have a requirement for phrase wildcard search in Elasticsearch.
The requirement is to search for, let's say, "Barce* Me*i", and it should 
return any document which has "Barcelona Messi" as a phrase in ANY field.
We have been able to do it using span_near with slop 0, and it works fine.
The issue is that it works only on a particular field of the document.
The question is: how can we make it work for any field in the document?

We tried to create a span_near query for each field and then wrap the whole 
thing in a boolean query, but it does not work.



Boosting a list of field for queries

2014-09-09 Thread Amish Asthana
Hi Folks
We have a bunch of fields for a document.
Let's call them "field1", "field2", ..., "FIELD1", "FIELD2", ...

When we search for a particular text, we want to see first those records 
which have that text in "FIELD1"..."FIELDn", before records matching in 
"field1", "field2", etc.

Now the complications:
a) We have lots of fields. In internal testing we found that if we copy the 
fields to a common field (using "copy_to", something akin to the _all field) 
and search only that field, then the search is much faster than doing a 
multi match query on the list of fields. We have not found any documentation 
on this, but it has been the case in all our internal testing.
b) We have our own analyzer for each copy_to field. So let's say I have 
"my_own_analyzed_all_field1", on which I am searching. All values from 
"field1"..."FIELDn" are copied there.
c) The "boost" parameter, as far as I can see, works per field type (it is 
only deprecated at the document level). However, it only works if the result 
is going to the "_all" field. It does not seem to boost the field getting 
copied to "my_own_analyzed_all_field1".

So the question is: can it be done here? Or do we have to do a multi match 
query with each field boosted by a certain factor?


