Folding of accented to non-accented *only* — leaving symbols

2014-10-13 Thread Lee Gee
I know the asciifolding filter docs are really very clear on this, but it 
took me an embarrassingly long time to realise I was losing my currency 
symbol (£) to the ASCII folding filter.

Other than creating my own character map with the char map filter, does 
there exist something of production quality that would translate accented 
UTF-8 characters of the Latin alphabet into non-accented characters in the 
ASCII range, while leaving symbols such as £ untouched?

TIA
Lee
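
To prototype the behaviour asked for above outside Elasticsearch, here is a 
minimal Python sketch. It is not an Elasticsearch filter, and fold_accents is 
a hypothetical name. It decomposes the text (NFD) and drops only the combining 
marks, so accented letters lose their accents while symbols such as £ pass 
through. Letters that have no decomposition (ø, ß and similar) are left as 
they are.

```python
import unicodedata

def fold_accents(text):
    """Remove combining marks from decomposed characters; keep everything else."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left so the result is in a canonical form.
    return unicodedata.normalize("NFC", stripped)

print(fold_accents("café £5 naïve"))  # cafe £5 naive
```

The same table of before/after pairs could be used to generate mappings for 
the char map filter mentioned above, if a hand-maintained map is undesirable.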

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ff95c6ec-7907-454e-bd58-774ee173f4e3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Pattern replace apostrophes?

2014-10-09 Thread Lee Gee
The problem was that it was not an apostrophe, but an opening single quote. 

Have increased editor font size to address this issue.
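
For anyone hitting the same look-alike problem, a small Python sketch 
(describe and strip_apostrophes are hypothetical helper names) that shows 
which code points are really in a string and removes the ASCII apostrophe 
together with the left and right single quotation marks. The same character 
class could presumably be used as the pattern of the pattern-replace filter.

```python
import re
import unicodedata

# ASCII apostrophe plus the left and right single quotation marks.
APOSTROPHES = re.compile("['\u2018\u2019]")

def describe(text):
    """Return (character, Unicode name) pairs so look-alikes can be told apart."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in text]

def strip_apostrophes(text):
    """Remove every apostrophe-like character from the text."""
    return APOSTROPHES.sub("", text)

print(describe("'\u2018"))
print(strip_apostrophes("aaa\u2018s"), strip_apostrophes("aaa's"))
```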

On Tuesday, October 7, 2014 8:00:13 PM UTC+1, Ivan Brusic wrote:

 What type of query are you using? Perhaps the query you are using is not 
 using the same analyzer at search time.

 -- 
 Ivan

 On Tue, Oct 7, 2014 at 6:06 AM, Lee Gee <lee...@gmail.com> wrote:

 My users have issues with apostrophes: I need to index and search aaa's 
 as it is, and without the apostrophe, as aaas.

 If I use a char_filter to remove apostrophes when indexing and when 
 searching, the _analyze endpoint shows me that they produce 'words' without 
 apostrophes like this (respectively):

   {...   {
   end_offset = 5,
   position = 1,
   start_offset = 0,
   token = aaas,
   type = word,
   }  }

   {
   end_offset = 5,
   position = 1,
   start_offset = 0,
   token = aaas,
   type = word,
 },

 But there seems to be nothing I can do to find aaas / aaa's when 
 searching!

 Is this expected?  

 TIA
 Lee







Re: Pattern replace apostrophes?

2014-10-08 Thread Lee Gee
The index uses the keyword tokenizer, with edge-ngram (and other) filters — 
it only wants to match from the start of the string, for autocomplete.

The search analyser is also keyword, with various filters.

The pattern-replace filter for apostrophes is applied to both.

On Tuesday, October 7, 2014 8:00:13 PM UTC+1, Ivan Brusic wrote:

 What type of query are you using? Perhaps the query you are using is not 
 using the same analyzer at search time.

 -- 
 Ivan

 On Tue, Oct 7, 2014 at 6:06 AM, Lee Gee <lee...@gmail.com> wrote:

 My users have issues with apostrophes: I need to index and search aaa's 
 as it is, and without the apostrophe, as aaas.

 If I use a char_filter to remove apostrophes when indexing and when 
 searching, the _analyze endpoint shows me that they produce 'words' without 
 apostrophes like this (respectively):

   {...   {
   end_offset = 5,
   position = 1,
   start_offset = 0,
   token = aaas,
   type = word,
   }  }

   {
   end_offset = 5,
   position = 1,
   start_offset = 0,
   token = aaas,
   type = word,
 },

 But there seems to be nothing I can do to find aaas / aaa's when 
 searching!

 Is this expected?  

 TIA
 Lee






Ampersand synonym in YAML?

2014-10-08 Thread Lee Gee
  name_synonyms:
    type: synonym
    synonyms:
      - 1,one
      # - &,and,+=>and
      - '& => and'
  
How can I use YAML to correctly configure a synonym for ampersands and the 
'plus' symbol and the word 'and'?

The above synonym for 1/one seems to work.

Thanks
Lee
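
One thing to keep in mind is that an unquoted leading & is read by YAML as an 
anchor, so rules containing it need quoting. A way to sidestep YAML quoting 
altogether is to build the same settings as JSON. The sketch below does that 
in Python; the rule "&, + => and" is an assumption about the intended 
mapping, and whether & and + survive as tokens depends on the tokenizer in 
use.

```python
import json

# Hypothetical filter definition mirroring the name_synonyms filter above.
settings = {
    "analysis": {
        "filter": {
            "name_synonyms": {
                "type": "synonym",
                "synonyms": [
                    "1,one",
                    # Map the ampersand and the plus sign onto the word "and".
                    "&, + => and",
                ],
            }
        }
    }
}

print(json.dumps(settings, indent=2))
```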



Pattern replace apostrophes?

2014-10-07 Thread Lee Gee
My users have issues with apostrophes: I need to index and search aaa's 
as it is, and without the apostrophe, as aaas.

If I use a char_filter to remove apostrophes when indexing and when 
searching, the _analyze endpoint shows me that they produce 'words' without 
apostrophes like this (respectively):

  {...   {
  end_offset = 5,
  position = 1,
  start_offset = 0,
  token = aaas,
  type = word,
  }  }

  {
  end_offset = 5,
  position = 1,
  start_offset = 0,
  token = aaas,
  type = word,
},

But there seems to be nothing I can do to find aaas / aaa's when 
searching!

Is this expected?  

TIA
Lee



Re: edge_ngram results

2014-10-02 Thread Lee Gee
'explain' shows only two differences between the two results:

Hit on 'S' vs. hit on 'DqWjDCcsh S'

* idf(docFreq=1, maxDocs=1) vs. idf(docFreq=10, maxDocs=10)

* fieldNorm(doc=0) vs. fieldNorm(doc=9)

My possibly flawed understanding is that IDF is the inverse document 
frequency of the search term across the whole index — what confuses me is 
that these are results for the same term in the same index, so shouldn't 
the IDF be the same...?

tia
lee
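
To make the two idf values concrete, here is a small Python sketch. It 
assumes the classic Lucene TF-IDF formula, idf = 1 + ln(maxDocs / (docFreq + 1)), 
which was the default similarity at the time; classic_idf is a hypothetical 
name. The inputs are the docFreq and maxDocs pairs from the explain output 
above, so the term statistics in the two explanations were clearly not taken 
from the same set of documents.

```python
import math

def classic_idf(doc_freq, max_docs):
    """Inverse document frequency under the classic Lucene TF-IDF formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

print(round(classic_idf(1, 1), 4))    # 0.3069
print(round(classic_idf(10, 10), 4))  # 0.9047
```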

On Wednesday, October 1, 2014 11:24:17 AM UTC+1, Lee Gee wrote:

 I have an Elasticsearch string field configured for autocomplete like this:

 autocomplete_analyzer:
   type: custom
   tokenizer: whitespace
   filter: [ lowercase, asciifolding, ending_synonym, name_synonyms, autocomplete_filter ]

 autocomplete_filter:
   type: edge_ngram
   min_gram: 1
   max_gram: 20
   token_chars: [ letter, digit, whitespace, punctuation, symbol ]

 search_analyzer:
   type: custom
   tokenizer: whitespace
   filter: [ lowercase, asciifolding, standard, name_synonyms, ending_synonym ]



 I have a record where the field contains 'S XYZ', and lots of other 
 records where the field contains other words beginning S.

 I do not understand why, when I search for 'S XYZ', it is not the first 
 result.

 Could someone please explain?

 Many thanks in anticipation
 lee





Re: edge_ngram results

2014-10-02 Thread Lee Gee
The problem was that my test script did not pause between 
creating/populating the index and searching on it. Even though there are 
very few documents (10), Elasticsearch still needs a second or two to catch 
its breath and mop its brow before it is ready to search.

Now to find a way to rank shorter strings higher than longer ones, but 
that's another question.

thanks
Lee
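
Newly indexed documents only become searchable after a refresh, which by 
default happens about once a second. Instead of sleeping in a test script, 
a refresh can be requested explicitly. A minimal Python sketch follows; the 
host and the index name test_index are assumptions, and the request is only 
built here, not sent.

```python
from urllib.request import Request

def refresh_request(base_url, index):
    """Build (but do not send) a POST to the index's _refresh endpoint."""
    url = "{}/{}/_refresh".format(base_url.rstrip("/"), index)
    return Request(url, data=b"", method="POST")

req = refresh_request("http://localhost:9200", "test_index")
print(req.get_method(), req.full_url)
# To actually send it against a running node:
#   from urllib.request import urlopen
#   urlopen(req)
```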

On Wednesday, October 1, 2014 11:24:17 AM UTC+1, Lee Gee wrote:

 I have an Elasticsearch string field configured for autocomplete like this:

 autocomplete_analyzer:
   type: custom
   tokenizer: whitespace
   filter: [ lowercase, asciifolding, ending_synonym, name_synonyms, autocomplete_filter ]

 autocomplete_filter:
   type: edge_ngram
   min_gram: 1
   max_gram: 20
   token_chars: [ letter, digit, whitespace, punctuation, symbol ]

 search_analyzer:
   type: custom
   tokenizer: whitespace
   filter: [ lowercase, asciifolding, standard, name_synonyms, ending_synonym ]



 I have a record where the field contains 'S XYZ', and lots of other 
 records where the field contains other words beginning S.

 I do not understand why, when I search for 'S XYZ', it is not the first 
 result.

 Could someone please explain?

 Many thanks in anticipation
 lee





Sorting equal scores by field length?

2014-10-02 Thread Lee Gee
Is it possible to sort equally-scored results by the length of the field?

Or am I doing something else incorrectly?

With an edge_ngram filter on a keyword field, with search term S, I see 
SUPER comes before S in my results.

As a last resort, I could add a field to reflect the length of the keyword 
field, or even turn on dynamic scripting, but I imagine I am missing 
something vital.

Thanks
Lee
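
The "extra length field" fallback could look roughly like the sketch below: 
sort by score first, then by the stored length. The field names name and 
name_length are hypothetical, and name_length is assumed to be filled in at 
index time. The second half applies the same tie-break locally to show the 
intended ordering.

```python
import json

# Assumption: "name" is the searched field and "name_length" is a
# hypothetical integer field holding its length, computed at index time.
search_body = {
    "query": {"match": {"name": "S"}},
    "sort": [
        {"_score": {"order": "desc"}},      # best matches first
        {"name_length": {"order": "asc"}},  # among equal scores, shortest first
    ],
}
print(json.dumps(search_body, indent=2))

# The same tie-break applied locally to (score, value) pairs.
hits = [(1.0, "SUPER"), (1.0, "S"), (0.5, "SX")]
ranked = sorted(hits, key=lambda hit: (-hit[0], len(hit[1])))
print(ranked)  # [(1.0, 'S'), (1.0, 'SUPER'), (0.5, 'SX')]
```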



edge_ngram results

2014-10-01 Thread Lee Gee
I have an Elasticsearch string field configured for autocomplete like this:

autocomplete_analyzer:
  type: custom
  tokenizer: whitespace
  filter: [ lowercase, asciifolding, ending_synonym, name_synonyms, autocomplete_filter ]

autocomplete_filter:
  type: edge_ngram
  min_gram: 1
  max_gram: 20
  token_chars: [ letter, digit, whitespace, punctuation, symbol ]

search_analyzer:
  type: custom
  tokenizer: whitespace
  filter: [ lowercase, asciifolding, standard, name_synonyms, ending_synonym ]



I have a record where the field contains 'S XYZ', and lots of other records 
where the field contains other words beginning S.

I do not understand why, when I search for 'S XYZ', it is not the first 
result.

Could someone please explain?

Many thanks in anticipation
lee
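
To see which terms end up in the index with a setup like the one above, here 
is a rough Python imitation of a whitespace tokenizer followed by lowercase 
and an edge n-gram filter (min 1, max 20). It ignores asciifolding and the 
synonym filters, and the function names are hypothetical. It shows that 
every value containing a word beginning with S shares the indexed term 's' 
with 'S XYZ'.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes of the token with lengths from min_gram up to max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]

def analyze(text):
    """Split on whitespace, lowercase, then edge n-gram each token."""
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(token))
    return grams

print(analyze("S XYZ"))  # ['s', 'x', 'xy', 'xyz']
print(analyze("Super"))  # ['s', 'su', 'sup', 'supe', 'super']
```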



Faceted navigation with totals, drilling down

2014-09-07 Thread Lee Gee
I have two 'types' in an index, or two indices of different types (I'd 
prefer the latter but can live with the former).

I'm running an aggregation by type to implement what my UX people refer to 
as faceted search, which makes Googling for ES help quite tricky. 

UX would like to filter by type but retain a count for total hits in each 
aggregation bucket, that is, the total number of each type of record that 
matches the query.

Can this be done in one query?

Failing that, can two queries be supplied/run in parallel?

Thanks
Lee
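
One possible shape for a single request, sketched in Python: aggregations are 
computed over everything the query matches, while a post_filter narrows only 
the hits that are returned, so the bucket counts keep their totals. The field 
name title and the type name article are hypothetical, and this assumes a 
version of Elasticsearch that supports post_filter and aggregating on _type.

```python
import json

# Assumptions: "title" is a hypothetical searched field and "article" is a
# hypothetical type name; "_type" is the built-in type metadata field.
search_body = {
    "query": {"match": {"title": "some search terms"}},
    # Bucket counts cover every document the query matches ...
    "aggs": {"by_type": {"terms": {"field": "_type"}}},
    # ... while the post filter restricts only the returned hits.
    "post_filter": {"term": {"_type": "article"}},
}

print(json.dumps(search_body, indent=2))
```

If two separate queries turn out to be needed after all, the multi search 
endpoint (_msearch) accepts several searches in one request.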



Weight by position in string?

2014-08-27 Thread Lee Gee
Is it possible to boost the weight of a result if the match is closer to the 
start of a string in the index?  So that searching for 'bar' would weight 
'foo bar baz' higher than 'foo baz bar'?

I'm working with ngrams, if that helps.

Thanks
Lee



Re: Swap indexes?

2014-08-26 Thread Lee Gee
I was looking for the index alias, thanks all.
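
For readers landing here with the same question, a minimal Python sketch of 
the alias swap: it builds the body for a POST to /_aliases that removes the 
alias from the old index and adds it to the new one in a single request. The 
alias and index names are hypothetical.

```python
import json

def swap_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases that repoints the alias in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

body = swap_alias_actions("products", "products_v1", "products_v2")
print(json.dumps(body, indent=2))
```

Clients that always search through the alias do not need to know which 
underlying index is current.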

On Tuesday, June 17, 2014 9:31:00 AM UTC+1, Lee Gee wrote:

 Is it possible to have one ES instance create an index and then have a 
 second instance use that created index, without downtime?

 tia
 lee




Re: _suggest suggestion/question

2014-08-26 Thread Lee Gee
Thank you, Vineeth.

On Sunday, August 17, 2014 12:04:20 PM UTC+1, vineeth mohan wrote:

 Hello Lee,

 You will need to use the context suggester for this purpose: 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/suggester-context.html

 Also, this difference stems from the fact that the actual data and the 
 auto-completion data are stored in different data structures.
 This is to make sure that the auto-completion data is memory-resident and 
 thus super fast.

 Thanks
   Vineeth


 On Sun, Aug 17, 2014 at 3:32 PM, Lee Gee <lee...@gmail.com> wrote:

 My reading, which may not be accurate, of this [1] clear and concise 
 post is that it is not possible to use a reference to an existing field as 
 an argument to a suggester's 'input' or 'payload' fields.

 Please would you clarify if I have missed something?

 If I was correct, would it be much work to add these features?

 TIA
 Lee

 [1] http://www.elasticsearch.org/blog/you-complete-me/







_suggest suggestion/question

2014-08-17 Thread Lee Gee
My reading, which may not be accurate, of this [1] clear and concise post 
is that it is not possible to use a reference to an existing field as an 
argument to a suggester's 'input' or 'payload' fields.

Please would you clarify if I have missed something?

If I was correct, would it be much work to add these features?

TIA
Lee

[1] http://www.elasticsearch.org/blog/you-complete-me/



Re: Support for Anchoring in Elasticsearch Regex

2014-07-29 Thread Lee Gee
Lucene and Elasticsearch both anchor regular expressions by default.

Lucene’s patterns are always anchored. The pattern provided must match the 
entire string. 

— 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax
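
Because the whole string must match, an unanchored search has to be spelled 
out with .* on either side. The Python sketch below rewrites a user pattern 
that may use ^ and $ into that form. It is only an illustration under strong 
assumptions: to_always_anchored is a hypothetical helper, Python's 
re.fullmatch stands in for whole-string matching, only a leading ^ and a 
trailing unescaped $ are understood, and patterns that combine anchors with | 
are rejected rather than guessed at.

```python
import re

def _trailing_backslashes(s):
    """Number of consecutive backslashes at the end of s."""
    return len(s) - len(s.rstrip("\\"))

def to_always_anchored(user_pattern):
    """Rewrite ^/$ style anchoring for an engine that matches whole strings."""
    body = user_pattern
    starts = body.startswith("^")
    if starts:
        body = body[1:]
    # A trailing $ counts as an anchor only if it is not escaped.
    ends = body.endswith("$") and _trailing_backslashes(body[:-1]) % 2 == 0
    if ends:
        body = body[:-1]
    if (starts or ends) and "|" in body:
        raise ValueError("anchors combined with alternation are not handled")
    prefix = "" if starts else ".*"
    suffix = "" if ends else ".*"
    return "{}({}){}".format(prefix, body, suffix)

print(to_always_anchored("bar"))   # .*(bar).*
print(to_always_anchored("^foo"))  # (foo).*
print(bool(re.fullmatch(to_always_anchored("^foo"), "foo bar")))  # True
print(bool(re.fullmatch(to_always_anchored("^bar"), "foo bar")))  # False
```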


On Wednesday, December 18, 2013 7:19:48 AM UTC, Vaidik Kapoor wrote:

 Hi Folks,

 I see that Elasticsearch supports Regex. But that is limited to Lucene's 
 Regex Engine which does not support anchoring i.e. the entire string will 
 always be anchored. This works as long as you have fixed regular 
 expressions to run, but in cases where the regex query is taken from the 
 user, this becomes very limiting.

 Is there an alternative regex engine for Elasticsearch that at least 
 supports $ and ^ for anchoring? Quick Google and Github search did not get 
 me anything. If not, then is anybody doing something similar or have a work 
 around? One possible solution that I can think of is converting user's 
 entered regex to Lucene compatible regex. But that gets really complex to 
 do correctly with all the grouping and alternation in regex.

 I don't want the entire Perl regex kind of support. Just the anchoring bit 
 is important. Has anybody tried to solve this problem before?

 Thanks,
 Vaidik Kapoor
 vaidikkapoor.info
  



Swap indexes?

2014-06-17 Thread Lee Gee
Is it possible to have one ES instance create an index and then have a 
second instance use that created index, without downtime?

tia
lee
