Re: How can I make this search requirement work?

2014-07-16 Thread vineeth mohan
Hello Mooky,

Elasticsearch is not domain-specific, so it won't extract these financial
terms on its own.
You will need to write your own analyzer to do this.

Thanks
   Vineeth
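
For later readers, one way to "write your own analyzer" here without any Java
is a custom analyzer built on the pattern tokenizer. With "group" set to 0 the
tokenizer emits the regex matches themselves instead of splitting on them.
This is only a sketch: the names "example_financial_terms" and
"example_financial" and the regex itself are made up for illustration (JSON
has no comments, so they are flagged here), and the pattern only covers the
three shapes mentioned in the question.

```json
{
    "analysis": {
        "tokenizer": {
            "example_financial_terms": {
                "type": "pattern",
                "pattern": "[-+]?[$]?\\d+(?:\\.\\d+)?(?:%|bps)?",
                "group": 0
            }
        },
        "analyzer": {
            "example_financial": {
                "type": "custom",
                "tokenizer": "example_financial_terms",
                "filter": ["lowercase"]
            }
        }
    }
}
```

Under these assumptions "0# (99.995%)" should come out as the tokens "0" and
"99.995%", and "$44.5665" and "-45bps" should each survive as a single token.
Everything that is not number-like is dropped, so this would probably need to
sit alongside another analyzer rather than replace it.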


On Wed, Jul 16, 2014 at 4:17 PM, mooky  wrote:

> And it works a treat. Thanks.
>
> It leads me to think that it would be very useful to use it with a series
> of specialist (special-case) analyzers in conjunction with the standard
> analyzer.
>
> Back to my original example - "0# (99.995%)" - what I really want is
> something that will extract "99.995%".
> The standard analyzer will extract "99.995" (and the rest of the text),
> the whitespace analyzer will extract "(99.995%)".
>
> Does a financial/numeric/accounting analyzer already exist? i.e. something
> that extracts "99.995%" or "$44.5665" or "-45bps"?
>
> -M

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAGdPd5kg7TRG%3DX_%2B7tAueFaZ8pUYXbHrJhFZMVQaYcQyTicenQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: How can I make this search requirement work?

2014-07-16 Thread smonasco
A little late to the party, but I would have used a custom index analyzer with
lowercase, pattern, and edgengram, and a search analyzer with lowercase and
pattern (you may have to flip the order of lowercase and pattern).

With the pattern tokenizer you can specify a regex.
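
A rough sketch of what that suggestion could look like as index settings,
reusing the "default_index" / "default_search" analyzer names from the
settings posted elsewhere in this thread. The names "example_split" and
"example_prefixes", the split regex, and the gram lengths are assumptions made
for illustration. The tokenizer always runs before the token filters, so the
order ends up being pattern first, then lowercase.

```json
{
    "analysis": {
        "tokenizer": {
            "example_split": {
                "type": "pattern",
                "pattern": "[\\s()]+"
            }
        },
        "filter": {
            "example_prefixes": {
                "type": "edgeNGram",
                "min_gram": 1,
                "max_gram": 20
            }
        },
        "analyzer": {
            "default_index": {
                "type": "custom",
                "tokenizer": "example_split",
                "filter": ["lowercase", "example_prefixes"]
            },
            "default_search": {
                "type": "custom",
                "tokenizer": "example_split",
                "filter": ["lowercase"]
            }
        }
    }
}
```

With this, "0# (99.995%)" should be split into "0#" and "99.995%" at index
time and every prefix of each token gets indexed, so a plain match query for
99, 99.9, 99.995, 99.995% or the full copy-pasted string should all find the
document without a prefix query. Newer Elasticsearch releases spell the filter
type "edge_ngram".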



Re: How can I make this search requirement work?

2014-07-16 Thread mooky
And it works a treat. Thanks.

It leads me to think that it would be very useful to use it with a series of
specialist (special-case) analyzers in conjunction with the standard
analyzer.

Back to my original example - "0# (99.995%)" - what I really want is 
something that will extract "99.995%".
The standard analyzer will extract "99.995" (and the rest of the text), the 
whitespace analyzer will extract "(99.995%)".

Does a financial/numeric/accounting analyzer already exist? i.e. something
that extracts "99.995%" or "$44.5665" or "-45bps"?

-M

On Tuesday, 15 July 2014 18:58:46 UTC+1, mooky wrote:
>
> Thanks. That looks interesting!



Re: How can I make this search requirement work?

2014-07-15 Thread mooky
I think I can probably use a combo of the whitespace* and standard 
analyzers.

My current analyzer settings are:

{
    "analysis": {
        "analyzer": {
            "default_index": {
                "tokenizer": "whitespace",
                "filter": ["lowercase"]
            },
            "default_search": {
                "tokenizer": "whitespace",
                "filter": ["lowercase"]
            }
        }
    }
}


-M
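
For reference, combining the whitespace and standard analyzers through the
combo plugin might look roughly like the sketch below. The "combo" analyzer
type and the "sub_analyzers" property are taken on assumption from the
plugin's README and should be checked against the installed plugin version;
"example_whitespace_lower" is a made-up name wrapping the whitespace plus
lowercase setup shown above.

```json
{
    "analysis": {
        "analyzer": {
            "example_whitespace_lower": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase"]
            },
            "default_index": {
                "type": "combo",
                "sub_analyzers": ["example_whitespace_lower", "standard"]
            }
        }
    }
}
```

The idea is that the index then holds both "(99.995%)" from the whitespace
side and "99.995" from the standard side, so either form of the query has
something to match.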


On Tuesday, 15 July 2014 16:15:23 UTC+1, vineeth mohan wrote:
>
> Hello Mooky,
>
> You can apply multiple analyzers to a field:
> https://github.com/yakaz/elasticsearch-analysis-combo/
>
> So you can add all your analyzers there and apply them together.
>
> Thanks
>   Vineeth



Re: How can I make this search requirement work?

2014-07-15 Thread mooky
Thanks. That looks interesting!


On Tuesday, 15 July 2014 16:15:23 UTC+1, vineeth mohan wrote:
>
> Hello Mooky,
>
> You can apply multiple analyzers to a field:
> https://github.com/yakaz/elasticsearch-analysis-combo/
>
> So you can add all your analyzers there and apply them together.
>
> Thanks
>   Vineeth



Re: How can I make this search requirement work?

2014-07-15 Thread Glen Smith
I would start by suggesting that you create an indexing/querying analyzer 
specifically for the field you know has this format.

Otherwise, I think your likeliest path to success is somewhere in the
character filters domain.
Character filters are applied to the string before the tokenizer:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-analyzers.html

One possibility here is a pattern replace char filter.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html

If you can write a pattern that matches all of the allowed values of this
field and replaces them with just the number, and apply that pattern at both
index and search time, then you are only dealing with searching for the
numbers.

You may need a different character filter for the search analyzer, though, 
since you are allowing for more formats than
are found in the source document field.
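
A minimal sketch of the char filter idea, assuming a field-specific analyzer.
The names "example_strip_wrapping" and "example_grade_name" and the regex are
made up for illustration. This is a simpler variant than replacing the whole
value with just the number: it only strips the parentheses and the percent
sign before the whitespace tokenizer runs.

```json
{
    "analysis": {
        "char_filter": {
            "example_strip_wrapping": {
                "type": "pattern_replace",
                "pattern": "[()%]",
                "replacement": ""
            }
        },
        "analyzer": {
            "example_grade_name": {
                "type": "custom",
                "char_filter": ["example_strip_wrapping"],
                "tokenizer": "whitespace",
                "filter": ["lowercase"]
            }
        }
    }
}
```

If the same analyzer is used for indexing and searching, "0# (99.995%)" should
become the tokens "0#" and "99.995" on both sides, so 99.995, 99.995% and the
copy-pasted full value reduce to matching terms. Partial input such as 99 or
99.9 would still need a prefix-style query that analyzes its input (for
example match_phrase_prefix) or edge n-grams at index time.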



On Tuesday, July 15, 2014 10:40:30 AM UTC-4, mooky wrote:
>
> I have a bit of an odd requirement as far as analyzers are concerned.
> Wondering if anyone has any tips/suggestions.
> I have an item I am indexing (grade) that has a property (name) whose
> value can be "0# (99.995%)".
> I am doing a prefix search on _all.
> I want users to be able to search using 99 or 99.9 or 99.995 or 99.995%.
> I also want the user to be able to copy-paste "0# (99.995%)" and it should
> work.
>
> I am currently using the whitespace analyzer, which works for many of my
> cases except the tricky one above.
> 99.995 doesn't work.
> But "(99.995" does, because after whitespace tokenization the token begins
> with "(".
> I could filter out the "(" and ")" characters, but then "0# (99.995%)"
> won't work.
> Does anyone have some different suggestions?
>



Re: How can I make this search requirement work?

2014-07-15 Thread vineeth mohan
Hello Mooky,

You can apply multiple analyzers to a field:
https://github.com/yakaz/elasticsearch-analysis-combo/

So you can add all your analyzers there and apply them together.

Thanks
  Vineeth


On Tue, Jul 15, 2014 at 8:10 PM, mooky  wrote:

> I have a bit of an odd requirement as far as analyzers are concerned.
> Wondering if anyone has any tips/suggestions.
> I have an item I am indexing (grade) that has a property (name) whose
> value can be "0# (99.995%)".
> I am doing a prefix search on _all.
> I want users to be able to search using 99 or 99.9 or 99.995 or 99.995%.
> I also want the user to be able to copy-paste "0# (99.995%)" and it should
> work.
>
> I am currently using the whitespace analyzer, which works for many of my
> cases except the tricky one above.
> 99.995 doesn't work.
> But "(99.995" does, because after whitespace tokenization the token begins
> with "(".
> I could filter out the "(" and ")" characters, but then "0# (99.995%)"
> won't work.
> Does anyone have some different suggestions?



How can I make this search requirement work?

2014-07-15 Thread mooky
I have a bit of an odd requirement as far as analyzers are concerned.
Wondering if anyone has any tips/suggestions.
I have an item I am indexing (grade) that has a property (name) whose value
can be "0# (99.995%)".
I am doing a prefix search on _all.
I want users to be able to search using 99 or 99.9 or 99.995 or 99.995%.
I also want the user to be able to copy-paste "0# (99.995%)" and it should
work.

I am currently using the whitespace analyzer, which works for many of my
cases except the tricky one above.
99.995 doesn't work.
But "(99.995" does, because after whitespace tokenization the token begins
with "(".
I could filter out the "(" and ")" characters, but then "0# (99.995%)" won't
work.
Does anyone have some different suggestions?
