Thanks Riccardo! This is a good writeup that would be a great addition to the 
indexing docs. It would have been adequate for our needs, but I had to move 
on and implemented my own in-memory approach. Here's how the approaches compared:
custom - 17ms
fulltext - 70ms
like - 700ms
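
For readers who want to try the in-memory route, here is a minimal sketch of one way a prefix lookup could be built. It is not the custom implementation behind the 17ms figure, which is not shown in this thread. It assumes the location strings fit in memory, and the names `PrefixIndex` and `starts_with` are hypothetical.

```python
import bisect


class PrefixIndex:
    """Sorted-list prefix lookup (hypothetical helper, not an OrientDB API)."""

    def __init__(self, values):
        # Sort once so that all strings sharing a prefix sit next to each other.
        self._values = sorted(values)

    def starts_with(self, prefix):
        # Binary search for the first value that is >= prefix.
        i = bisect.bisect_left(self._values, prefix)
        matches = []
        # Matches are contiguous, so stop at the first non-match.
        while i < len(self._values) and self._values[i].startswith(prefix):
            matches.append(self._values[i])
            i += 1
        return matches


# Sample values taken from the result listings later in this thread.
locations = [
    "kansas, united states",
    "kansas city, kansas, united states",
    "kansas city, missouri, united states",
    "kansas, illinois, united states",
    "abilene, kansas, united states",
    "allen, kansas, united states",
    "alma, kansas, united states",
]
index = PrefixIndex(locations)
kansas_matches = index.starts_with("kansas")
```

The lookup costs one binary search plus one step per match, which is why this kind of structure can beat a full scan with like.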



On Friday, January 23, 2015 at 12:46:24 AM UTC-7, Riccardo Tasso wrote:
>
> Hi Erik,
>     I would be disappointed if the FULLTEXT indexing engine were 
> discontinued, since there are some cases where I think it's still useful, 
> such as yours.
>
> I would create a FULLTEXT index in this way:
>
> create index Vlocation on V (location) FULLTEXT METADATA {
>    "indexRadix" : true,
>    "ignoreChars" : "" ,
>    "separatorChars" : "",
>    "minWordLength" : 1 ,
>    "stopWords" : []
> }
>
>
>
> In particular, if you don't use separator chars, your string shouldn't be 
> tokenized and it will be indexed as follows:
> "abc def ghi":
>
>    - a
>    - ab
>    - abc
>    - abc 
>    - abc d
>    - abc de
>    - abc def
>    - abc def 
>    - abc def g
>    - abc def gh
>    - abc def ghi
>
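
The key list above is simply every leading substring of the value, so it can be reproduced outside the database. This is a sketch that assumes, as the listing suggests, that with empty separatorChars and minWordLength 1 every non-empty prefix becomes a key. `index_keys` is a hypothetical name, not an OrientDB function.

```python
def index_keys(value):
    """Return every non-empty prefix of value, shortest first."""
    return [value[:end] for end in range(1, len(value) + 1)]


keys = index_keys("abc def ghi")
for key in keys:
    # repr() makes the trailing space in keys such as 'abc ' visible.
    print(repr(key))
```

Note that the number of keys grows with the length of each value, so long strings produce many index entries.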
> Take a look at this script:
> create database memory:temp
> create property V.location string
>
> create index Vlocation on V (location) FULLTEXT METADATA { "indexRadix" : 
> true, "ignoreChars" : "" , "separatorChars" : "", "minWordLength" : 1, 
> "stopWords" : [] }
>
> insert into V SET location = "abc def ghi"
>
> select from index:Vlocation
>
> ----+------+-----------+----
> #   |@CLASS|key        |rid 
> ----+------+-----------+----
> 0   |null  |a          |#9:0
> 1   |null  |ab         |#9:0
> 2   |null  |abc        |#9:0
> 3   |null  |abc        |#9:0
> 4   |null  |abc d      |#9:0
> 5   |null  |abc de     |#9:0
> 6   |null  |abc def    |#9:0
> 7   |null  |abc def    |#9:0
> 8   |null  |abc def g  |#9:0
> 9   |null  |abc def gh |#9:0
> 10  |null  |abc def ghi|#9:0
> ----+------+-----------+----
>
>
> Probably this is not a real FULLTEXT index, but it seems to be what you need.
>
> Cheers,
>    Riccardo
>
> On Friday, January 23, 2015 at 03:00:24 UTC+1, Erik Peterson wrote:
>>
>> Hi Riccardo,
>> Thanks for your response. I've worked with fulltext before, but the 
>> results in this case were poor. See below. Also, note that fulltext is 
>> expected to be removed in the next release. 
>> https://groups.google.com/d/msg/orient-database/yroAgjsFpaI/oaaDAItM8mQJ
>>
>> select location from geo where location containstext "kansas"
>> highland, kansas, united states
>> winchester, kansas, united states
>> madison, kansas, united states
>> yates center, kansas, united states
>>
>>
>> On Thursday, January 22, 2015 at 11:41:52 AM UTC-7, Riccardo Tasso wrote:
>>>
>>> You can use the simplest (but good) FULLTEXT index ( 
>>> http://www.orientechnologies.com/docs/2.0/orientdb.wiki/FullTextIndex.html 
>>> ) 
>>> and leave the index prefix option at its default value (true). Also tune 
>>> minWordLength as you need.
>>>
>>> Then run the query: SELECT FROM V WHERE location containsText "kansas"
>>>
>>> Cheers,
>>>    Riccardo
>>>
>>> 2015-01-22 19:26 GMT+01:00 Erik Peterson <[email protected]>:
>>>
>>>> Thanks for taking a look at this...appears to be a gap in ODB 
>>>> capability but maybe I'm missing something. To clarify...
>>>>
>>>> @Jing Yes I think the lucene results are as "designed" but not as 
>>>> "desired". Note that the desired search is like "kansas%" not "%kansas%"
>>>>
>>>> @Riccardo
>>>> Yes, prefix, "kansas%", startsWith(searchTerm), etc. type search. Yes, 
>>>> search term is variable length.
>>>>
>>>> On Thursday, January 22, 2015 at 11:02:52 AM UTC-7, Riccardo Tasso 
>>>> wrote:
>>>>>
>>>>> Erik probably needs to index prefixes.
>>>>>
>>>>> Just a question: do your prefixes have a fixed length, or do you want to 
>>>>> be able to perform fast searches on any possible substring of your fields?
>>>>>
>>>>> Cheers,
>>>>>    Riccardo
>>>>>
>>>>> 2015-01-22 17:51 GMT+01:00 Jing Chen <[email protected]>:
>>>>>
>>>>>> Hi Erik,
>>>>>>
>>>>>> The Lucene result looks correct to me. The Lucene index tokenizes your 
>>>>>> original string and indexes the tokens, so 
>>>>>>
>>>>>> select location from geo where location lucene "kansas*"
>>>>>>
>>>>>> should be the same as 
>>>>>>
>>>>>> select location from geo where location like "%kansas%"
>>>>>>
>>>>>> Jing
>>>>>>
>>>>>>
>>>>>> On Thursday, January 22, 2015 at 8:42:19 AM UTC-8, Erik Peterson 
>>>>>> wrote:
>>>>>>>
>>>>>>> Apparently OrientDB does not provide a performant "like" search 
>>>>>>> capability. Is that correct?
>>>>>>>
>>>>>>> Here's an example. 
>>>>>>>
>>>>>>> *1) Returns desired results but is 10x slower*
>>>>>>> select from geo where location like "kansas%"
>>>>>>>
>>>>>>> "kansas, united states"
>>>>>>> "kansas city, kansas, united states"
>>>>>>> "kansas city, missouri, united states"
>>>>>>> "kansas, illinois, united states"
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *2) Lucene does not return desired results (for this type of search)*
>>>>>>>
>>>>>>> select location from geo where location lucene "kansas*"
>>>>>>>
>>>>>>> "kansas, united states"
>>>>>>> "abilene, kansas, united states"
>>>>>>> "allen, kansas, united states"
>>>>>>> "alma, kansas, united states"
>>>>>>>
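
The gap between the two result sets can be reproduced in a few lines. This is a sketch under an assumption: that the Lucene index splits each value into word tokens and that `kansas*` matches any token starting with the term, while like "kansas%" anchors at the start of the whole string. The function names are hypothetical and the tokenizer is a simple stand-in, not Lucene's analyzer.

```python
import re


def like_prefix(value, term):
    """Emulates like 'term%': the whole string must start with term."""
    return value.startswith(term)


def token_prefix(value, term):
    """Emulates a tokenized wildcard match: any word may start with term."""
    tokens = re.findall(r"\w+", value)  # stand-in tokenizer (assumption)
    return any(token.startswith(term) for token in tokens)


rows = [
    "kansas, united states",
    "kansas city, missouri, united states",
    "abilene, kansas, united states",
    "allen, kansas, united states",
]
like_hits = [row for row in rows if like_prefix(row, "kansas")]
token_hits = [row for row in rows if token_prefix(row, "kansas")]
```

Under these assumptions `token_hits` includes the "abilene" and "allen" rows while `like_hits` does not, which matches the behavior reported above.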
>>>>>>>
>>>>>>> On Tuesday, January 20, 2015 at 1:11:59 AM UTC-7, Erik Peterson 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Using 2.0-RC1
>>>>>>>> After some experimenting with queries using like, containstext, and 
>>>>>>>> lucene, I have a search where "select from X where like 'abc%'" 
>>>>>>>> provides the best results. However, it's slow, and like can't use 
>>>>>>>> indexes, correct? Is there another way to emulate "like" with lucene 
>>>>>>>> indexes? (Note that "select from X where lucene 'abc*'" provides very 
>>>>>>>> different search behavior from the similar "like" query.) Thanks.
>>>>>>>>
>>>>>>>  -- 
>>>>>>
>>>>>> --- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "OrientDB" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
