Hi Erik,
I would be disappointed if the FULLTEXT indexing engine were
discontinued, since there are still cases where I think it's useful,
such as yours.
I would create a FULLTEXT index this way:
create index Vlocation on V (location) FULLTEXT METADATA {
"indexRadix" : true,
"ignoreChars" : "" ,
"separatorChars" : "",
"minWordLength" : 1 ,
"stopWords" : []
}
In particular, if you don't use separator chars, your string shouldn't be
tokenized, and the strings will be indexed as follows:
"abc def ghi":
- a
- ab
- abc
- abc
- abc d
- abc de
- abc def
- abc def
- abc def g
- abc def gh
- abc def ghi
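To make that key set concrete, here is a minimal Python sketch that enumerates the same prefixes. It is only an illustration of the idea, not OrientDB's actual indexing code: `prefix_keys` is a hypothetical helper, and `min_length` merely plays the role that "minWordLength" plays in the metadata.

```python
def prefix_keys(value, min_length=1):
    """Return every prefix of value that has at least min_length characters."""
    return [value[:end] for end in range(min_length, len(value) + 1)]


keys = prefix_keys("abc def ghi")
for key in keys:
    # repr() makes trailing spaces visible, e.g. 'abc' versus 'abc '
    print(repr(key))
```

In this sketch, entries that look identical in a plain listing differ only by a trailing space ('abc' and 'abc '), since the space is an ordinary character when nothing is treated as a separator.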
Take a look at this script:
create database memory:temp
create property V.location string
create index Vlocation on V (location) FULLTEXT METADATA { "indexRadix" :
true, "ignoreChars" : "" , "separatorChars" : "", "minWordLength" : 1,
"stopWords" : [] }
insert into V SET location = "abc def ghi"
select from index:Vlocation
----+------+-----------+----
#   |@CLASS|key        |rid
----+------+-----------+----
0   |null  |a          |#9:0
1   |null  |ab         |#9:0
2   |null  |abc        |#9:0
3   |null  |abc        |#9:0
4   |null  |abc d      |#9:0
5   |null  |abc de     |#9:0
6   |null  |abc def    |#9:0
7   |null  |abc def    |#9:0
8   |null  |abc def g  |#9:0
9   |null  |abc def gh |#9:0
10  |null  |abc def ghi|#9:0
----+------+-----------+----
Probably this is not a real FULLTEXT index, but it seems to be what you need.
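The difference between the two behaviours discussed in this thread can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how OrientDB or Lucene are implemented: `whole_string_prefix` and `token_prefix` are hypothetical helpers, the tokenizer just splits on commas and whitespace, and the sample locations are the ones quoted in the messages below.

```python
locations = [
    "kansas, united states",
    "kansas city, kansas, united states",
    "kansas city, missouri, united states",
    "kansas, illinois, united states",
    "abilene, kansas, united states",
    "allen, kansas, united states",
    "alma, kansas, united states",
]


def whole_string_prefix(values, term):
    """Match only values that start with term, like the query like "kansas%"."""
    return [v for v in values if v.startswith(term)]


def token_prefix(values, term):
    """Match values in which any comma/whitespace-separated token starts with term."""
    matches = []
    for v in values:
        tokens = v.replace(",", " ").split()
        if any(t.startswith(term) for t in tokens):
            matches.append(v)
    return matches


anchored = whole_string_prefix(locations, "kansas")
tokenized = token_prefix(locations, "kansas")
print(anchored)
print(tokenized)
```

With no separator chars the whole string acts as a single token, so only the anchored behaviour remains, and a multi-word term such as "kansas c" still matches because the space is part of the key.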
Cheers,
Riccardo
On Friday, January 23, 2015 at 03:00:24 UTC+1, Erik Peterson wrote:
>
> Hi Riccardo,
> Thanks for your response. I've worked with fulltext before but the results
> in this case were poor. See below. Also, note that fulltext is expected to
> be removed in next release.
> https://groups.google.com/d/msg/orient-database/yroAgjsFpaI/oaaDAItM8mQJ
>
> select location from geo where location containstext "kansas"
> highland, kansas, united states
> winchester, kansas, united states
> madison, kansas, united states
> yates center, kansas, united states
>
>
> On Thursday, January 22, 2015 at 11:41:52 AM UTC-7, Riccardo Tasso wrote:
>>
>> You can use the simplest (but good) FULLTEXT index (
>> http://www.orientechnologies.com/docs/2.0/orientdb.wiki/FullTextIndex.html )
>> and leave the index prefix option at its default value (true). Also tune
>> minWordLength as needed.
>>
>> Then ask the query: SELECT FROM V WHERE location containsText "kansas"
>>
>> Cheers,
>> Riccardo
>>
>> 2015-01-22 19:26 GMT+01:00 Erik Peterson <[email protected]>:
>>
>>> Thanks for taking a look at this...appears to be a gap in ODB capability
>>> but maybe I'm missing something. To clarify...
>>>
>>> @Jing Yes I think the lucene results are as "designed" but not as
>>> "desired". Note that the desired search is like "kansas%" not "%kansas%"
>>>
>>> @Riccardo
>>> Yes, prefix, "kansas%", startsWith(searchTerm), etc. type search. Yes,
>>> search term is variable length.
>>>
>>> On Thursday, January 22, 2015 at 11:02:52 AM UTC-7, Riccardo Tasso wrote:
>>>>
>>>> Probably Erik has the need of indexing prefixes.
>>>>
>>>> Just a question: do your prefixes have a fixed length, or do you want to
>>>> be able to perform fast searches on any possible substring of your fields?
>>>>
>>>> Cheers,
>>>> Riccardo
>>>>
>>>> 2015-01-22 17:51 GMT+01:00 Jing Chen <[email protected]>:
>>>>
>>>>> Hi Erik,
>>>>>
>>>>> The Lucene result looks correct to me. The Lucene index tokenizes your
>>>>> original string and creates the index from those tokens, so
>>>>>
>>>>> select location from geo where location lucene "kansas*"
>>>>>
>>>>> should be the same as
>>>>>
>>>>> select location from geo where location like "%kansas%"
>>>>>
>>>>> Jing
>>>>>
>>>>>
>>>>> On Thursday, January 22, 2015 at 8:42:19 AM UTC-8, Erik Peterson wrote:
>>>>>>
>>>>>> Apparently OrientDB does not provide a performant "like" search
>>>>>> capability. Is that correct?
>>>>>>
>>>>>> Here's an example.
>>>>>>
>>>>>> *1) Returns desired results but 10x slow*
>>>>>> select from geo where location like "kansas%"
>>>>>>
>>>>>> "kansas, united states"
>>>>>> "kansas city, kansas, united states"
>>>>>> "kansas city, missouri, united states"
>>>>>> "kansas, illinois, united states"
>>>>>>
>>>>>>
>>>>>>
>>>>>> *2) Lucene does not return desired results (for this type of search)*
>>>>>>
>>>>>> select location from geo where location lucene "kansas*"
>>>>>>
>>>>>> "kansas, united states"
>>>>>> "abilene, kansas, united states"
>>>>>> "allen, kansas, united states"
>>>>>> "alma, kansas, united states"
>>>>>>
>>>>>>
>>>>>> On Tuesday, January 20, 2015 at 1:11:59 AM UTC-7, Erik Peterson wrote:
>>>>>>>
>>>>>>> Using 2.0-RC1
>>>>>>> After some experimenting with queries using like, containstext, and
>>>>>>> lucene, I have a search where "select from X where like 'abc%'" provides
>>>>>>> the best results. However, it's slow, and like can't use indexes, correct?
>>>>>>> Is there another way to emulate "like" with lucene indexes? (Note that
>>>>>>> "select from X where lucene 'abc*'" provides very different search
>>>>>>> behavior from the similar "like" query.) Thanks.
>>>>>>>
>>>>>> --
>>>>>
>>>>> ---
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "OrientDB" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>