Re: [sqlite] FTS search negative term syntax

2008-05-01 Thread Scott Hess
On Thu, May 1, 2008 at 1:41 AM, Ralf Junker <[EMAIL PROTECTED]> wrote:
>>Fair warning, though: It's not entirely clear that the fts search
>>syntax should aim to hew too closely to consumer-oriented search
>>syntax.
>
> Interesting point, too. Up to now, I always perceived the FTS search syntax to
> be very much consumer-oriented. It it just too similar to major search engines
> to be primarily machine-oriented.
>
> As it stands now, FTS syntax can of course be machine generated, if that is
> what you are aiming at. I believe that this should remain easy to do. And my
> suggested minus sign modification would not change this, would it?

Right now, things are pretty hybrid.  Long-term, one of the design
inputs I got from interested parties when we were starting the project
is that we probably want to have both a user-oriented syntax and a
machine-oriented syntax.  Then you could easily just slap something
together exposing the fts search syntax, and if you wanted something
more precise (perhaps to emulate some other system's query language)
you could write your own parser and have a well-defined way to
generate queries for fts without having to worry about unexpected
syntax changes.

You're right that the minus-sign mod probably won't matter one way or
the other.  I was more addressing the notion that "Google does it that
way, fts should too."  Matching Google search certainly does have
advantages, since people already know how to work it, and Google seems
to have done a reasonable job of not injecting all sorts of crazy
syntax that nobody can figure out without a quick-reference card.

Hmm.  I just thought of an interesting notion.  You could have a
version of fts which takes a very precise query language, then have
another virtual table module which wraps that and converts from a
looser language to the more precise language.  So a certain extent
this is make-work, because you'd have to parse, serialize, then
re-parse, rather than just generating the query tree directly, but
search queries are generally pretty small so it might not matter much
(compared to actually executing the query).

>>It's sort of in a strange place, most people would think it a
>>poor idea (indeed, dangerous!) to put user-entered expressions in
>>their WHERE clauses.
>
> I am not sure I understand the danger. Say I sqlite3_bind() the FTS match
> query, do you see this as a serious security risk (FTS injection) or a 
> potential
> performance jeopardy, or something else?

There definitely should be no security risks, though there is some
potential for performance issues.  For instance, a user could ask for
a prefix search but your code might run in your UI thread because you
only ever tested with exact word matches which were "fast enough".

[All of this isn't merely me trying to avoid work.  I also polish off
such arguments when dealing with some of the biggest users of fts,
many of which sit close enough to me to put these questions directly.]

-scott
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS search negative term syntax

2008-05-01 Thread Ralf Junker

>Fair warning, though: It's not entirely clear that the fts search
>syntax should aim to hew too closely to consumer-oriented search
>syntax.

Interesting point, too. Up to now, I always perceived the FTS search syntax to 
be very much consumer-oriented. It it just too similar to major search engines  
to be primarily machine-oriented.

As it stands now, FTS syntax can of course be machine generated, if that is 
what you are aiming at. I believe that this should remain easy to do. And my 
suggested minus sign modification would not change this, would it?

>It's sort of in a strange place, most people would think it a
>poor idea (indeed, dangerous!) to put user-entered expressions in
>their WHERE clauses.

I am not sure I understand the danger. Say I sqlite3_bind() the FTS match 
query, do you see this as a serious security risk (FTS injection) or a 
potential performance jeopardy, or something else?

>Caveat for the above: I've spent all of five minutes thinking about
>your posting, and I was interrupted in the middle.  But I'll try to
>factor it in to future thinking.

Thanks for your time and your thoughts!

Ralf

PS: I see little traffic on <[EMAIL PROTECTED]>. Is this intentional, or should 
this and simliar topics better be discussed there? 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS search negative term syntax

2008-04-30 Thread Myk Melez
Scott Hess wrote:
> It's not entirely clear that the fts search syntax should aim to hew too 
> closely to consumer-oriented search syntax.
Indeed, I would expect the FTS search syntax to optimize for the machine 
model, while the user-facing syntax optimizes for human 
comprehensibility, and the application takes responsibility for 
translating one to the other.

-myk

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS search negative term syntax

2008-04-30 Thread Scott Hess
Interesting point.  This seems like the kind of thing that could be
implemented in the existing fts codebase without involving a version
change.  It also may be more general than just hyphenated words, for
instance $12.50 might be more usefully translated as the phrase search
"12 50" than all documents which contain 12 and 50 anywhere.  I bet
five minutes of thought would result in 25 other examples, just in
English.

Fair warning, though: It's not entirely clear that the fts search
syntax should aim to hew too closely to consumer-oriented search
syntax.  It's sort of in a strange place, most people would think it a
poor idea (indeed, dangerous!) to put user-entered expressions in
their WHERE clauses.  In other search systems you might have an API
for constructing query trees which you would pass in, but that may be
weird to express via SQL.  Not sure where this pretty muddy point
leaves us, because it's probably not relevant to this specific case
:-).

Caveat for the above: I've spent all of five minutes thinking about
your posting, and I was interrupted in the middle.  But I'll try to
factor it in to future thinking.

Thanks,
scott



On Wed, Apr 30, 2008 at 8:58 AM, Ralf Junker <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I have a small concern about the FTS negative term search syntax. Currently, 
> all terms following any minus sign ("-") are excluded from the search. This 
> is a very welcome feature, but consider searching for these hyphenated words:
>
>  Coca-Cola   -> FTS finds Coca, but never Cola
>  low-budget  -> FTS finds low, but never budget
>  twelve-year-old -> FTS finds twelve, but never year and never old
>  part-time   -> FTS finds part, but never time
>  full-time   -> FTS finds full, but never time
>
> These results do not match what most users will expect. Well, one can ask 
> them to leave out the minus sign, but users will habitually leave it in 
> because they learned from major search engines that it is the intended 
> behavior. Consider Google, which explicitly states:
>
> "Note: when you include a negative term in your search, be sure to include a 
> space before the minus sign."
>
> Source: 
> http://www.google.com/support/bin/static.py?page=searchguides.html&ctx=basics
>
> Therefore I would like to consider adding these search syntax rules:
>
> 1. A minus sign excludes a search term only when located at the beginning of 
> the search query or after a white space (space, tab, etc.):
>
>  "low-budget"  -> Find both low and budet.
>  "low -budget" -> Find low, but not budget.
>  "-low budget" -> Do not find low, but find budget.
>  "-low-budget" -> Do not find the "low budget" phrase.
>
> 2. In case the minus sign is a term separator and two or more search terms 
> are separated by sisngle minus signs only, they constitue a phrase search:
>
>  "twelve-year-old" -> "twelve year old" (phrase search)
>  "part-time"   -> "part time" (phrase search)
>
> I believe that these changes would make the FTS search syntax more intuitive 
> to use and more conformant to major search engines.
>
> Would there be a chance that they could be implemented in current FTS3 and/or 
> the upcomming FTS4?
>
> Any thoughts?
>
> Ralf
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS search negative term syntax

2008-04-30 Thread Ralf Junker
Hello,

I have a small concern about the FTS negative term search syntax. Currently, 
all terms following any minus sign ("-") are excluded from the search. This is 
a very welcome feature, but consider searching for these hyphenated words:

  Coca-Cola   -> FTS finds Coca, but never Cola
  low-budget  -> FTS finds low, but never budget
  twelve-year-old -> FTS finds twelve, but never year and never old
  part-time   -> FTS finds part, but never time
  full-time   -> FTS finds full, but never time

These results do not match what most users will expect. Well, one can ask them 
to leave out the minus sign, but users will habitually leave it in because they 
learned from major search engines that it is the intended behavior. Consider 
Google, which explicitly states:

"Note: when you include a negative term in your search, be sure to include a 
space before the minus sign."

Source: 
http://www.google.com/support/bin/static.py?page=searchguides.html&ctx=basics

Therefore I would like to consider adding these search syntax rules:

1. A minus sign excludes a search term only when located at the beginning of 
the search query or after a white space (space, tab, etc.):

  "low-budget"  -> Find both low and budet.
  "low -budget" -> Find low, but not budget.
  "-low budget" -> Do not find low, but find budget.
  "-low-budget" -> Do not find the "low budget" phrase.

2. In case the minus sign is a term separator and two or more search terms are 
separated by sisngle minus signs only, they constitue a phrase search:

  "twelve-year-old" -> "twelve year old" (phrase search)
  "part-time"   -> "part time" (phrase search)

I believe that these changes would make the FTS search syntax more intuitive to 
use and more conformant to major search engines.

Would there be a chance that they could be implemented in current FTS3 and/or 
the upcomming FTS4? 

Any thoughts?

Ralf

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users