Re: [sqlite] FTS search negative term syntax
On Thu, May 1, 2008 at 1:41 AM, Ralf Junker <[EMAIL PROTECTED]> wrote: >>Fair warning, though: It's not entirely clear that the fts search >>syntax should aim to hew too closely to consumer-oriented search >>syntax. > > Interesting point, too. Up to now, I always perceived the FTS search syntax to > be very much consumer-oriented. It it just too similar to major search engines > to be primarily machine-oriented. > > As it stands now, FTS syntax can of course be machine generated, if that is > what you are aiming at. I believe that this should remain easy to do. And my > suggested minus sign modification would not change this, would it? Right now, things are pretty hybrid. Long-term, one of the design inputs I got from interested parties when we were starting the project is that we probably want to have both a user-oriented syntax and a machine-oriented syntax. Then you could easily just slap something together exposing the fts search syntax, and if you wanted something more precise (perhaps to emulate some other system's query language) you could write your own parser and have a well-defined way to generate queries for fts without having to worry about unexpected syntax changes. You're right that the minus-sign mod probably won't matter one way or the other. I was more addressing the notion that "Google does it that way, fts should too." Matching Google search certainly does have advantages, since people already know how to work it, and Google seems to have done a reasonable job of not injecting all sorts of crazy syntax that nobody can figure out without a quick-reference card. Hmm. I just thought of an interesting notion. You could have a version of fts which takes a very precise query language, then have another virtual table module which wraps that and converts from a looser language to the more precise language. So a certain extent this is make-work, because you'd have to parse, serialize, then re-parse, rather than just generating the query tree directly, but search queries are generally pretty small so it might not matter much (compared to actually executing the query). >>It's sort of in a strange place, most people would think it a >>poor idea (indeed, dangerous!) to put user-entered expressions in >>their WHERE clauses. > > I am not sure I understand the danger. Say I sqlite3_bind() the FTS match > query, do you see this as a serious security risk (FTS injection) or a > potential > performance jeopardy, or something else? There definitely should be no security risks, though there is some potential for performance issues. For instance, a user could ask for a prefix search but your code might run in your UI thread because you only ever tested with exact word matches which were "fast enough". [All of this isn't merely me trying to avoid work. I also polish off such arguments when dealing with some of the biggest users of fts, many of which sit close enough to me to put these questions directly.] -scott ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS search negative term syntax
>Fair warning, though: It's not entirely clear that the fts search >syntax should aim to hew too closely to consumer-oriented search >syntax. Interesting point, too. Up to now, I always perceived the FTS search syntax to be very much consumer-oriented. It it just too similar to major search engines to be primarily machine-oriented. As it stands now, FTS syntax can of course be machine generated, if that is what you are aiming at. I believe that this should remain easy to do. And my suggested minus sign modification would not change this, would it? >It's sort of in a strange place, most people would think it a >poor idea (indeed, dangerous!) to put user-entered expressions in >their WHERE clauses. I am not sure I understand the danger. Say I sqlite3_bind() the FTS match query, do you see this as a serious security risk (FTS injection) or a potential performance jeopardy, or something else? >Caveat for the above: I've spent all of five minutes thinking about >your posting, and I was interrupted in the middle. But I'll try to >factor it in to future thinking. Thanks for your time and your thoughts! Ralf PS: I see little traffic on <[EMAIL PROTECTED]>. Is this intentional, or should this and simliar topics better be discussed there? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS search negative term syntax
Scott Hess wrote: > It's not entirely clear that the fts search syntax should aim to hew too > closely to consumer-oriented search syntax. Indeed, I would expect the FTS search syntax to optimize for the machine model, while the user-facing syntax optimizes for human comprehensibility, and the application takes responsibility for translating one to the other. -myk ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS search negative term syntax
Interesting point. This seems like the kind of thing that could be implemented in the existing fts codebase without involving a version change. It also may be more general than just hyphenated words, for instance $12.50 might be more usefully translated as the phrase search "12 50" than all documents which contain 12 and 50 anywhere. I bet five minutes of thought would result in 25 other examples, just in English. Fair warning, though: It's not entirely clear that the fts search syntax should aim to hew too closely to consumer-oriented search syntax. It's sort of in a strange place, most people would think it a poor idea (indeed, dangerous!) to put user-entered expressions in their WHERE clauses. In other search systems you might have an API for constructing query trees which you would pass in, but that may be weird to express via SQL. Not sure where this pretty muddy point leaves us, because it's probably not relevant to this specific case :-). Caveat for the above: I've spent all of five minutes thinking about your posting, and I was interrupted in the middle. But I'll try to factor it in to future thinking. Thanks, scott On Wed, Apr 30, 2008 at 8:58 AM, Ralf Junker <[EMAIL PROTECTED]> wrote: > Hello, > > I have a small concern about the FTS negative term search syntax. Currently, > all terms following any minus sign ("-") are excluded from the search. This > is a very welcome feature, but consider searching for these hyphenated words: > > Coca-Cola -> FTS finds Coca, but never Cola > low-budget -> FTS finds low, but never budget > twelve-year-old -> FTS finds twelve, but never year and never old > part-time -> FTS finds part, but never time > full-time -> FTS finds full, but never time > > These results do not match what most users will expect. Well, one can ask > them to leave out the minus sign, but users will habitually leave it in > because they learned from major search engines that it is the intended > behavior. Consider Google, which explicitly states: > > "Note: when you include a negative term in your search, be sure to include a > space before the minus sign." > > Source: > http://www.google.com/support/bin/static.py?page=searchguides.html&ctx=basics > > Therefore I would like to consider adding these search syntax rules: > > 1. A minus sign excludes a search term only when located at the beginning of > the search query or after a white space (space, tab, etc.): > > "low-budget" -> Find both low and budet. > "low -budget" -> Find low, but not budget. > "-low budget" -> Do not find low, but find budget. > "-low-budget" -> Do not find the "low budget" phrase. > > 2. In case the minus sign is a term separator and two or more search terms > are separated by sisngle minus signs only, they constitue a phrase search: > > "twelve-year-old" -> "twelve year old" (phrase search) > "part-time" -> "part time" (phrase search) > > I believe that these changes would make the FTS search syntax more intuitive > to use and more conformant to major search engines. > > Would there be a chance that they could be implemented in current FTS3 and/or > the upcomming FTS4? > > Any thoughts? > > Ralf > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] FTS search negative term syntax
Hello, I have a small concern about the FTS negative term search syntax. Currently, all terms following any minus sign ("-") are excluded from the search. This is a very welcome feature, but consider searching for these hyphenated words: Coca-Cola -> FTS finds Coca, but never Cola low-budget -> FTS finds low, but never budget twelve-year-old -> FTS finds twelve, but never year and never old part-time -> FTS finds part, but never time full-time -> FTS finds full, but never time These results do not match what most users will expect. Well, one can ask them to leave out the minus sign, but users will habitually leave it in because they learned from major search engines that it is the intended behavior. Consider Google, which explicitly states: "Note: when you include a negative term in your search, be sure to include a space before the minus sign." Source: http://www.google.com/support/bin/static.py?page=searchguides.html&ctx=basics Therefore I would like to consider adding these search syntax rules: 1. A minus sign excludes a search term only when located at the beginning of the search query or after a white space (space, tab, etc.): "low-budget" -> Find both low and budet. "low -budget" -> Find low, but not budget. "-low budget" -> Do not find low, but find budget. "-low-budget" -> Do not find the "low budget" phrase. 2. In case the minus sign is a term separator and two or more search terms are separated by sisngle minus signs only, they constitue a phrase search: "twelve-year-old" -> "twelve year old" (phrase search) "part-time" -> "part time" (phrase search) I believe that these changes would make the FTS search syntax more intuitive to use and more conformant to major search engines. Would there be a chance that they could be implemented in current FTS3 and/or the upcomming FTS4? Any thoughts? Ralf ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users