Re: [SMW-devel] [PATCH] Support LIKE in queries
* Markus Krötzsch [EMAIL PROTECTED] [2008-01-02 08:37]:
> On Sonntag, 30. Dezember 2007, Thomas Bleher wrote:
> > * Markus Krötzsch [EMAIL PROTECTED] [2007-12-30 22:10]:
> > > OK, my conclusion now was to support the following syntax:
> > > [[property::%*subs?r*]]
> > > where ? and * represent _ and % in SQL.
> >
> > I think this is fine generally, but now you cannot query for a literal * or ? anymore, AFAIK.
>
> I would not consider this to be a major issue, given that those characters are not too common in typical application strings, and given the fact that using ? still queries for some symbol in that place -- it seems to be very unlikely that two strings differ only in one position where the query string has a ?. So in most cases it will have the same hits anyway (yes, there are some cases that could be problematic [1] ;).

Agreed.

> Anyway, I will leave this issue at rest until any user actually complains about this limitation.

Here I have to respectfully disagree. It seems unwise to wait until someone complains when there is already a patch resolving the issue. Why spend more time on it later when it can simply be fixed right now? OK, the regexes were not very readable, but they don't really make the code more complicated.

FWIW, the regexes were so ugly only because backslashes have to be escaped twice for PHP's preg_replace (so a single \ becomes \\\\). If we used ! as the escape character instead of \, the regexes would look like this (untested):

$value = str_replace(array('%', '_'), array('!%', '!_'), $value); // escape % and _
$value = preg_replace('/(?<!!)((?:!!)*)\*/', '$1%', $value); // if there's an even number of !, change * to %
$value = preg_replace('/(?<!!)((?:!!)*)\?/', '$1_', $value); // ditto for ? and _
$value = preg_replace('/(?<!!)((?:!!)*)!\*/', '$1*', $value); // if there's an odd number, * was escaped and should stay as is; but the last ! is removed
$value = preg_replace('/(?<!!)((?:!!)*)!\?/', '$1?', $value); // ditto for ?

(?: ) is a subexpression for grouping, not capturing; (?<! ) is a zero-width negative look-behind (i.e. we make sure that the character before our match is not !).

Regards,
Thomas

> [1] http://de.wikipedia.org/wiki/Die_drei_%3F%3F%3F
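Thomas marked the "!" variant as untested, so it is wrapped here in a function and given a few checks. This is a minimal sketch that rests on three assumptions. It needs PHP 7 or later. The function name smwWildcardToLikeBang is hypothetical and not part of SMW. The SQL must declare the escape character explicitly (LIKE ... ESCAPE '!'), because MySQL's default LIKE escape character is \, not !.

```php
<?php
// Hypothetical wrapper around the "!"-escape regexes from the mail above.
// The returned pattern is meant for "... LIKE <pattern> ESCAPE '!'" (assumption).
function smwWildcardToLikeBang(string $value): string {
    // Escape SQL's own wildcards so they only match literally.
    $value = str_replace(['%', '_'], ['!%', '!_'], $value);
    // Even number (possibly zero) of "!" before * or ?: the symbol is a wildcard.
    $value = preg_replace('/(?<!!)((?:!!)*)\*/', '$1%', $value);
    $value = preg_replace('/(?<!!)((?:!!)*)\?/', '$1_', $value);
    // Odd number: the symbol was escaped; drop the last "!" and keep it literal.
    $value = preg_replace('/(?<!!)((?:!!)*)!\*/', '$1*', $value);
    $value = preg_replace('/(?<!!)((?:!!)*)!\?/', '$1?', $value);
    return $value;
}

echo smwWildcardToLikeBang('*subs?r*'), "\n"; // %subs_r%
echo smwWildcardToLikeBang('a!*b'), "\n";     // a*b  (escaped, literal *)
echo smwWildcardToLikeBang('a!!*b'), "\n";    // a!!%b (literal ! followed by a wildcard)
```

The same caveat applies as to the original regexes. An escape character typed directly before % or _ interacts with the first str_replace step, so this is not a complete escaping scheme.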
Re: [SMW-devel] [PATCH] Support LIKE in queries
On Sonntag, 30. Dezember 2007, Thomas Bleher wrote:
> * Markus Krötzsch [EMAIL PROTECTED] [2007-12-30 22:10]:
> > OK, my conclusion now was to support the following syntax:
> > [[property::%*subs?r*]]
> > where ? and * represent _ and % in SQL.
>
> I think this is fine generally, but now you cannot query for a literal * or ? anymore, AFAIK.

I would not consider this to be a major issue, given that those characters are not too common in typical application strings, and given the fact that using ? still queries for some symbol in that place -- it seems to be very unlikely that two strings differ only in one position where the query string has a ?. So in most cases it will have the same hits anyway (yes, there are some cases that could be problematic [1] ;).

Anyway, I will leave this issue at rest until any user actually complains about this limitation.

Regards,
Markus

[1] http://de.wikipedia.org/wiki/Die_drei_%3F%3F%3F

> Not a huge deal, but before, a_b searched for a, followed by any char, followed by b, while a\_b searched for exactly a_b.
> Properly escaping everything gets messy rather quickly, as \ can also be escaped to query for a literal \, so you need translations like:
>
>   ?    = _
>   \?   = ?
>   \\?  = \\_
>   \\\? = \\?
>
> The following regular expressions work fine for me, but unfortunately they are quite ugly:
>
> $value = str_replace(array('%', '_'), array('\%', '\_'), $value); // escape % and _
> $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\*/', '$1%', $value); // if there's an even number of \, change * to %
> $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\?/', '$1_', $value); // ditto for ? and _
> $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\*/', '$1*', $value); // if there's an odd number, * was escaped and should stay as is; but the last \ is removed
> $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\?/', '$1?', $value); // ditto for ?
>
> I think these should be added to SMW, so all characters can be queried.
>
> Regards,
> Thomas

-- 
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362    fax +49 (0)721 608 5998
[EMAIL PROTECTED]    www http://korrekt.org
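The archive lost the doubled backslashes in Thomas's original regexes. Restored and wrapped in a helper, they reproduce his translation table. This is a sketch under stated assumptions. The function name smwWildcardToLike is illustrative and not SMW's API. The code assumes PHP 7 or later and MySQL's default LIKE escape character \. The result is only the LIKE pattern, so it still needs normal SQL string escaping (or a bound parameter) before use.

```php
<?php
// Thomas's backslash-escape regexes with the doubled backslashes restored.
// In a single-quoted PHP string, '\\\\' is two characters (\\), which PCRE
// reads as one literal backslash.
function smwWildcardToLike(string $value): string {
    // Escape SQL's own wildcards with MySQL's default LIKE escape character.
    $value = str_replace(['%', '_'], ['\%', '\_'], $value);
    // Even number (possibly zero) of backslashes before * or ?: wildcard.
    $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\*/', '$1%', $value);
    $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\?/', '$1_', $value);
    // Odd number: the symbol was escaped; remove the last backslash, keep the symbol.
    $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\*/', '$1*', $value);
    $value = preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)\\\\\?/', '$1?', $value);
    return $value;
}

$b = '\\'; // a single backslash, to keep the examples readable
foreach (['?', $b . '?', $b . $b . '?', $b . $b . $b . '?'] as $in) {
    echo $in, ' => ', smwWildcardToLike($in), "\n";
}
```

The loop prints the four rows of the table from the mail: ? => _, \? => ?, \\? => \\_ and \\\? => \\?.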
Re: [SMW-devel] [PATCH] Support LIKE in queries
On Sonntag, 30. Dezember 2007, Yaron Koren wrote:
> Dan - I doubt that there will ever be both a regex and a wildcard option in SMW's query language - that seems like overkill, and somewhat bad design. A single such option is enough, and if it happens, behind the scenes, to use both SQL's and PHP's pattern-matching capabilities at different times, that should be hidden from the user. So I doubt that there'll be a need for two different symbols (Markus, or anyone else, correct me if I'm wrong).
>
> So, let me argue in favor of the ~ symbol - hopefully it's not too late before the Sunday evening deadline. :)

There was a drastic change in the parser of MediaWiki 1.12 that has caused some delay, so the deadline has moved to today ;-)

> The Halo extension is a helpful one, but it's a spinoff of SMW, and thus there's no reason why it should hamper design decisions in SMW. That goes for all extensions that use Semantic MediaWiki - I know, for my own part, that the extensions I've created have to do all sorts of work to be compatible with the different versions of SMW. That's as it should be - the spinoffs work around the main application. From what I understand, Halo is currently not compatible with the most recent versions of SMW anyway, so it needs to be modified regardless - there's no need to try to ensure backwards compatibility. And, as you point out, that functionality in Halo might not be getting used at all - though even if it were, that shouldn't affect how SMW is designed.

OK, I am convinced. Done.

Markus
Re: [SMW-devel] [PATCH] Support LIKE in queries
Dan - I doubt that there will ever be both a regex and a wildcard option in SMW's query language - that seems like overkill, and somewhat bad design. A single such option is enough, and if it happens, behind the scenes, to use both SQL's and PHP's pattern-matching capabilities at different times, that should be hidden from the user. So I doubt that there'll be a need for two different symbols (Markus, or anyone else, correct me if I'm wrong).

So, let me argue in favor of the ~ symbol - hopefully it's not too late before the Sunday evening deadline. :)

The Halo extension is a helpful one, but it's a spinoff of SMW, and thus there's no reason why it should hamper design decisions in SMW. That goes for all extensions that use Semantic MediaWiki - I know, for my own part, that the extensions I've created have to do all sorts of work to be compatible with the different versions of SMW. That's as it should be - the spinoffs work around the main application. From what I understand, Halo is currently not compatible with the most recent versions of SMW anyway, so it needs to be modified regardless - there's no need to try to ensure backwards compatibility. And, as you point out, that functionality in Halo might not be getting used at all - though even if it were, that shouldn't affect how SMW is designed.

-Yaron
Re: [SMW-devel] [PATCH] Support LIKE in queries
On Samstag, 29. Dezember 2007, DanTMan wrote:
> A lot of people are accustomed to the ? (single-character match) and * (multi-character match) format. It would be easy to escape the '_'s and '%'s in a match and then do a replace of ? to _ and * to %. (A little preg and \ could still easily escape those.)

Yes, I agree to that. I think, if nobody objects, this fixes the pattern syntax. So it remains to find a good symbol for the comparator.

> I don't know about ~ though; in the languages I've used I recall ~ having something to do with regex. I'd rather save that character in case we want to be able to use REGEXP matching inside of SQL.
> From what I remember, I think most people with only a little insight into technical stuff would adjust easiest to using this set: = Equals, > Greater than, >= Greater than or equal to, < Less than, <= Less than or equal to, ! Not, * Multi-character match, ? Single-character match, ~ regex.

As a note: = is not available in the parser function #ask, since it has a special meaning there as parameter assignment, as e.g. in format=table. The query is distinguished from the other parameters and print requests in #ask because it contains no = symbol and does not start with ?.

> But I did have a thought about the @... It's not used anywhere afaik. I did make a suggestion on using a pattern to separate the comparators from the match value. It was using [[Property::comparator::match]], but as I now remember SMW lets you use :: to specify multiple properties. However, it may be a good idea if the separator was one which wouldn't cause conflicting issues with other things.

Maybe I should remark that the comparator we choose will never block any symbol from being used in values. You can always escape the initial comparator by inserting an initial space (which is ignored in all values). For instance, to look for pages with property value "<strange value", one could write [[some property:: <strange value]], whereas [[some property::<strange value]] would be equivalent to [[some property::< strange value]], which matches all values (alphabetically) smaller than "strange value". So we can pick any comparator letter without conflicts.

> @ is not commonly used and does give people a little bit of a hint about its use. Or if you want to go a little further from what can actually be used in a title (to avoid clashing with things), the # is always invalid. Say, [[prop::comp@match]] or [[prop::comp#match]]. So for a not: [[Has value::!@Value]] or [[Has value::!#Value]].

Basically, spaces already play the role of your proposed @ or #.

> I'm probably droning on now... But what about finding a good separator and allowing textual names, i.e.: EQ[=], NOT/NEQ[!] (!= could be thought of), LT[<], GT[>], REGEX(P)[~], LIKE[%_], wildcard[*?], etc...

Not sure whether that would be better internationally. < seems to be more universally understood than LT.

Another remark: ::! stands for inequality (NEQ), not for negation (NOT). It looks for pages that have some property value unequal to the one that was given, and it does not matter whether or not they also have some value that is equal. So a page that is annotated with [[property::1]] and [[property::2]] would match a query atom [[property::!1]].

> There is also the possibility, instead of a separator, of using brackets to enclose a comparator. I can hardly think of many places which would use (NOT) at the start of a title ([[Has value::(NOT) Title]]), or we also have the {} and [] type brackets. [] is used by external links, but {} is only used in multiples for templates and variables and never on its own; templates and values will already have been parsed out, so only the single braces remain, and as a bonus, { and } are illegal in titles. So [[Has value::{NOT} Title]] is guaranteed never to clash with a legal title or match you can make. If you're worried about templates and parsing issues, those can't occur when you're using something like {{{1}}} as the title ([[Has value::{NOT} {{{1}}}]]), so there's no clash. The only potential clash is if someone wants to use {{{comparator|EQ}}} to specify the comparator. In that case, we could easily make { EQ } valid (trim spaces), so { {{{comparator|EQ}}} } would work.

Yes, that would work too. But I am happy with our spaces (the fact that initial and trailing spaces are ignored in all property values is the key to making that work, and I think there is no harm in assuming that). There is, in principle, no problem with having multi-character sequences for comparators, but I would prefer something that does not require internationalisation.

So, given that we use * and ? instead of % and _, there are the following options:
1- [[property::%*substring*]]
2- [[property::#*substring*]]
3- [[property::~*substring*]] (clashes with Halo)
4- [[property::@*substring*]]
5- maybe more ...

My order of preference would be 3, 1, 4, 2, and I opt for 1 due to the Halo issue. Further
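Markus makes two points here. A leading space escapes the comparator, and ::! means "some value differs" rather than "no value equals". A small sketch makes both concrete. Everything in it is hypothetical and is not SMW's actual query code: the function names, the comparator list (the <, >, ! and % discussed in this thread) and the use of plain strcmp for the ordering comparators. The % pattern comparator is left out.

```php
<?php
// Hypothetical helpers illustrating the rules described in the thread; not SMW code.

// Split a raw query value into [comparator, operand]. Only the very first
// character can be a comparator, so a leading space escapes it; leading and
// trailing spaces are ignored in all values.
function splitComparator(string $raw, array $comparators = ['<', '>', '!', '%']): array {
    $first = substr($raw, 0, 1);
    if ($first !== '' && in_array($first, $comparators, true)) {
        return [$first, trim(substr($raw, 1))];
    }
    return ['=', trim($raw)];
}

// A page matches the atom if ANY of its values satisfies it. This is why
// "!" behaves as NEQ ("some value differs"), not as NOT ("no value equals").
function matchesAtom(array $pageValues, string $raw): bool {
    [$comparator, $operand] = splitComparator($raw);
    foreach ($pageValues as $value) {
        $value = trim($value);
        if ($comparator === '=') {
            $hit = ($value === $operand);
        } elseif ($comparator === '!') {
            $hit = ($value !== $operand);
        } elseif ($comparator === '<') {
            $hit = (strcmp($value, $operand) < 0); // simple byte-wise ordering
        } elseif ($comparator === '>') {
            $hit = (strcmp($value, $operand) > 0);
        } else {
            $hit = false; // the % pattern comparator is not modelled here
        }
        if ($hit) {
            return true;
        }
    }
    return false;
}
```

With these helpers, ' <strange value' splits into an equality test for "<strange value". Both '<strange value' and '< strange value' split into the < comparator with operand "strange value". A page annotated with 1 and 2 matches '!1', while a page annotated only with 1 does not.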
Re: [SMW-devel] [PATCH] Support LIKE in queries
^_^ ok, I thought we escaped with a \, which isn't something that normal users would find easy to use. But a starting-space escape is ok.

I still would pick ~ as the best thing for use of REGEX and prefer a different operator for wild cards. I guess % is probably best for the wild card operator. Which brings me to the thought of:
EQ: [[property::value]]
NEQ: [[property::!value]]
GT: [[property::>value]]
LT: [[property::<value]]
WILD: [[property::%value]] (using ? and *)

Also, I propose a few more additions since they will probably have some good use too:
GTEQ: [[property::>=value]]
LTEQ: [[property::<=value]]
NWILD: [[property::!%value]] (negated wild card)
REGEX: [[property::~value]] or perhaps [[property::~/value/i]] (/ could of course be replaced with !, [], etc., any delimiter valid in preg)
NGT: [[property::#>value]] (natural order greater than)
NLT: [[property::#<value]] (natural order less than)
NGTEQ: [[property::#>=value]] (natural order greater than or equal to)
NLTEQ: [[property::#<=value]] (natural order less than or equal to)

Of course, the REGEX one is provided that we can fix the issue of colliding with Halo.

But on the note of that negated wild card: I added that one for one primary reason. Unlike any of the other things, you cannot negate a wild card with any other format (< can be negated with >=, > with <=, eq with !, and regex can negate things inside of it, but you can't negate a wild card).

Also, remember to escape things so that we can use \* and \? to use those literally; I could draft all the replaces needed, but I've got to go do something first.

As for the natural order ones, if you don't know what those are for, it's things like values of 1.2.3 and 1.12.3. Using a normal >, it thinks that 1.2.3 is greater than 1.12.3 because the third character is a 2 and the third character in the other is a 1. But a natural order properly distinguishes the second number as 12. PHP has functions for these built in, and they would be nice to use.

~Daniel Friesen (Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)
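Dan's natural-order example can be checked with PHP's built-in strcmp and strnatcmp. The thread does not specify how SMW would map #> and the other natural-order comparators onto these functions, so this only shows the difference in ordering.

```php
<?php
$a = '1.2.3';
$b = '1.12.3';

// strcmp compares byte by byte: at the third character '2' > '1',
// so "1.2.3" sorts after "1.12.3".
var_dump(strcmp($a, $b) > 0);    // bool(true)

// strnatcmp compares runs of digits as numbers: 2 < 12,
// so "1.2.3" sorts before "1.12.3".
var_dump(strnatcmp($a, $b) < 0); // bool(true)

// Sorting a small list in natural order.
$versions = ['1.12.3', '1.2.3', '1.10.0'];
usort($versions, 'strnatcmp');
print_r($versions); // order: 1.2.3, 1.10.0, 1.12.3
```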
Re: [SMW-devel] [PATCH] Support LIKE in queries
On Freitag, 28. Dezember 2007, Yaron Koren wrote:
> How about ~%substring% instead? The ~ is the symbol for pattern matching in Perl and some UNIX languages, and it might be a clearer indicator of function than %.

I would use that immediately, but IIRC the Halo extension has a similar syntax for a custom edit-distance database function (it requires a modified MySQL version, and probably also has significant performance issues). So the question is whether we want to override that (assuming that this particular Halo function is not widely used), or whether there is another idea for doing it. Other imaginable operators on my keyboard would be #, , ?, @ -- none really as nice as ~ ...

Markus
Re: [SMW-devel] [PATCH] Support LIKE in queries
A lot of people are accustomed to the ? (single-character match) and * (multi-character match) format. It would be easy to escape the '_'s and '%'s in a match and then do a replace of ? to _ and * to %. (A little preg and \ could still easily escape those.) I don't know about ~ though, in the languages I've used I recall ~ having something to do with regex. I'd rather save that character for in case we want to be able to use the REGEXP matching inside of SQL. From what I remember, I think most people with only a little insight into technical stuff, would adjust easiest to using this set: = Equals Greater than = Greater than or equal to Less than or equal to ! Not * Multi-character match ? Single-character match ~ regex But I did have a thought about the @... It's not used anywhere afaik. I did make a suggestion on using a pattern to separate the comparators from the match value. It was using [[Property::comparitor::match]], but as I now remember SMW lets you use :: to specify multiple properties. However it may be a good idea if the separator was one which wouldn't cause conflicting issues with other things. @ is not commonly used and does provide a little bit of a way for people to understand it's use. Or if you want a little farther from what can actually be used in a title (To avoid clashing with things) the # is always invalid. Say, [[prop::[EMAIL PROTECTED] or [[prop::comp#match]]. So for a not [[Has value::[EMAIL PROTECTED] or [[Has value::!#Value]]. I'm probably droning on now... But what about finding a good separator and allowing textual names ie: EQ[=], NOT/NEQ/[!] (!= could be thought of),LT[], GT[], REGEX(P)[~], LIKE[%_], wildcard[*?], etc... There also is the possibility of instead of a separator, using brackets to encompass a comparator. I can hardly think of many places which would use (NOT) at the start of a title ([[Has value::(NOT) Title]]) or, we also have the {} and [] type brackets. 
[] is used by external links, but {} is only used in multiples as a template or variable bit but never has use singularly, templates and values will have already been parsed out so only the singles remain, and as a bonus, { and } are illegal in titles. So [[Has value::{NOT} Title]] is guaranteed to never clash with a legal title or match you can make. If you're worried about templates and parsing issues, those can't occur when your using something like {{{1}}} as the title ([[Has value:{NOT} {{{1}}}]]) so there's no clash. The only potential class is if someone wants to use {{{comparator|EQ}}} to specify the comparator. In that case, we could easily make { EQ } valid (trim spaces), so { {{{comparator|EQ}}} } would work. But... now I'm droning a bit much... ~Daniel Friesen(Dantman) of: -The Gaiapedia (http://gaia.wikia.com) -Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG) -and Wiki-Tools.com (http://wiki-tools.com) Markus Krötzsch wrote: On Freitag, 28. Dezember 2007, Yaron Koren wrote: How about ~%substring% instead? The ~ is the symbol for pattern matching in Perl and some UNIX languages, and it might be a clearer indicator of function than %. I would immediately use that, but IFRC the Halo extension has a similar syntax for a custom editing-distance database function (requires modified MySQL version, and probably also has significant performance issues). So the question is whether we want to overwrite that (assuming that this particular Halo function is not used widely), or is there another idea for doing it? Other imaginable operators on my keyboard would be #, , ?, @ -- none really as nice as ~ ... Markus On Dec 27, 2007 2:16 PM, Markus Krötzsch [EMAIL PROTECTED] wrote: Thanks. I have applied the patch, and added a way of configuring this feature: the parameter $smwgQComparators gives a (|-separated) list of supported comparators, and can be used to enable or disable any of , , !, and %. By default its value is '||!|%'. 
In this way one can also disable ! or even , if these are considered to be problematic. I wonder whether one should use another character instead of % as a wildcard inside the pattern string, so that no double-% confusion can arise. Would * be an alternative, or would it be too confusing w.r.t. the old ask print requests? What about +? Corresponding examples (preprocessing would in each case ensure full compatibility with SQL):
- %%substring%
- %*substring*
- %+substring+
Cheers
Markus

On Thursday, 20 December 2007, Asheesh Laroia wrote: On Thu, 20 Dec 2007, Thomas Bleher wrote: Yesterday I needed LIKE queries for properties, so I added it to SMW (patch attached). It was surprisingly simple. This would be LIKE TOTALLY AWESOME to get into Semantic MediaWiki. It would be great if later SMW could have Valgol support http://www.indwes.edu/Faculty/bcupp/things/computer/VALGOL.html. -- Asheesh. P.S. In all total like seriousness, queries with LIKE support are a good
Re: [SMW-devel] [PATCH] Support LIKE in queries
Thanks. I have applied the patch, and added a way of configuring this feature: the parameter $smwgQComparators gives a (|-separated) list of supported comparators, and can be used to enable or disable any of <, >, !, and %. By default its value is '<|>|!|%'. In this way one can also disable ! or even , if these are considered to be problematic. I wonder whether one should use another character instead of % as a wildcard inside the pattern string, so that no double-% confusion can arise. Would * be an alternative, or would it be too confusing w.r.t. the old ask print requests? What about +? Corresponding examples (preprocessing would in each case ensure full compatibility with SQL):
- %%substring%
- %*substring*
- %+substring+
Cheers
Markus

On Thursday, 20 December 2007, Asheesh Laroia wrote: On Thu, 20 Dec 2007, Thomas Bleher wrote: Yesterday I needed LIKE queries for properties, so I added it to SMW (patch attached). It was surprisingly simple. This would be LIKE TOTALLY AWESOME to get into Semantic MediaWiki. It would be great if later SMW could have Valgol support http://www.indwes.edu/Faculty/bcupp/things/computer/VALGOL.html. -- Asheesh. P.S. In all total like seriousness, queries with LIKE support are a good idea -- The star of riches is shining upon you. - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ Semediawiki-devel mailing list Semediawiki-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/semediawiki-devel

-- 
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362, fax +49 (0)721 608 5998
[EMAIL PROTECTED], www http://korrekt.org
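The $smwgQComparators switch described above (a |-separated list such as '<|>|!|%' deciding which leading comparators are recognised) can be sketched as follows. This is a Python illustration, not SMW's PHP; `enabled_comparators`, `parse_condition` and the operator table are hypothetical, and reading < and > as the non-strict SQL <= and >= is an assumption about SMW's comparator semantics.

```python
from typing import Set, Tuple

# Hypothetical mapping from a leading query comparator to an SQL operator.
# Assumption: "<" and ">" are non-strict (<= and >=) in SMW queries.
SQL_OPERATORS = {"<": "<=", ">": ">=", "!": "!=", "%": "LIKE"}


def enabled_comparators(setting: str) -> Set[str]:
    """Split a |-separated setting such as '<|>|!|%' into a set of symbols."""
    return {symbol for symbol in setting.split("|") if symbol}


def parse_condition(value: str, setting: str = "<|>|!|%") -> Tuple[str, str]:
    """Split a query value such as '%*subs?r*' into (operator, operand).

    A leading comparator is honoured only if the setting enables it;
    otherwise the whole value, prefix included, is compared for equality.
    The operand is returned untouched: turning *subs?r* into an SQL
    pattern would be a separate preprocessing step.
    """
    enabled = enabled_comparators(setting)
    first = value[:1]  # empty string for an empty value
    if first in enabled and first in SQL_OPERATORS:
        return SQL_OPERATORS[first], value[1:]
    return "=", value


if __name__ == "__main__":
    print(parse_condition("%*subs?r*"))         # ('LIKE', '*subs?r*')
    print(parse_condition("!Berlin", "<|>|%"))  # ('=', '!Berlin')
```

In this sketch, disabling a comparator simply makes its symbol part of the compared value, so a wiki that drops ! from the setting would match a page literally titled "!Berlin" rather than negating the condition.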
Re: [SMW-devel] [PATCH] Support LIKE in queries
On Thu, 20 Dec 2007, Thomas Bleher wrote: Yesterday I needed LIKE queries for properties, so I added it to SMW (patch attached). It was surprisingly simple.

This would be LIKE TOTALLY AWESOME to get into Semantic MediaWiki. It would be great if later SMW could have Valgol support http://www.indwes.edu/Faculty/bcupp/things/computer/VALGOL.html.

-- Asheesh.

P.S. In all total like seriousness, queries with LIKE support are a good idea

-- The star of riches is shining upon you.