Re: [SMW-devel] {{#ask}}
> What I meant was: a simple cron job can touch LocalSettings.php regularly to purge the MW cache globally. Not much interaction with MW is needed for that.

Yes, that's simple.

> I guess a robust solution for that will still take some time. One could of course store inline queries in some table, use IDs for each, and permit anyone to use ask with such an (internal) ID only, whereas making custom queries would require further permissions. But this is some more code, and I am not entirely convinced of that design. Did you experience problems with anonymous users that access Special:Ask? On ontoworld it seems that a significant number of Special:Ask requests really come from "further results" links.

Not yet. It might show up later, when SMW becomes common. But maybe you're right that it isn't worth the effort.

Dmitriy

_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
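The cron-job idea can be sketched in a few lines of Python. This is only an illustration, assuming (as the message implies) that the wiki treats the modification time of LocalSettings.php as its cache epoch; the function name `touch_settings` and any path passed to it are hypothetical.

```python
import os
from pathlib import Path


def touch_settings(path):
    """Set the file's modification time to "now" and return the new mtime.

    Assumption: MediaWiki invalidates its caches when LocalSettings.php
    looks newer than the cached data. The file is never created here, so a
    wrong path fails loudly instead of leaving an empty settings file.
    """
    settings = Path(path)
    if not settings.is_file():
        raise FileNotFoundError(settings)
    os.utime(settings, None)  # None means: use the current time
    return settings.stat().st_mtime
```

Run from cron at whatever interval is acceptable for stale inline query results; every run forces all cached pages to be re-parsed on their next view, so a short interval trades freshness for load.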
Re: [SMW-devel] [PATCH] Support LIKE in queries
On Saturday, 29 December 2007, DanTMan wrote:
> A lot of people are accustomed to the ? (single-character match) and * (multi-character match) format. It would be easy to escape the '_'s and '%'s in a match and then replace ? with _ and * with %. (A little preg and \ could still easily escape those.)

Yes, I agree with that. I think, if nobody objects, this fixes the pattern syntax. So it remains to find a good symbol for the comparator.

> I don't know about ~ though; in the languages I've used I recall ~ having something to do with regex. I'd rather save that character in case we want to be able to use REGEXP matching inside of SQL. From what I remember, I think most people with only a little insight into technical stuff would adjust most easily to this set:
>
>   =   Equals
>   >   Greater than
>   >=  Greater than or equal to
>   <=  Less than or equal to
>   !   Not
>   *   Multi-character match
>   ?   Single-character match
>   ~   Regex

As a note: = is not available in the parser function #ask, since it has a special meaning as parameter assignment, e.g. in format=table. The query is distinguished from the other parameters and print requests in #ask by the fact that it contains no = symbol and does not start with ?.

> But I did have a thought about the @... It's not used anywhere afaik. I did make a suggestion on using a pattern to separate the comparators from the match value. It was using [[Property::comparator::match]], but as I now remember, SMW lets you use :: to specify multiple properties. However, it may be a good idea if the separator was one which wouldn't conflict with other things.

Maybe I should remark that the comparator we choose will never block any symbol from being used in values. You can always escape the initial comparator by inserting an initial space (which is ignored in all values).
For instance, to look for pages with the property value "<strange value", one could write [[some property:: <strange value]], whereas [[some property::<strange value]] would be equivalent to [[some property::< strange value]], which matches all values (alphabetically) smaller than "strange value". So we can pick any comparator symbol without conflicts.

> @ is not commonly used and does provide a little bit of a way for people to understand its use. Or, if you want something a little farther from what can actually be used in a title (to avoid clashing with things), the # is always invalid. Say, [[prop::comp@match]] or [[prop::comp#match]]. So for a not: [[Has value::!@Value]] or [[Has value::!#Value]].

Basically, spaces already play the role of your proposed @ or #.

> I'm probably droning on now... But what about finding a good separator and allowing textual names, i.e. EQ[=], NOT/NEQ[!] (!= could be thought of), LT[<], GT[>], REGEX(P)[~], LIKE[%_], wildcard[*?], etc.?

Not sure whether that would be better internationally. < seems to be more universally understood than LT. Another remark: ::! stands for inequality (NEQ), not for negation (NOT). It looks for pages that have some property value unequal to the one that was given, and it does not matter whether or not they also have some value that is equal. So a page that is annotated with [[property::1]] and [[property::2]] would match a query atom [[property::!1]].

> There is also the possibility of using brackets to enclose a comparator instead of a separator. I can hardly think of many places which would use (NOT) at the start of a title ([[Has value::(NOT) Title]]). Or we also have the {} and [] brackets. [] is used by external links, but {} is only used in multiples, as part of a template or variable, and never singly. Templates and variables will already have been parsed out, so only the single braces remain, and as a bonus, { and } are illegal in titles.
> So [[Has value::{NOT} Title]] is guaranteed never to clash with a legal title or match you can make. If you're worried about templates and parsing issues, those can't occur when you're using something like {{{1}}} as the title ([[Has value::{NOT} {{{1}}}]]), so there's no clash. The only potential clash is if someone wants to use {{{comparator|EQ}}} to specify the comparator. In that case, we could easily make { EQ } valid (trim spaces), so { {{{comparator|EQ}}} } would work.

Yes, that would work too. But I am happy with our spaces (the fact that initial and trailing spaces are ignored in all property values is the key to making that work, and I think there is no harm in assuming that). There is, in principle, no problem with having multi-character sequences for comparators, but I would prefer something that does not require internationalisation. So, given that we use * and ? instead of % and _, there are the following options:

  1- [[property::%*substring*]]
  2- [[property::#*substring*]]
  3- [[property::~*substring*]] (clashes with Halo)
  4- [[property::@*substring*]]
  5- maybe more ...

My order of preference would be 3, 1, 4, 2, and I opt for 1 due to the Halo issue. Further
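To make the two conventions discussed in this message concrete, here is a minimal Python sketch (not SMW's actual PHP code). It assumes option 1, i.e. % as the wildcard comparator, handles single-character comparators only, and assumes backslash as the SQL LIKE escape character; the names `parse_value` and `wildcard_to_like` are made up for illustration.

```python
COMPARATORS = ("!", "<", ">", "%")  # "%" is option 1 from the list above


def parse_value(raw):
    """Split the text after '::' into (comparator, operand).

    A leading space escapes the comparator: the value is then taken
    literally, and surrounding spaces are ignored as in all values.
    """
    if raw[:1].isspace():
        return "=", raw.strip()
    value = raw.strip()
    if value[:1] in COMPARATORS:
        return value[0], value[1:].strip()
    return "=", value


def wildcard_to_like(pattern, escape="\\"):
    """Translate a */? wildcard pattern into an SQL LIKE pattern.

    Literal % and _ (and the escape character itself) are escaped first,
    then * becomes % and ? becomes _. A backslash before * or ? keeps
    that character literal.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
        if ch == escape and nxt in ("*", "?"):
            out.append(nxt)  # \* or \? : plain character in LIKE
            i += 2
            continue
        if ch in ("%", "_", escape):
            out.append(escape + ch)  # would otherwise be special in LIKE
        elif ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

With these definitions, `parse_value("%*substring*")` yields the comparator `%` and the pattern `*substring*`, which `wildcard_to_like` turns into `%substring%`, while `parse_value(" <strange value")` yields a plain equality test against `<strange value`.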
Re: [SMW-devel] SMW Performance
On Friday, 28 December 2007, Lau, William (NIH/CIT) [E] wrote:
> We have a set of semantic queries in a template. That template is used in some pages. However, by looking at the database process list, it seems that this set of queries is processed whenever a page is requested, even when the template is not used by the requested page (e.g. special pages).

Do I understand this correctly? The delivery of special pages that are completely unrelated to said template triggers the ask queries contained therein? This would be very strange behaviour indeed (I cannot currently imagine how or why this should happen in MediaWiki)!

> All the SQL queries are generated by the getQueryResult function. Since those queries are very computationally intensive, this bug slows down the entire site. If we take the inline queries out of the template or change $smwgQEnabled to false, the site becomes fast again. Has anyone experienced the same issue?

In general, if queries on some site are too slow, it is useful to configure SMW to support faster querying (with fewer features, of course). Basic settings one can try to speed up querying are:

    include_once('extensions/SemanticMediaWiki/includes/SMW_Settings.php');
    $smwgQSubcategoryDepth = 0;
    $smwgQSubpropertyDepth = 0;
    $smwgQEqualitySupport = SMW_EQ_NONE;
    $smwgQDefaultNamespaces = NULL;
    enableSemantics('semedia-wiki.localhost');

Those settings will speed up basically all queries by disabling all support for property and category hierarchies, equality (redirects), and namespace restrictions (i.e. queries consider pages in all namespaces, including, e.g., User:). You can experiment to find out which of those, if any, affects your query performance positively.

If you have problems with overly complex user-generated queries, then the parameters $smwgQMaxSize and $smwgQMaxDepth are an option to restrict this.

In general, it should be emphasised that queries should be used in a targeted way. Ontoworld.org had the infamous template {{ask}} for some time, which included queries for almost anything; each would simply not appear if it returned no results. Most wikis should rather have single query templates for special purposes instead of trying to have one for everything.

Anyway, for further optimisation, we need some pointer to your site, or at least some statistical information concerning its size (Special:SemanticStatistics) and the query structure. Did you mention the SMW version you use? Some of the above assume SMW1.0-RC3, and none will work prior to SMW1.0-RC1.

Markus

-- 
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362    fax +49 (0)721 608 5998
[EMAIL PROTECTED]    www http://korrekt.org
Re: [SMW-devel] Performance: (Was: {{#ask}})
On Monday, 17 December 2007, Sergey Chernyshev wrote:
> Thank you, Markus - it's a really good review! I wonder if there is any way to unify performance reporting for all SMW instances so we can compare the effects of large data sets and different system configs (e.g. disabled cache and so on). I just looked at the profileinfo.php script; it might be an answer, actually.
>
> I wonder if a real Wikipedia data set (outdated, maybe) is going to be set up as a test case for SMW to handle (with Semantic Templates, of course). I was going to do that, but I don't have the resources for it. This might help to make the goal of Semantic Wikipedia more transparent.

In fact we have such a site, but it runs on rather unstable hardware (we have a buggy RAID controller or driver :-(). It is our test server at test.ontoworld.org, which was also used for other experiments and is not in perfect shape right now (and querying was disabled in order not to impair other experiments). We might set up another, more recent Wikipedia copy at some point in the future.

> Since we're talking about performance, there is another side of performance tuning: perceived performance. This mostly concerns JavaScript, CSS and so on. For example, there is still the problem of SIMILE Timeline not being that fast to load (although the performance of pages that don't have it has improved now that the client-side code is loaded only on pages that need it). This kind of issue can be tracked using Firebug with Yahoo's YSlow add-on.

True, and I hope Timeline is really the main performance problem there. I wonder whether we could ship a more stripped-down version of the scripts to decrease load time. I guess we should ask the guys over at SAIL for that ...

> I'll be happy to run the tests on a system with a significant amount of data if you need a testbed.
All profiling support is appreciated, but I am not sure how to operationalise testing on our servers (SQL profiling would probably need server access, which is not possible in this case). Insights on JavaScript performance are also useful, but I guess that MySQL tuning could be most important for approaching large sites. If you know about DB optimisation, you can also have a look at our DB layout and at the SQL queries we generate (format=debug).

Thanks,

Markus

> On Dec 16, 2007 8:56 AM, Markus Krötzsch [EMAIL PROTECTED] wrote:
> > On Friday, 14 December 2007, Sergey Chernyshev wrote:
> > > Got it - if it'll speed up the process, that'll be great. Currently SMW on top of MW runs significantly slower than just MW, which is not very good because it means that SMW+MW can't scale as well as MW alone.
> > >
> > > Can you describe in a couple of paragraphs how SMW data and queries are cached and how that cache is invalidated, what works on the fly and what is served from the parser cache? I understand it's a lot to describe, but for projects with massive amounts of data and traffic, performance can be a big show-stopper. We picked MW for one of our projects because of Wikipedia's example of performance and predictability, and I hope that it's not too distant for SMW to inherit these qualities, but I'd like to understand the overall picture.
> >
> > Yes, agreed. Of course we have always designed basic algorithms with regard to performance and scalability, and especially tried to pick features based on this aspect. On the other hand, caching is significantly under-developed in SMW as it is, since it mainly uses the existing MW caches where applicable. There are various types of operations that are relevant to performance, and each can probably be optimised/cached independently:
> >
> > (1) Basic page display (by far the most common operation).
> > (2) Query answering, inline and on Special:Ask.
> > (3) Annotation parsing and page formatting.
> > (4) Maintenance specials such as Special:Properties.
> > (5) OWL/RDF export.
> > (6) The browsing special page Special:Browse.
> >
> > I will sketch performance issues for each of those. For actual numbers, see http://ontoworld.org/profileinfo.php to find out how severe each operation is on ontoworld.org.
> >
> > (1) is clearly the main operation, and for existing pages SMW merely uses MW's parser/page caches. No mechanism for cache invalidation exists, but MW regularly updates page caches. This means inline query results can be outdated, but it gives us good hope for basic scalability in large environments. In particular, SMW does not hook into any operations that happen when reproducing parser-cached pages. Even the Factbox comes from the parser cache (which is why we cannot readily translate it to the user's language as MW does for categories).
> >
> > (2) Query answering is done without any caching, and this is clearly a problem. While inline queries are computed only once and stored in the parser cache afterwards, Special:Ask has no caching facility at all. This needs to change in the future. Targeted
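As an illustration of the gap described under (2), the following Python sketch shows one generic way a result cache for Special:Ask-style requests could look: results are keyed by the query text and expire after a fixed lifetime. This is not SMW code and not a design proposed in the thread; the class name `QueryCache`, its parameters, and the time-based expiry are all assumptions made for the example.

```python
import time


class QueryCache:
    """Keep query results for a limited time, keyed by the query text."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested
        self._entries = {}      # query text -> (stored_at, result)

    def get(self, query, compute):
        """Return a fresh cached result, or run compute(query) and store it."""
        key = query.strip()     # outer whitespace does not change the query
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        result = compute(query)
        self._entries[key] = (now, result)
        return result
```

Like the parser cache for inline queries, such a cache trades freshness for speed: a result may be up to `ttl_seconds` out of date, and nothing here invalidates entries when annotations change.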
Re: [SMW-devel] [PATCH] Support LIKE in queries
^_^ OK, I thought we escaped with a \, which isn't something that normal users would find easy to use. But a leading-space escape is OK. I would still pick ~ as the best choice for REGEX and prefer a different operator for wildcards. I guess % is probably best for the wildcard operator. Which brings me to this set:

  EQ:    [[property::value]]
  NEQ:   [[property::!value]]
  GT:    [[property::>value]]
  LT:    [[property::<value]]
  WILD:  [[property::%value]] (using ? and *)

I also propose a few more additions, since they will probably have some good use too:

  GTEQ:  [[property::>=value]]
  LTEQ:  [[property::<=value]]
  NWILD: [[property::!%value]] (negated wildcard)
  REGEX: [[property::~value]] or perhaps [[property::~/value/i]] (/ could of course be replaced with !, [], etc., any delimiter valid in preg)
  NGT:   [[property::#>value]] (natural order greater than)
  NLT:   [[property::#<value]] (natural order less than)
  NGTEQ: [[property::#>=value]] (natural order greater than or equal to)
  NLTEQ: [[property::#<=value]] (natural order less than or equal to)

Of course, the REGEX one assumes that we can fix the issue of colliding with Halo. As for the negated wildcard, I added that one for one primary reason: unlike any of the other comparators, you cannot negate a wildcard in any other way. (> can be negated with <=, EQ with !, and a regex can negate things inside itself. But you can't negate a wildcard.) Also, remember to escape things so that we can use \* and \? to match those characters literally; I could draft all the replacements needed, but I've got to go do something first.

As for the natural order ones, in case you don't know what those are for: they handle values like 1.2.3 and 1.12.3. A normal > thinks that 1.2.3 is greater than 1.12.3 because the third character of the first is a 2 and the third character of the other is a 1. A natural order comparison properly reads the second number as 12. PHP has built-in functions for this, which would be nice to use.
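The difference between plain and natural ordering can be shown with a short Python sketch. The message presumably has PHP's built-in natural comparison functions such as strnatcmp in mind; the helper below, `natural_key`, is a hypothetical stand-in written for this example and does not claim to match their behaviour in every edge case.

```python
import re


def natural_key(text):
    """Build a sort key in which digit runs compare as whole numbers.

    re.split with a capturing group alternates text and digit runs, so
    every odd index holds digits. Tagging numbers with 0 and text with 1
    keeps mixed comparisons well defined.
    """
    parts = re.split(r"(\d+)", text)
    return [(0, int(p)) if i % 2 else (1, p)
            for i, p in enumerate(parts) if p]


versions = ["1.12.3", "1.2.3", "1.10"]
plain_order = sorted(versions)                     # character by character
natural_order = sorted(versions, key=natural_key)  # 2 < 10 < 12
```

Here the plain sort places 1.12.3 before 1.2.3, exactly the effect described above, while the natural sort yields 1.2.3, 1.10, 1.12.3.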
~Daniel Friesen (Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)
Re: [SMW-devel] {{#ask}}
I'm not sure that restricting Ask functionality is in line with Wikipedia policies: it's not a modification operation, so it should be public, I believe. I agree that abuse blocking and request throttling might be a solution here, but in general I would recommend restricting functionality rather than access, e.g. a limit on the number of joins or something like that. This kind of action is generally hard to limit and predict, and is therefore quite easy to abuse. This might be a serious bottleneck in SMW adoption by Wikipedia.

Sergey

On Dec 29, 2007 6:01 AM, cnit [EMAIL PROTECTED] wrote:
> > One could of course store inline queries in some table, use IDs for each, and permit anyone to use ask with such an (internal) ID only, whereas making custom queries would require further permissions. [...] Did you experience problems with anonymous users that access Special:Ask?
>
> Not yet. It might show up later, when SMW becomes common. But maybe you're right that it isn't worth the effort.
Re: [SMW-devel] SMW Performance
Hmm. I didn't realize that there is a way to remove the $smwgQDefaultNamespaces restriction, and that doing so enables all namespaces instead of disabling them. Why is this setting not NULL by default, then? I don't see any point in restricting namespaces unless it's absolutely necessary for security reasons or something.

Sergey

On Dec 29, 2007 10:29 AM, Markus Krötzsch [EMAIL PROTECTED] wrote:
> [...]
>     $smwgQDefaultNamespaces = NULL;
> [...]
> Those settings will speed up basically all queries, disabling all support for property and category hierarchies, equality (redirects), and namespace restrictions (i.e. queries consider pages in all namespaces, including, e.g., User:).

-- 
Sergey Chernyshev
http://www.sergeychernyshev.com/