Re: [SMW-devel] {{#ask}}

2007-12-29 Thread cnit
 What I meant was: a simple cron-job can touch LocalSettings.php regularly to
 purge the MW cache globally. Not much interaction with MW needed for that.
Yes, that's simple.
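For illustration, the cron approach can be sketched in a few lines of Python. This assumes, as described above, that a newer modification time on LocalSettings.php is enough for MediaWiki to treat its cached pages as stale. The path and the function name touch_settings are hypothetical.

```python
import os
import time
from pathlib import Path

# Assumed install location (hypothetical); adjust for your wiki.
DEFAULT_SETTINGS = Path("/var/www/wiki/LocalSettings.php")


def touch_settings(path: Path = DEFAULT_SETTINGS) -> float:
    """Bump the file's access/modification time to now without changing its
    contents, and return the new modification time."""
    if not path.is_file():
        raise FileNotFoundError(path)
    now = time.time()
    os.utime(path, (now, now))  # (atime, mtime)
    return path.stat().st_mtime
```

Saved as touch_settings.py, a crontab entry such as `0 * * * * python3 -c 'from touch_settings import touch_settings; touch_settings()'` would run it hourly; the module name and schedule are only examples.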

 I guess a strong solution for that will still take some time. One could of
 course store inline queries in some table, use IDs for each, and permit
 anyone to use ask with such an (internal) ID only, whereas making custom
 queries would require further permissions. But this is some more code, and I
 am not entirely convinced of that design.
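The ID-based design mentioned here could look roughly like the following Python sketch. It is hypothetical throughout: QueryRegistry, the SHA-1-derived IDs and the 'customask' right are illustrative names, not part of SMW or MediaWiki.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class QueryRegistry:
    # query ID -> query text; would be filled when pages with inline queries are saved
    stored: dict = field(default_factory=dict)

    def register(self, query_text: str) -> str:
        """Store an inline query and return a stable ID derived from its text."""
        qid = hashlib.sha1(query_text.encode("utf-8")).hexdigest()[:12]
        self.stored[qid] = query_text
        return qid

    def resolve(self, *, query_id=None, query_text=None, user_rights=frozenset()):
        """Return the query text the caller may run.

        Anyone may run a stored query by ID; ad-hoc query text requires the
        (hypothetical) 'customask' right.
        """
        if query_id is not None:
            if query_id not in self.stored:
                raise KeyError(f"unknown query id {query_id!r}")
            return self.stored[query_id]
        if query_text is not None:
            if "customask" not in user_rights:
                raise PermissionError("custom queries require the 'customask' right")
            return query_text
        raise ValueError("either query_id or query_text is required")
```

Deriving the ID from the query text means the same inline query on several pages maps to one stored entry.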

 Did you experience problems with anonymous users that access Special:Ask? On
 ontoworld it seems that a significant amount of Special:Ask requests really
 come from further results links.
Not yet. It might show up later, once SMW becomes common.
But maybe you're right that it isn't worth the effort.
Dmitriy


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel


Re: [SMW-devel] [PATCH] Support LIKE in queries

2007-12-29 Thread Markus Krötzsch
On Saturday, 29 December 2007, DanTMan wrote:
 A lot of people are accustomed to the ? (single-character match) and *
 (multi-character match) format. It would be easy to escape the '_'s and
 '%'s in a match and then replace ? with _ and * with %. (With a little
 preg work, a \ could still be used to escape those.)

Yes, I agree with that. I think, if nobody objects, this settles the pattern 
syntax. So what remains is to find a good symbol for the comparator.
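A minimal Python sketch of this translation, assuming a backslash as the LIKE escape character (the default in MySQL). SQLite is used only so the example runs on its own; wildcard_to_like and like_matches are hypothetical helper names, not SMW code.

```python
import sqlite3

LIKE_ESCAPE = "\\"  # assumed ESCAPE character for the LIKE clause


def _literal(ch: str) -> str:
    """Escape characters that LIKE would otherwise treat specially."""
    if ch in ("%", "_", LIKE_ESCAPE):
        return LIKE_ESCAPE + ch
    return ch


def wildcard_to_like(pattern: str) -> str:
    """Translate a * / ? pattern into a LIKE pattern.

    * becomes %, ? becomes _, literal % and _ are escaped, and \\*, \\? and
    \\\\ let users write a literal *, ? or backslash.
    """
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n and pattern[i + 1] in "*?\\":
            out.append(_literal(pattern[i + 1]))  # user-escaped literal
            i += 2
            continue
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        else:
            out.append(_literal(ch))
        i += 1
    return "".join(out)


def like_matches(value: str, pattern: str) -> bool:
    """Check a value against a wildcard pattern with SQLite's LIKE ... ESCAPE."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute("SELECT ? LIKE ? ESCAPE '\\'",
                           (value, wildcard_to_like(pattern))).fetchone()
    finally:
        conn.close()
    return bool(row[0])
```

In a real query the translated pattern should be passed as a bound parameter rather than spliced into the SQL string.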

 I don't know about ~ though; in the languages I've used, I recall ~
 having something to do with regex. I'd rather reserve that character in
 case we want to use REGEXP matching in SQL.

 From what I remember, I think most people with only a little technical
 insight would adjust most easily to this set:
 = Equals
 > Greater than
 >= Greater than or equal to
 < Less than
 <= Less than or equal to
 ! Not
 * Multi-character match
 ? Single-character match
 ~ regex

As a note: = is not available in the parser function #ask, since it has a 
special meaning as parameter assignment, as e.g. in format=table. In #ask, the 
query is distinguished from the other parameters and print requests by the 
fact that it contains no = symbol and does not start with ?.


 But I did have a thought about the @... It's not used anywhere, afaik.
 I did make a suggestion about using a pattern to separate the comparators
 from the match value. It was [[Property::comparator::match]], but
 as I now remember, SMW lets you use :: to specify multiple properties.
 So it may be a good idea to pick a separator that wouldn't
 conflict with other things.

Maybe I should remark that the comparator we choose will never block any symbol 
from being used in values. You can always escape the initial comparator by 
inserting an initial space (which is ignored in all values). For instance, to 
look for pages with the property value "<strange value", one could write 

[[some property:: <strange value]]

whereas [[some property::<strange value]] would be equivalent to 

[[some property::< strange value]]

which matches all values (alphabetically) smaller than "strange value". So we 
can pick any comparator character without conflicts.
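The leading-space rule can be sketched in Python as follows. The comparator set is an assumption (the <, > and ! comparators discussed here plus the % proposed for LIKE in this thread), and parse_condition is a hypothetical name.

```python
from typing import Tuple

# Assumed comparator characters; only the very first character counts.
COMPARATORS = ("<", ">", "!", "%")


def parse_condition(raw: str) -> Tuple[str, str]:
    """Split a query value such as '<strange value' into (comparator, value).

    A leading space disables the comparator, because surrounding spaces are
    ignored in all values anyway.
    """
    if raw[:1] in COMPARATORS:
        return raw[0], raw[1:].strip()
    return "=", raw.strip()
```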

 @ is not commonly used and 
 does give people a little hint about its use. Or,
 if you want something further from what can actually be used in a title
 (to avoid clashing with things), # is always invalid.
 Say, [[prop::comp@match]] or [[prop::comp#match]]. So for a not: [[Has
 value::!@Value]] or [[Has value::!#Value]].

Basically, spaces already play the role of your proposed @ or #.

 I'm probably droning on now... But what about finding a good separator
 and allowing textual names, i.e. EQ[=], NOT/NEQ[!] (!= could be thought
 of), LT[<], GT[>], REGEX(P)[~], LIKE[%_], wildcard[*?], etc.?

Not sure whether that would be better internationally; < seems to be more 
universally understood than LT.

Another remark: ::! stands for inequality (NEQ), not for negation (NOT). It 
looks for pages that have some property value unequal to the one that was 
given, and it does not matter whether or not they also have some value that 
is equal. So a page that is annotated with [[property::1]] and 
[[property::2]] would match a query atom [[property::!1]].
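To make the NEQ/NOT distinction concrete, here is a small Python sketch over the list of values a page has for one property. matches_not shows the negation semantics that ::! does not have; both function names are hypothetical.

```python
from typing import Iterable


def matches_neq(values: Iterable[str], given: str) -> bool:
    """[[property::!given]]: the page has at least one value different from `given`."""
    return any(v != given for v in values)


def matches_not(values: Iterable[str], given: str) -> bool:
    """Negation, for contrast only: no value of the page equals `given`."""
    return all(v != given for v in values)
```

A page annotated with [[property::1]] and [[property::2]] has the values ["1", "2"], so matches_neq(["1", "2"], "1") holds while matches_not(["1", "2"], "1") does not.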

 There is also the possibility of using brackets, instead of a separator,
 to enclose a comparator. I can hardly think of many places that would
 use (NOT) at the start of a title ([[Has value::(NOT) Title]]), or we
 also have the {} and [] type brackets. [] is used by external links, but
 {} is only used in multiples for templates or variables and is never
 used singly; templates and variables will already have been parsed out,
 so only the single braces remain. As a bonus, { and } are illegal in
 titles. So [[Has value::{NOT} Title]] is guaranteed never to clash with
 a legal title or match you can make. If you're worried about templates
 and parsing issues, those can't occur when you're using something like
 {{{1}}} as the title ([[Has value::{NOT} {{{1}}}]]), so there's no clash.
 The only potential clash is if someone wants to use {{{comparator|EQ}}}
 to specify the comparator. In that case, we could easily make { EQ }
 valid (trim spaces), so { {{{comparator|EQ}}} } would work.
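For comparison, the brace syntax is also easy to parse. A hypothetical Python sketch (parse_braced is an illustrative name) using a regular expression that trims spaces inside the braces, as suggested:

```python
import re
from typing import Optional, Tuple

# {NAME} prefix; spaces inside the braces are trimmed, so "{ EQ }" also works.
_BRACED = re.compile(r"\s*\{\s*([A-Za-z]+)\s*\}\s*(.*)", re.DOTALL)


def parse_braced(raw: str) -> Tuple[Optional[str], str]:
    """Return (comparator name or None, value) for input such as '{NOT} Title'."""
    m = _BRACED.fullmatch(raw)
    if m:
        return m.group(1).upper(), m.group(2).strip()
    return None, raw.strip()
```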

Yes, that would work too. But I am happy with our spaces (the fact that 
initial and trailing spaces are ignored in all property values is the key to 
make that work, and I think there is no harm in assuming that).

There is, in principle, no problem with having multi-char sequences for 
comparators, but I would prefer something that does not require 
internationalisation. So, given that we use * and ? instead of % and _, there 
are the following options:

1-  [[property::%*substring*]]
2-  [[property::#*substring*]]
3-  [[property::~*substring*]] (clashes with Halo)
4-  [[property::@*substring*]]
5-  maybe more ...

My order of preference would be 3, 1, 4, 2, and I opt for 1 due to the Halo 
issue. Further 

Re: [SMW-devel] SMW Performance

2007-12-29 Thread Markus Krötzsch
On Friday, 28 December 2007, Lau, William (NIH/CIT) [E] wrote:
 We have a set of semantic queries in a template. That template is used
 in some pages. However, looking at the database process list, it
 seems that this set of queries is processed whenever a page is
 requested, even when the template is not used by the requested page
 (e.g. special pages). 

Do I understand this correctly? The delivery of special pages that are 
completely unrelated to said template triggers the ask-queries contained 
therein? This would be very strange behaviour indeed (I cannot currently 
imagine how or why this should happen in MediaWiki)!

 All the SQL queries are generated by the 
 getQueryResult function. Since those queries are very computationally
 intensive, this bug slows down the entire site. If we take the inline
 queries out of the template or set $smwgQEnabled to false, the site
 becomes fast again. Has anyone experienced the same issue?

In general, if queries on some site are too slow, it is useful to configure 
SMW to support faster querying (with fewer features, of course). Basic 
settings one can try to speed up querying are:

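include_once('extensions/SemanticMediaWiki/includes/SMW_Settings.php');
$smwgQSubcategoryDepth = 0;           // no category hierarchy reasoning
$smwgQSubpropertyDepth = 0;           // no property hierarchy reasoning
$smwgQEqualitySupport = SMW_EQ_NONE;  // no equality (redirect) support
$smwgQDefaultNamespaces = NULL;       // no namespace restriction
enableSemantics('semedia-wiki.localhost');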

Those settings will speed up basically all queries by disabling all support for 
property and category hierarchies, equality (redirects), and namespace 
restrictions (i.e. queries consider pages in all namespaces, including, e.g., 
User:). You can experiment to see which of those, if any, improves your query 
performance.

If overly complex user-generated queries are a problem, the parameters 
$smwgQMaxSize and $smwgQMaxDepth can be used to restrict them.
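How SMW counts query size and depth is not spelled out here; the Python sketch below assumes size is the number of condition nodes in the parsed query and depth is the longest chain of nested subqueries. Condition, check_limits and the default limits are illustrative only, not SMW's actual classes or values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Condition:
    """A node in a parsed query: a label plus optional subconditions."""
    label: str
    children: List["Condition"] = field(default_factory=list)


def size(node: Condition) -> int:
    """Number of condition nodes in the tree."""
    return 1 + sum(size(c) for c in node.children)


def depth(node: Condition) -> int:
    """Number of nodes on the longest path from the root to a leaf."""
    return 1 + max((depth(c) for c in node.children), default=0)


def check_limits(node: Condition, max_size: int = 12, max_depth: int = 4) -> None:
    """Reject a query before execution if it exceeds either limit."""
    if size(node) > max_size:
        raise ValueError(f"query too large: {size(node)} > {max_size}")
    if depth(node) > max_depth:
        raise ValueError(f"query too deep: {depth(node)} > {max_depth}")
```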

In general, it should be emphasised that queries should be used in a targeted 
way. Ontoworld.org had the infamous template {{ask}} for some time, which 
included queries for almost anything and simply did not show them when they 
returned no results. Most wikis should instead have individual query 
templates for specific purposes rather than trying to have one for everything.

Anyway, for further optimisation, we need some pointer to your site, or at 
least some statistical information concerning its size 
(Special:SemanticStatistics) and the query structure. Did you mention the SMW 
version you use? Some of the above assume SMW1.0-RC3, and none will work 
prior to SMW1.0-RC1.

Markus


-- 
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362   fax +49 (0)721 608 5998
[EMAIL PROTECTED]   www  http://korrekt.org




Re: [SMW-devel] Performance: (Was: {{#ask}})

2007-12-29 Thread Markus Krötzsch
On Monday, 17 December 2007, Sergey Chernyshev wrote:
 Thank you, Markus - it's a really good review! I wonder if there is any way
 to unify performance reporting for all SMW instances so we can compare the
 effects of large data sets, different system configs (e.g. disabled cache
 and so on). I just looked at the profileinfo.php script; it might be the
 answer, actually.

 I wonder if a real Wikipedia data set (possibly outdated) is going to be set
 up as a test case for SMW to handle (with Semantic Templates, of course).
 I was going to do that, but I don't have the resources for it. This might
 help to make the goal of a Semantic Wikipedia more transparent.

In fact we have such a site, but it runs on rather unstable hardware (we 
have a buggy RAID controller or driver :-(). It is our test server at 
test.ontoworld.org, which was also used for other experiments and is not in 
perfect shape right now (querying was disabled in order not to impair 
other experiments). We might set up another, more recent Wikipedia copy 
at some point in the future.


 Since we're talking about performance, there is another side of performance
 tuning: perceived performance. This mostly concerns JavaScript, CSS and
 so on. For example, there is still the problem of SIMILE Timeline being
 slow to load (although the performance of pages that don't use it has
 improved now that the client-side code is loaded only on pages that need
 it). These kinds of issues can be tracked using Firebug with Yahoo's YSlow
 add-on.

True, and I hope Timeline is really the main performance problem there. I 
wonder whether we could ship a more stripped down version of the scripts to 
decrease load time. I guess we should ask the guys over at SAIL for that ...


 I'll be happy to run the tests on a system with a significant amount of
 data if you need a testbed.

All profiling support is appreciated, but I am not sure how to operationalise 
testing on our servers (SQL profiling would probably need server access, 
which is not possible in this case). Insights on JavaScript performance are 
also useful, but I guess that MySQL tuning could be most important for 
approaching large sites. If you know about DB optimisation, you can also 
have a look at our DB layout and at the SQL queries we generate 
(format=debug).

Thanks,

Markus


 On Dec 16, 2007 8:56 AM, Markus Krötzsch [EMAIL PROTECTED] wrote:
  On Friday, 14 December 2007, Sergey Chernyshev wrote:
   Got it - if it speeds up the process, that'll be great. Currently SMW on
   top of MW runs significantly slower than just MW, which is not very good
   because it means that SMW+MW can't scale as well as MW alone.

   Can you describe in a couple of paragraphs how SMW data and queries are
   cached and how that cache is invalidated, what works on the fly and what
   is served from the parser cache?

   I understand it's a lot to describe, but for projects with a massive
   amount of data and traffic, performance can be a big show-stopper. We
   picked MW for one of our projects because of the Wikipedia performance
   example and predictability, and I hope that it's not too distant for SMW
   to inherit these qualities, but I'd like to understand the overall
   picture.
 
  Yes, agreed. Of course we have always designed basic algorithms with regard
  to performance and scalability, and especially tried to pick features based
  on this aspect. On the other hand, caching is significantly under-developed
  in SMW as it is, since it mainly uses the existing MW caches where
  applicable. There are various types of operations that are relevant to
  performance, and each can probably be optimised/cached independently:

  (1) Basic page display -- by far the most common operation.
  (2) Query answering, inline and on Special:Ask.
  (3) Annotation parsing and page formatting.
  (4) Maintenance specials such as Special:Properties.
  (5) OWL/RDF export.
  (6) Browsing via Special:Browse.

  I will sketch performance issues for each of those. For actual numbers,
  see http://ontoworld.org/profileinfo.php to find out how costly each
  operation is on ontoworld.org.

  (1) is clearly the main operation, and for existing pages SMW merely uses
  MW's parser/page caches. No mechanism for cache invalidation exists, but MW
  regularly updates page caches. This allows outdated inline queries but gives
  us good hope for basic scalability in large environments. In particular, SMW
  does not hook into any operations that happen when reproducing parser-cached
  pages. Even the Factbox comes from the parser cache (which is why we cannot
  readily translate it into the user's language as MW does for categories).

  (2) Query answering is done without any caching, and this is clearly a
  problem. While inline queries are computed only once and stored in the parser
  cache afterwards, Special:Ask has no caching facility at all. This needs to
  change in the future. Targeted 
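One possible shape for the missing Special:Ask cache is a time-limited result cache keyed by normalised query text and parameters. This Python sketch is hypothetical (AskResultCache is not an SMW class); the injectable clock exists only to make it testable.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class AskResultCache:
    """Cache query results for ttl_seconds, keyed by normalised query + params."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, object]] = {}

    @staticmethod
    def key(query: str, params: Dict[str, str]) -> Hashable:
        # Collapse whitespace and sort parameters so equivalent requests share a key.
        return (" ".join(query.split()), tuple(sorted(params.items())))

    def get_or_compute(self, query: str, params: Dict[str, str],
                       compute: Callable[[], object]) -> object:
        k = self.key(query, params)
        now = self.clock()
        hit = self._entries.get(k)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh
        result = compute()
        self._entries[k] = (now, result)
        return result

    def invalidate_all(self) -> None:
        """Drop everything, e.g. after edits that change annotations."""
        self._entries.clear()
```

A fixed TTL trades freshness for load, much like the page caches described in (1); invalidate_all would be the hook for cruder global purges.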

Re: [SMW-devel] [PATCH] Support LIKE in queries

2007-12-29 Thread DanTMan
^_^ OK, I thought we escaped with a \, which isn't something normal
users would find easy to use. But escaping with a leading space is fine.

I would still pick ~ as the best choice for REGEX and prefer a
different operator for wildcards.
I guess % is probably best for the wildcard operator, which brings
me to the following:

EQ:    [[property::value]]
NEQ:   [[property::!value]]
GT:    [[property::>value]]
LT:    [[property::<value]]
WILD:  [[property::%value]] (using ? and *)

Also, I propose a few more additions, since they will probably have some
good uses too.

GTEQ:  [[property::>=value]]
LTEQ:  [[property::<=value]]
NWILD: [[property::!%value]] (negated wildcard)
REGEX: [[property::~value]] or perhaps [[property::~/value/i]] (/ could
of course be replaced with !, [], etc., any delimiter valid in preg)
NGT:   [[property::#>value]] (natural order greater than)
NLT:   [[property::#<value]] (natural order less than)
NGTEQ: [[property::#>=value]] (natural order greater than or equal to)
NLTEQ: [[property::#<=value]] (natural order less than or equal to)
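With multi-character prefixes such as !% and #>=, a parser has to try the longest prefixes first. A Python sketch of that for the table above, keeping the leading-space escape; the mapping simply restates the proposal, and parse_comparator is a hypothetical name.

```python
from typing import Tuple

# Prefixes from the proposal; names as used in the table.
PREFIXES = {
    "!%": "NWILD", "#>=": "NGTEQ", "#<=": "NLTEQ", "#>": "NGT", "#<": "NLT",
    ">=": "GTEQ", "<=": "LTEQ", ">": "GT", "<": "LT",
    "!": "NEQ", "%": "WILD", "~": "REGEX",
}
# Longest first, so "!%" wins over "!" and "#>=" wins over "#>".
_ORDERED = sorted(PREFIXES, key=len, reverse=True)


def parse_comparator(raw: str) -> Tuple[str, str]:
    """Return (comparator name, value); a leading space means plain EQ."""
    for prefix in _ORDERED:
        if raw.startswith(prefix):
            return PREFIXES[prefix], raw[len(prefix):].strip()
    return "EQ", raw.strip()
```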

Of course, the REGEX one depends on our fixing the clash with Halo.
As for the negated wildcard, I added it for one main reason: unlike
everything else, a wildcard cannot be negated with any other form
(< can be negated with >=, EQ with !, and a regex can express negation
inside itself, but you can't negate a wildcard). Also, remember to handle
escaping so that \* and \? can be used literally (I could draft all the
replacements needed, but I have to go do something first).
As for the natural order ones, in case you don't know what they are for:
they handle values like 1.2.3 and 1.12.3. Using a normal <, 1.2.3 is
considered greater than 1.12.3, because the third character of one is
a 2 while the third character of the other is a 1. A natural order
comparison properly recognises the second number as 12. PHP has built-in
functions for this, and they would be nice to use.
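Natural ordering can be sketched in Python by splitting a string into text and digit runs and comparing the digit runs as integers. This is in the spirit of PHP's strnatcmp(), not a reimplementation: it does not mirror strnatcmp's treatment of leading zeros or whitespace.

```python
import re
from typing import List, Union

_DIGITS = re.compile(r"(\d+)")


def natural_key(s: str) -> List[Union[str, int]]:
    """Split s into text and digit runs; digit runs become ints.

    re.split with a capturing group alternates text/digits, so odd indices are
    always digit runs and two keys compare position by position without mixing
    str and int.
    """
    parts = _DIGITS.split(s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def natural_cmp(a: str, b: str) -> int:
    """Return -1, 0 or 1, comparing a and b in natural order."""
    ka, kb = natural_key(a), natural_key(b)
    return (ka > kb) - (ka < kb)
```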

~Daniel Friesen(Dantman) of:
-The Gaiapedia (http://gaia.wikia.com)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
-and Wiki-Tools.com (http://wiki-tools.com)

Markus Krötzsch wrote:

Re: [SMW-devel] {{#ask}}

2007-12-29 Thread Sergey Chernyshev
I'm not sure restricting Ask functionality is in line with
Wikipedia policies: it's not a modification operation, so I believe it
should be public.

I agree that abuse blocking and request throttling might be a solution here,
but in general I wouldn't recommend restricting access, but rather
restricting functionality, e.g. a limited number of joins or something like that.

These kinds of actions are generally hard to limit and predict, so they are
quite easy to abuse. This might be a serious bottleneck for SMW adoption by
Wikipedia.
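Request throttling of the kind mentioned above is commonly done with a token bucket per client. A hypothetical Python sketch; TokenBucketThrottle, the client keys and the limits are illustrative, not an existing MediaWiki or SMW feature.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketThrottle:
    """Per-client token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._state: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last time)

    def allow(self, client: str) -> bool:
        """Spend one token for this request if available."""
        now = self.clock()
        tokens, last = self._state.get(client, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._state[client] = (tokens - 1, now)
            return True
        self._state[client] = (tokens, now)
        return False
```

A bucket allows short bursts (up to capacity) while bounding the sustained rate, which suits users following "further results" links.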

   Sergey


On Dec 29, 2007 6:01 AM, cnit [EMAIL PROTECTED] wrote:



Re: [SMW-devel] SMW Performance

2007-12-29 Thread Sergey Chernyshev
Hmm. I didn't realize that the $smwgQDefaultNamespaces restriction can be
removed, and that doing so enables all namespaces instead of disabling them.

Why isn't this setting NULL by default, then? I don't see any
point in restricting namespaces unless it's absolutely necessary for
security reasons or something.

Sergey


On Dec 29, 2007 10:29 AM, Markus Krötzsch [EMAIL PROTECTED] wrote:





-- 
Sergey Chernyshev
http://www.sergeychernyshev.com/