Re: [SMW-devel] Strange handling for special properties in SMWSQLStore3 DB design?

2012-10-10 Thread Yaron Koren
Hi Markus,

Thanks for the additional clarifications. I now understand, at the very
least, why it helps for Modification date to have its own table; and more
generally, I get the design rationale. I think now that it makes sense for
Semantic Forms to switch to the page_props table, and parser functions, to
store its form-linking information; and if that's what it takes for SMW to
not have tables for SF properties, then it's all the more reason to do it.
:)
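
To make the page_props idea concrete, here is a minimal sketch using Python's sqlite3 module as a stand-in for the wiki database. The three columns (pp_page, pp_propname, pp_value) mirror MediaWiki's page_props table, but the property name "default_form" and both helper functions are hypothetical, chosen only for illustration. In MediaWiki itself, as far as I know, the table is filled by core from the parser output rather than by direct SQL like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for MediaWiki's page_props table.
conn.execute(
    "CREATE TABLE page_props ("
    "pp_page INTEGER NOT NULL, "
    "pp_propname TEXT NOT NULL, "
    "pp_value TEXT NOT NULL, "
    "PRIMARY KEY (pp_page, pp_propname))"
)

DEFAULT_FORM_PROP = "default_form"  # hypothetical property name


def set_default_form(conn, page_id, form_name):
    """Record the default form for a page, replacing any earlier value."""
    conn.execute(
        "INSERT OR REPLACE INTO page_props (pp_page, pp_propname, pp_value) "
        "VALUES (?, ?, ?)",
        (page_id, DEFAULT_FORM_PROP, form_name),
    )
    conn.commit()


def get_default_form(conn, page_id):
    """Return the default form for a page, or None if none is set."""
    row = conn.execute(
        "SELECT pp_value FROM page_props WHERE pp_page = ? AND pp_propname = ?",
        (page_id, DEFAULT_FORM_PROP),
    ).fetchone()
    return row[0] if row else None


set_default_form(conn, 7, "City")
print(get_default_form(conn, 7))
print(get_default_form(conn, 8))
```

The lookup is a single primary-key read on one table, which is the kind of access SF would need for form linking.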

By the way, I don't think additionally storing properties like Has default
form would be that helpful - I don't believe people query on it, or look
it up in Special:Browse, unless something goes wrong; and if parser
functions like #default_form were to be used instead, it would probably be
easier to know right away if there was a problem.

(Also, nobody should be concerned that support for Has default form is
going away - if/when this change is made, I'm sure there will be backward
compatibility preserved for a long time.)

Anyway, hopefully this is enough to convince you and the other relevant
people to take out the SF database tables - that was the part of the new
design that felt the oddest to me.

-Yaron


On Wed, Oct 10, 2012 at 6:44 AM, Markus Krötzsch 
mar...@semantic-mediawiki.org wrote:

 On 09/10/12 18:12, Yaron Koren wrote:

 Hi,

 Nischay - thanks for that set of links. I don't think any of them cover
 the thinking behind the design decisions, but they're still all good to
 know about.

 Markus - thanks for all the clarifications. I was sure that the extra
 tables were done to improve reading, but it sounds like they're mostly
 there to improve writing.


 I would not say that either of these is more important. Reading and
 writing are both relevant, and the new design should improve both
 activities in various situations.


  Is improving write performance a big deal,
 though? I'm no performance expert, but I would think that on an average
 wiki, writes are a fairly unimportant factor in the overall performance
 of the site. Is that incorrect?


 Current versions of SMW can cause a very high write load (writing on every
 edit, even if no data changed). This was an issue for Wikia and for anyone
 who is running MW on SSD drives. Just checking if some data has changed is
 not a solution, since the modification date always changes. Having more
 tables allows for a more fine-grained write control.



 I didn't mean that the number of tables (around 30 in SMWSQLStore3) will
 pose a technical challenge, just that it looks cluttered - and may pose
 challenges for administrators, and developers, in maintaining them.


 Not sure what kind of administration could be necessary. In general, users
 and administrators should not touch the tables.



 I'm still thinking about MediaWiki's page_props table - both as a design
 example, and as a tool that SMW and related extensions can use. As a
 design example: MediaWiki stores many different types of values in the
 page_props table. Table writes (and reads) could probably be a little
 faster if there were separate tables for different types of page
 properties, but the MediaWiki developers decided to just have one big
 table for everything.

 And as a tool, page_props could potentially be helpful for SMW, SF and
 others. SF's special properties, like Has default form, don't need to
 be queried alongside real semantic properties - and actually, I doubt
 they're queried often at all, other than by SF itself. SF's information
 could be stored in the page_props table, instead of via SMW - and then,
 instead of an SMW tag, you could just add something like this to a
 category, property or template page:

 {{#default_form:City}}

 If it reduces clutter, administrator confusion, and compatibility
 issues, without increasing read or write times, it might be worth it...


 I don't see what you mean by clutter and administrator confusion. I have
 never known how many tables MW uses internally, nor have the tables in MW
 confused me so far. We have never had compatibility issues between SF and
 SMW either.

 But page_props could still be a good idea for SF. It should definitely
 improve read performance. You can also store the data redundantly in SMW
 and in page_props, so you get the best of both (fast access, #ask,
 Special:Browse, data export). If you do this, we would probably make the SF
 properties less special in our code, since they would not be needed so
 heavily.

 Markus




-- 
WikiWorks · MediaWiki Consulting · http://wikiworks.com

Re: [SMW-devel] Strange handling for special properties in SMWSQLStore3 DB design?

2012-10-10 Thread Markus Krötzsch
On 10/10/12 14:25, Yaron Koren wrote:
 Hi Markus,

 Thanks for the additional clarifications. I now understand, at the very
 least, why it helps for Modification date to have its own table; and
 more generally, I get the design rationale. I think now that it makes
 sense for Semantic Forms to switch to the page_props table, and parser
 functions, to store its form-linking information; and if that's what it
 takes for SMW to not have tables for SF properties, then it's all the
 more reason to do it. :)

If you don't want tables for the SF properties, then we can also do this 
without you moving to page props. But I don't think that this would be 
better, or at least we should analyse the situation based on actual DB 
load and not based on perceived oddness level ;-).


 By the way, I don't think additionally storing properties like Has
 default form would be that helpful - I don't believe people query on
 it, or look it up in Special:Browse, unless something goes wrong; and if
 parser functions like #default_form were to be used instead, it would
 probably be easier to know right away if there was a problem.

Sure, you know the SF users best.


 (Also, nobody should be concerned that support for Has default form is
 going away - if/when this change is made, I'm sure there will be
 backward compatibility preserved for a long time.)

 Anyway, hopefully this is enough to convince you and the other relevant
 people to take out the SF database tables - that was the part of the new
 design that felt the oddest to me.

There are various levels of special handling for the SF properties right 
now, all of them hardcoded. You should not worry about it too much.

Removing the special properties would make the form properties normal 
page-type properties; I am not sure that this will have a good impact on 
performance. The original reason for SMW to introduce special handling 
for SF properties was that SF makes very frequent requests to these 
properties. As long as this is still the case, at least some special 
handling (fixed property ID, fixed property type) should be preserved. 
Whether it is better to have many tables or one for the properties 
depends on how often you query each of the properties, how many there 
are (now we have two), and also how many other page-type property values 
pages typically have. I suggest simply ignoring the issue for now and 
removing all mentions of SF properties as soon as SF no longer has such 
properties.

Markus




Re: [SMW-devel] Strange handling for special properties in SMWSQLStore3 DB design?

2012-10-09 Thread Markus Krötzsch
On 08/10/12 15:44, Yaron Koren wrote:
 Hi,

 I'm finally trying out SMWSQLStore3, and figuring out how to get my
 extensions compatible with it, so I'm looking at the new table structure
 for the first time.

 First of all, is there any documentation about the design decisions that
 went into this new structure? Because it could be that this issue has
 been answered already.

Hi Yaron,

I don't think we have comprehensive documentation yet (Nischay may 
correct me though). You already have a good idea of certain things; I 
will try to clarify this further below (I think there are some 
misunderstandings).


 If not - basically, the way the new database structure seems to work is
 that there's a separate table for each special property.

That is not the case. You are right that the new structure supports 
tables that are used for one property only. This has some advantages:

* The property is determined from the table; no need to store the 
property in each row -- less storage space, smaller indexes, less memory
* Changes for that property can be written independently of changes for 
other properties. For example, modification date changes on each edit, 
yet we do not need to update other properties/tables on each edit -- 
reduced write activity
* The table is smaller and more specific than general-purpose 
property-value tables. This gives MySQL (or any DB) a better chance for 
guessing selectivity when doing join optimisations in queries. -- faster 
query execution
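
The points above can be illustrated with a small sketch. It uses Python's sqlite3 module purely for illustration; the table and column names (generic_props, prop_modification_date, s_id, p_id, value) are hypothetical and are not the actual SMWSQLStore3 schema. It only shows that a single-property table needs no property column and can be updated without touching the catch-all table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical catch-all table: every row must name its property (p_id).
conn.execute(
    "CREATE TABLE generic_props (s_id INTEGER, p_id INTEGER, value TEXT)"
)
# Hypothetical single-property table: the property is implied by the table,
# so no p_id column (and no index on it) is needed.
conn.execute(
    "CREATE TABLE prop_modification_date (s_id INTEGER PRIMARY KEY, value TEXT)"
)

conn.execute("INSERT INTO generic_props VALUES (1, 42, 'Berlin')")
conn.execute("INSERT INTO prop_modification_date VALUES (1, '2012-10-09')")
conn.commit()


def touch_modification_date(conn, subject_id, new_date):
    """Update the modification date without writing to any other table."""
    conn.execute(
        "UPDATE prop_modification_date SET value = ? WHERE s_id = ?",
        (new_date, subject_id),
    )
    conn.commit()


touch_modification_date(conn, 1, "2012-10-10")

generic_rows = conn.execute("SELECT * FROM generic_props").fetchall()
date_rows = conn.execute("SELECT * FROM prop_modification_date").fetchall()
print(generic_rows)
print(date_rows)
```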

There is a limit on how many tables a DBMS is happy with, but 30 tables 
are not a problem. Reading all data for one subject can become slower if 
the data resides in many tables, but reading data for a particular 
property should be faster, esp. if the respective property is used a lot 
(small tables fit into memory).

Now to your original question: it is not the case that all special 
properties have their own tables. The two things are independent: 
special properties can be stored in the common catch-all tables and 
normal properties can have their own table. It is just that SMW by 
default provides special tables for its own special properties. This 
includes SF properties that are heavily used (this dependency between 
SMW and SF has been there in similar form for many versions; properties 
used by SF all the time [on every page build] always had their datatype 
hardcoded in SMW).

There is a mechanism for users to create tables for important 
properties at their own discretion. I am not sure if this is documented yet.

 Creation date
 and Modification date both have one, as do Has improper value for,
 and probably some others. Interestingly, there are also tables for the
 Semantic Forms special properties Has default form and Has alternate
 form.

 This strikes me as strange design, for a few reasons:

 - I assume that this is done to speed up querying; but, as far as I
 know, many of SMW's special properties - like Creation date and the
 rest - are rarely queried on.

See above. Querying is only one aspect. Putting rarely used/rarely 
updated data into its own table allows this data to be disregarded in 
many cases, thus taking load off the rest of the DB.
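
As a rough sketch of this idea (not SMW's actual code), the following Python snippet compares the rows a page had before an edit with the rows it has afterwards, table by table, and reports only the tables that need a write. The table names and the tables_to_write helper are hypothetical.

```python
def tables_to_write(old_data, new_data):
    """Return the names of tables whose stored rows differ from the new rows."""
    all_tables = set(old_data) | set(new_data)
    return sorted(
        table
        for table in all_tables
        if old_data.get(table, set()) != new_data.get(table, set())
    )


# Rows per (hypothetical) table for one page, before and after an edit.
before = {
    "modification_date": {("2012-10-08",)},
    "has_default_form": {("City",)},
    "generic_props": {("Population", "3500000")},
}
after = {
    "modification_date": {("2012-10-09",)},        # changes on every edit
    "has_default_form": {("City",)},               # unchanged
    "generic_props": {("Population", "3500000")},  # unchanged
}

changed = tables_to_write(before, after)
print(changed)
```

With the modification date in its own table, an edit that changes no other data results in a write to that one table only.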


 - This could lead to an explosion of database tables - it looks like
 there are about 30 in the new structure, which some might consider an
 explosion already, and if there are a bunch more special properties
 added for metadata (like last author, first author, etc.) the number
 could just keep growing indefinitely.

30 should not be a problem for any DBMS, and tables are not added 
automatically when special properties are added.


 - It's understandable that Semantic Forms would get special handling,
 but still, having one extension handle things for another introduces
 dependency issues. What if Semantic Forms got other special properties?
 Or what if, say, Has alternate form were removed? It could potentially
 lead to compatibility problems.

We can think about how to reduce this dependency. SF could probably 
control the property tables it wants to use by itself. However, I would 
recommend having tables for all of its frequently used properties.


 - On that note, SF already has two special properties that don't have
 their own table - Page has default form and Creates pages with form,
 both of which can apply to forms just like Has default form and Has
 alternate form do. Not that I'm suggesting that SMW should get two more
 tables for these, but on the other hand, I believe SF queries on these
 on a fairly regular basis.

Then it would probably be good to have extra tables.


 So what could be done instead? I can think of a few options:

 - Store all properties, special and otherwise, in the same set of
 tables, and make better use of indexing to speed up queries.

We already make better use of indexing anyway, to the extent that we 
know how ;-). But MySQL is mainly using table-based selectivity 
measures, so that join optimisation is not very good if everything is