Re: [SMW-devel] Strange handling for special properties in SMWSQLStore3 DB design?
Hi Markus,

Thanks for the additional clarifications. I now understand, at the very least, why it helps for Modification date to have its own table; and more generally, I get the design rationale. I think now that it makes sense for Semantic Forms to switch to the page_props table, and parser functions, to store its form-linking information; and if that's what it takes for SMW to not have tables for SF properties, then it's all the more reason to do it. :)

By the way, I don't think additionally storing properties like Has default form would be that helpful - I don't believe people query on it, or look it up in Special:Browse, unless something goes wrong; and if parser functions like #default_form were to be used instead, it would probably be easier to know right away if there was a problem. (Also, nobody should be concerned that support for Has default form is going away - if/when this change is made, I'm sure there will be backward compatibility preserved for a long time.)

Anyway, hopefully this is enough to convince you and the other relevant people to take out the SF database tables - that was the part of the new design that felt the oddest to me.

-Yaron

On Wed, Oct 10, 2012 at 6:44 AM, Markus Krötzsch mar...@semantic-mediawiki.org wrote:
> On 09/10/12 18:12, Yaron Koren wrote:
>> Hi, Nischay - thanks for that set of links. I don't think any of them cover the thinking behind the design decisions, but they're still all good to know about.
>>
>> Markus - thanks for all the clarifications. I was sure that the extra tables were done to improve reading, but it sounds like they're mostly there to improve writing.
>
> I would not say that either of these is more important. Reading and writing are both relevant, and the new design should improve both activities in various situations.
>
>> Is improving write performance a big deal, though? I'm no performance expert, but I would think that on an average wiki, writes are a fairly unimportant factor in the overall performance of the site. Is that incorrect?
>
> Current versions of SMW can cause a very high write load (writing on every edit, even if no data changed). This was an issue for Wikia and for anyone who is running MW on SSD drives. Just checking whether some data has changed is not a solution, since the modification date always changes. Having more tables allows for more fine-grained write control.
>
>> I didn't mean that the number of tables (around 30 in SMWSQLStore3) will pose a technical challenge, just that it looks cluttered - and may pose challenges for administrators, and developers, in maintaining them.
>
> Not sure what kind of administration could be necessary. In general, users and administrators should not touch the tables.
>
>> I'm still thinking about MediaWiki's page_props table - both as a design example, and as a tool that SMW and related extensions can use.
>>
>> As a design example: MediaWiki stores many different types of values in the page_props table. Table writes (and reads) could probably be a little faster if there were separate tables for different types of page properties, but the MediaWiki developers decided to just have one big table for everything.
>>
>> And as a tool, page_props could potentially be helpful for SMW, SF and others. SF's special properties, like Has default form, don't need to be queried alongside real semantic properties - and actually, I doubt they're queried often at all, other than by SF itself. SF's information could be stored in the page_props table, instead of via SMW - and then, instead of an SMW tag, you could just add something like this to a category, property or template page:
>>
>> {{#default_form:City}}
>>
>> If it reduces clutter, administrator confusion, and compatibility issues, without increasing read or write times, it might be worth it...
>
> I don't see what you mean by clutter and administrator confusion. I have never known how many tables MW uses internally, nor have the tables in MW confused me so far. We never had compatibility issues between SF and SMW so far either.
>
> But pageprops could still be a good idea for SF. It should definitely improve read performance. You can also store the data redundantly in SMW and in PageProps, so you get the best of both (fast access, #ask, Special:Browse, data export). If you do this, we would probably make the SF properties less special in our code, since they would not be needed so heavily.
>
> Markus

-- 
WikiWorks · MediaWiki Consulting · http://wikiworks.com

Semediawiki-devel mailing list
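Yaron's page_props idea, together with Markus's suggestion to store the data redundantly, can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module, not MediaWiki or Semantic Forms code: the page_props column names (pp_page, pp_propname, pp_value) follow MediaWiki's table, but the "defaultform" property name, the smw_default_form table and both functions are hypothetical. It shows one general-purpose key-value table shared by unrelated page properties, with an optional second write into an SMW-style table.

```python
import sqlite3

# Illustrative only: page_props columns follow MediaWiki's table, but this is
# SQLite, not MediaWiki's actual DDL. The "defaultform" property name and the
# smw_default_form table are hypothetical stand-ins.
SCHEMA = """
CREATE TABLE page_props (
    pp_page INTEGER NOT NULL,
    pp_propname TEXT NOT NULL,
    pp_value TEXT,
    PRIMARY KEY (pp_page, pp_propname)
);
CREATE TABLE smw_default_form (
    s_id INTEGER PRIMARY KEY,
    form TEXT NOT NULL
);
"""


def save_default_form(conn, page_id, form, redundant=True):
    """Store the value a {{#default_form:...}} call on page_id would set.

    With redundant=True the value is also copied into an SMW-style table
    (the "best of both" option: fast lookups for SF, plus #ask and
    Special:Browse visibility through SMW).
    """
    conn.execute(
        "INSERT OR REPLACE INTO page_props (pp_page, pp_propname, pp_value) "
        "VALUES (?, 'defaultform', ?)",
        (page_id, form),
    )
    if redundant:
        conn.execute(
            "INSERT OR REPLACE INTO smw_default_form (s_id, form) VALUES (?, ?)",
            (page_id, form),
        )
    conn.commit()


def get_default_form(conn, page_id):
    """SF's read path: one primary-key lookup in the shared page_props table."""
    row = conn.execute(
        "SELECT pp_value FROM page_props "
        "WHERE pp_page = ? AND pp_propname = 'defaultform'",
        (page_id,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # An unrelated (hypothetical) page property shares the same table.
    conn.execute("INSERT INTO page_props VALUES (10, 'someotherprop', '1')")
    save_default_form(conn, 10, "City")  # as if {{#default_form:City}} were on page 10
    print(get_default_form(conn, 10))  # City
```

The design trade-off mirrors the thread: page_props keeps everything in one table and is cheap to read for a single known key, while the redundant SMW copy is only worth its extra write if people actually query or browse the value.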
Re: [SMW-devel] Strange handling for special properties in SMWSQLStore3 DB design?
On 10/10/12 14:25, Yaron Koren wrote:
> Hi Markus,
>
> Thanks for the additional clarifications. I now understand, at the very least, why it helps for Modification date to have its own table; and more generally, I get the design rationale. I think now that it makes sense for Semantic Forms to switch to the page_props table, and parser functions, to store its form-linking information; and if that's what it takes for SMW to not have tables for SF properties, then it's all the more reason to do it. :)

If you don't want tables for the SF properties, then we can also do this without you moving to page props. But I don't think that this would be better, or at least we should analyse the situation based on actual DB load and not based on perceived oddness level ;-).

> By the way, I don't think additionally storing properties like Has default form would be that helpful - I don't believe people query on it, or look it up in Special:Browse, unless something goes wrong; and if parser functions like #default_form were to be used instead, it would probably be easier to know right away if there was a problem.

Sure, you know the SF users best.

> (Also, nobody should be concerned that support for Has default form is going away - if/when this change is made, I'm sure there will be backward compatibility preserved for a long time.)
>
> Anyway, hopefully this is enough to convince you and the other relevant people to take out the SF database tables - that was the part of the new design that felt the oddest to me.

There are various levels of special handling for the SF properties right now, all of them hardcoded. You should not worry about it too much. Removing the special properties would make the form properties normal page-type properties; I am not sure that this would have a good impact on performance. The original reason for SMW to introduce special handling for SF properties was that SF makes very frequent requests to these properties. As long as this is still the case, at least some special handling (fixed property ID, fixed property type) should be preserved. Whether it is better to have many tables or one for the properties depends on how often you query each of the properties, how many there are (now we have two), and also how many other page-type property values pages typically have.

I suggest simply ignoring the issue for now and removing all mention of SF properties as soon as SF no longer has any such properties.

Markus
Re: [SMW-devel] Strange handling for special properties in SMWSQLStore3 DB design?
On 08/10/12 15:44, Yaron Koren wrote:
> Hi, I'm finally trying out SMWSQLStore3, and figuring out how to get my extensions compatible with it, so I'm looking at the new table structure for the first time. First of all, is there any documentation about the design decisions that went into this new structure? Because it could be that this issue has been answered already.

Hi Yaron,

I don't think we have comprehensive documentation yet (Nischay may correct me, though). You already have a good idea of certain things; I will try to clarify them further below (I think there are some misunderstandings).

> If not - basically, the way the new database structure seems to work is that there's a separate table for each special property.

That is not the case. You are right that the new structure supports tables that are used for one property only. This has some advantages:

* The property is determined by the table, so there is no need to store the property in each row -- less storage space, smaller indexes, less memory
* Changes for that property can be written independently of changes for other properties. For example, modification date changes on each edit, yet we do not need to update other properties/tables on each edit -- reduced write activity
* The table is smaller and more specific than general-purpose property-value tables. This gives MySQL (or any DB) a better chance of guessing selectivity when doing join optimisations in queries -- faster query execution

There is a limit on how many tables a DBMS is happy with, but 30 tables are not a problem. Reading all data for one subject can become slower if the data resides in many tables, but reading data for a particular property should be faster, esp. if the respective property is used a lot (small tables fit into memory).

Now to your original question: it is not the case that all special properties have their own tables.
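The first two advantages in the list above can be made concrete with a minimal sketch using Python's built-in sqlite3 module (not MySQL, and not SMW's PHP code). The table names generic_page_values and fixed_mdat are hypothetical: the catch-all table must carry a p_id in every row, while the one-property table does not. The per-table comparison in the hypothetical save_page() is an assumption about how "fine-grained write control" could work, not a description of SMWSQLStore3's actual implementation.

```python
import sqlite3

# Hypothetical tables for illustration; not SMWSQLStore3's real names.
SCHEMA = """
CREATE TABLE generic_page_values (  -- catch-all: every row must name its property
    s_id INTEGER NOT NULL,          -- subject (page) ID
    p_id INTEGER NOT NULL,          -- property ID
    o_id INTEGER NOT NULL           -- object (value page) ID
);
CREATE TABLE fixed_mdat (           -- one property only: the table implies p_id
    s_id INTEGER PRIMARY KEY,
    o_value REAL NOT NULL           -- modification date, e.g. as a timestamp
);
"""


def save_page(conn, s_id, mdat, page_values):
    """Write a page's data, touching only the tables whose rows changed.

    page_values is a set of (p_id, o_id) pairs for the catch-all table.
    Returns the names of the tables that were actually written, in order.
    """
    written = []

    # Modification date: its own table, so it can be updated on its own.
    old_mdat = conn.execute(
        "SELECT o_value FROM fixed_mdat WHERE s_id = ?", (s_id,)
    ).fetchone()
    if old_mdat is None or old_mdat[0] != mdat:
        conn.execute(
            "INSERT OR REPLACE INTO fixed_mdat (s_id, o_value) VALUES (?, ?)",
            (s_id, mdat),
        )
        written.append("fixed_mdat")

    # Other values: rewrite the catch-all rows only if they differ.
    old_values = set(conn.execute(
        "SELECT p_id, o_id FROM generic_page_values WHERE s_id = ?", (s_id,)
    ))
    if old_values != set(page_values):
        conn.execute("DELETE FROM generic_page_values WHERE s_id = ?", (s_id,))
        conn.executemany(
            "INSERT INTO generic_page_values (s_id, p_id, o_id) VALUES (?, ?, ?)",
            [(s_id, p, o) for p, o in sorted(page_values)],
        )
        written.append("generic_page_values")

    conn.commit()
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    values = {(1, 42), (2, 7)}  # hypothetical (property ID, object ID) pairs
    print(save_page(conn, 1, 1.0, values))  # ['fixed_mdat', 'generic_page_values']
    print(save_page(conn, 1, 2.0, values))  # ['fixed_mdat']
```

The second call is the situation described for modification date: every edit changes it, but if nothing else changed, only the small one-property table is written and the catch-all table is left alone.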
The two things are independent: special properties can be stored in the common catch-all tables, and normal properties can have their own table. It is just that SMW by default provides special tables for its own special properties. This includes SF properties that are heavily used (this dependency between SMW and SF has been there in similar form for many versions; properties used by SF all the time [on every page build] always had their datatype hardcoded in SMW). There is a mechanism for users to create tables for important properties at their own discretion. I am not sure if this is documented yet.

> Creation date and Modification date both have one, as do Has improper value for, and probably some others. Interestingly, there are also tables for the Semantic Forms special properties Has default form and Has alternate form. This strikes me as strange design, for a few reasons:
>
> - I assume that this is done to speed up querying; but, as far as I know, many of SMW's special properties - like Creation date and the rest - are rarely queried on.

See above. Querying is only one aspect. Putting rarely used/rarely updated data into its own table allows this data to be disregarded in many cases, thus taking load off the rest of the DB.

> - This could lead to an explosion of database tables - it looks like there are about 30 in the new structure, which some might consider an explosion already, and if there are a bunch more special properties added for metadata (like last author, first author, etc.) the number could just keep growing indefinitely.

30 should not be a problem for any DBMS, and tables are not added automatically when special properties are added.

> - It's understandable that Semantic Forms would get special handling, but still, having one extension handle things for another introduces dependency issues. What if Semantic Forms got other special properties? Or what if, say, Has alternate form were removed? It could potentially lead to compatibility problems.

We can think about how to reduce this dependency. SF could probably control the property tables it wants to use by itself. However, I would recommend having tables for all of its frequently used properties.

> - On that note, SF already has two special properties that don't have their own table - Page has default form and Creates pages with form, both of which can apply to forms just like Has default form and Has alternate form do. Not that I'm suggesting that SMW should get two more tables for these, but on the other hand, I believe SF queries on these on a fairly regular basis.

Then it would probably be good to have extra tables.

> So what could be done instead? I can think of a few options:
>
> - Store all properties, special and otherwise, in the same set of tables, and make better use of indexing to speed up queries.

We already make better use of indexing anyway, to the extent that we know how ;-). But MySQL is mainly using table-based selectivity measures, so that join optimisation is not very good if everything is