Re: [SMW-devel] RFC: Value distribution support in result formats
Hey,

> modify the query results to contain only the labels and the numbers

This was actually the first approach I considered and implemented to some extent. However, the query result object is really not made to be used like this, and the ways to get around this were just too much of a hack, which is why I decided to go with the current approach.

Having some more generic mechanism that does not require QPs (query printers) to care about what post-processing is happening at all would be nice, but would require rewriting the query result class or going with some messed up architecture. Either way, it's a bunch of work which, although I agree it would be useful, is not something I'm going to take on now. If you or someone else wants to have a go at it, please do; I'll be happy to help review it if needed.

What I implemented should be seen as a way for query printers to support value distribution behaviour without all of them reinventing the wheel, not a generic post-processing system.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--

___
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
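To illustrate the design choice described in this reply, here is a minimal Python sketch (not the actual PHP code) of shared distribution handling living in a base printer class, so that individual formats only render (label, number) pairs. All class and method names here (`ResultPrinter`, `DistributablePrinter`, `format_pairs`, `BulletListPrinter`) are hypothetical and chosen for illustration; they are not SMW's API.

```python
from collections import Counter


class ResultPrinter:
    """Hypothetical stand-in for a generic result printer base class."""

    def render(self, rows):
        raise NotImplementedError


class DistributablePrinter(ResultPrinter):
    """Shared distribution handling; subclasses only format (label, number) pairs."""

    def __init__(self, distribution=False):
        self.distribution = distribution

    def render(self, rows):
        if self.distribution:
            # Collapse the raw values into (value, occurrence count) pairs.
            pairs = list(Counter(rows).items())
        else:
            # Assumption: without distribution, rows already are (label, number) pairs.
            pairs = list(rows)
        return self.format_pairs(pairs)

    def format_pairs(self, pairs):
        raise NotImplementedError


class BulletListPrinter(DistributablePrinter):
    """Example format: it never counts anything itself."""

    def format_pairs(self, pairs):
        return "\n".join(f"* {label} ({number})" for label, number in pairs)


printer = BulletListPrinter(distribution=True)
print(printer.render(["foo", "bar", "foo"]))
```

In this sketch the counting happens once in the base class, which mirrors the "no reinventing the wheel" goal; the query result itself is left untouched, so this is not a generic post-processing layer.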
[SMW-devel] RFC: Value distribution support in result formats
Hey all,

I have implemented general support for value distributions in result formats in SMW. This email explains this feature and is meant to gather feedback on it before SMW 1.7 is released.

== Goal ==

Allow visualizing how many times each value in a result occurs, i.e. allow for creating value distributions. For example, this result set:

foo
bar
baz
foo
bar
bar
ohi

Will be turned into

* bar (3)
* foo (2)
* baz (1)
* ohi (1)

This can then be displayed in chart formats, with the value as label and the occurrence count as value. Although the most obvious use for this is charts, it can really be used with any format.

== Current implementation: how to use it ==

Each format needs to add support for this functionality before you'll be able to use it to visualize value distributions. Right now only jqplotbar and jqplotpie make use of it.

All formats that support this functionality accept 3 additional parameters:

* distribution (on/off) - whether a value distribution should be calculated and shown instead of the regular results.
* distributionsort (asc/desc/none) - the sort order of the values, by occurrence count.
* distributionlimit (positive whole number) - the maximum number of values to visualize.

This example will get the countries the matching cities are located in, count the occurrences of each, and display this as a pie chart. Note the use of the mainlabel parameter. If this is not done, the cities themselves will also be put into the value distribution.

{{#ask: [[Category:Locations]] [[Has location type::City]]
| ?Located in
| format=jqplotpie
| distribution=on
| mainlabel=-
| limit=500
}}

This example will do the same query, but will only show the 10 countries with the most matching cities, in descending order.

{{#ask: [[Category:Locations]] [[Has location type::City]]
| ?Located in
| format=jqplotpie
| distribution=on
| distributionsort=desc
| distributionlimit=10
| mainlabel=-
| limit=500
}}

You can see these examples and 2 others working on the mapping documentation wiki, making use of the example semantic data there:

http://mapping.referata.com/wiki/Value_distribution_examples

== Implementation details (technical) ==

After looking into several options I decided to implement this as a result printer class deriving from SMWResultPrinter. This requires changes to each format that wants to support this behaviour, but makes those changes relatively easy. This approach seems like a good balance between making this functionality available as easily as possible and staying sane.

This class is called SMWDistributablePrinter and can be found here:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticMediaWiki/includes/queryprinters/SMW_QP_Distributable.php?view=markup

Example jqplotpie implementation:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticResultFormats/jqPlot/SRF_jqPlotPie.php?view=markup

== Request for comments ==

Feedback is welcome. The main question for users is what names the parameters should use. Right now they all start with distribution, but there might be a better (and shorter) name. From developers I'd like to know if you agree with this architecture.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--
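
For readers who want to see the behaviour of the three parameters spelled out, the following is a rough Python sketch of the counting, sorting and limiting described in this message. It is not the SMWDistributablePrinter code; the function name `value_distribution` is hypothetical, and two details are assumptions rather than documented behaviour: that the limit is applied after sorting, and that ties are broken alphabetically.

```python
from collections import Counter


def value_distribution(values, sort="none", limit=None):
    """Count how often each value occurs and return (value, count) pairs.

    sort  -- "asc", "desc" or "none" (order by occurrence count)
    limit -- maximum number of pairs to return, or None for no limit
    """
    counts = Counter(values)  # distinct values in first-seen order
    pairs = list(counts.items())

    if sort == "desc":
        # Highest count first; ties broken alphabetically (an assumption).
        pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    elif sort == "asc":
        pairs.sort(key=lambda pair: (pair[1], pair[0]))
    elif sort != "none":
        raise ValueError(f"unknown sort order: {sort!r}")

    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive whole number")
        # Assumption: the limit is applied after sorting.
        pairs = pairs[:limit]
    return pairs


results = ["foo", "bar", "baz", "foo", "bar", "bar", "ohi"]
for value, count in value_distribution(results, sort="desc"):
    print(f"* {value} ({count})")
```

Run on the result set from the Goal section with `sort="desc"`, this prints the same four bullet lines as shown there; passing `limit=2` would keep only bar and foo.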
Re: [SMW-devel] RFC: Value distribution support in result formats
Great work! I really like this feature. It has been requested many times, so it's really cool that it's been done. Thanks for doing this.

We briefly discussed how this could be implemented a while back and, although the proposal back then may push the balance of implementation in the direction of insanity, I'd like to mention it again here for comparison.

One way to do this would be to automatically give each property some special properties, such that the property itself could be queried for its set of unique values and the number of times each value has been used. This is immediately tricky, because you need to link each value with its occurrence count somehow (perhaps using sub-objects), but the advantages are:

1) You don't have to modify result printers; you just pass them the results of the property query.
2) It allows you to query on the number of uses, for example querying for all property values that have more than 10 uses, or all values of City that have exactly 5 locations, etc.
3) It opens the way for having more meaningful property pages, automatically having unique values linked to searches, which I think is what people tend to expect.

As you mentioned previously, this could be difficult to implement, but I prefer more general approaches over increasingly specific approaches to getting at the data, although I suppose the most general solution of all would be to implement aggregation queries.

Thanks again for this great work that I'll definitely use heavily. I only mention the above as a point for discussion, and I expect you'll take it in that light (coming from a dumb user with no implementation experience).

Cheers,
Dan.

On 8 November 2011 14:08, Jeroen De Dauw jeroended...@gmail.com wrote:

> Hey all,
>
> I have implemented general support for value distributions in result formats in SMW. This email explains this feature and is meant to gather feedback on it before SMW 1.7 is released.
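
As a rough illustration of point 2 in Dan's proposal (querying on the number of uses), here is a small Python sketch that filters distinct values by how often they occur. It is a plain in-memory filter, not an SMW query or store feature; the function `values_with_uses`, its parameters `more_than` and `exactly`, and the sample data are all hypothetical.

```python
from collections import Counter


def values_with_uses(values, more_than=None, exactly=None):
    """Return (value, uses) pairs for distinct values matching the use-count filters."""
    selected = []
    for value, uses in Counter(values).items():
        if more_than is not None and uses <= more_than:
            continue  # not used often enough
        if exactly is not None and uses != exactly:
            continue  # use count does not match exactly
        selected.append((value, uses))
    return selected


# Made-up "Located in" values for a set of cities.
located_in = ["Belgium"] * 5 + ["Germany"] * 12 + ["France"] * 3

print(values_with_uses(located_in, more_than=10))
print(values_with_uses(located_in, exactly=5))
```

The point of the proposal is that such filters would be expressed in the query itself rather than in a result printer; this sketch only shows what the filtering would compute, under that assumption.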