Re: [SMW-devel] RFC: Value distribution support in result formats

2011-11-29 Thread Jeroen De Dauw
Hey,

> modify the query results to contain only the labels and the numbers

This was actually the first approach I considered, and I implemented it to
some extent. However, the query result object is really not made to be used
like this, and the ways of getting around that were just too much of a hack,
which is why I decided to go with the current approach. Having a more
generic mechanism that does not require QPs (query printers) to care about
what post processing is happening at all would be nice, but it would require
rewriting the query result class or going with some messed up architecture.
Either way, it's a bunch of work which, although I agree it would be useful,
is not something I'm going to take on now. If you or someone else wants to
have a go at it, please do; I'll be happy to help review it if needed.

What I implemented should be seen as a way for query printers to support
value distribution behaviour without all of them reinventing the wheel, not
a generic post processing system.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel


[SMW-devel] RFC: Value distribution support in result formats

2011-11-08 Thread Jeroen De Dauw
Hey all,

I have implemented general support for value distributions in result
formats in SMW. This email explains this feature and is meant to gather
feedback on it before SMW 1.7 is released.

== Goal ==

Allow visualizing how many times each value in a result occurs, i.e. allow
for creating value distributions.

For example, this result set: foo bar baz foo bar bar ohi
Will be turned into
* bar (3)
* foo (2)
* baz (1)
* ohi (1)

This can then be displayed in chart formats, with the value as label and
the occurrence count as value. Although the most obvious use for this is
charts, it can really be used with any format.
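
As a rough illustration of the counting step only (a Python sketch, not the
PHP code in SMW; the function name value_distribution is made up for this
example), the transformation above can be written as:

```python
from collections import Counter


def value_distribution(values):
    """Return (value, count) pairs, most frequent value first."""
    # most_common() sorts by count, descending; values with the same
    # count keep the order in which they were first seen.
    return Counter(values).most_common()


results = "foo bar baz foo bar bar ohi".split()
for value, count in value_distribution(results):
    print(f"* {value} ({count})")
```

The (value, count) pairs are what a chart format would then use as label
and value.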

== Current implementation: how to use it ==

Each format needs to add support for this functionality before you'll be
able to use it to visualize value distributions. Right now only jqplotbar
and jqplotpie make use of it. All formats that support this functionality
accept 3 additional parameters:

* distribution (on/off) - whether a value distribution should be calculated
and shown instead of the regular results.
* distributionsort (asc/desc/none) - the sort order of the values, by
occurrence count.
* distributionlimit (positive whole number) - the maximum number of values
to visualize.
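
To make the interplay of the sort and limit parameters concrete, here is a
minimal Python sketch. It is not the SMW implementation: the function name
distribute is invented, and applying the sort before the limit (and keeping
first-seen order for "none") is an assumption that simply matches the
"10 countries with the most matching cities" example further down.

```python
from collections import Counter


def distribute(values, sort="none", limit=None):
    """Count values, then optionally sort by count and truncate."""
    # Counter keeps values in first-seen order.
    items = list(Counter(values).items())

    if sort == "asc":
        items.sort(key=lambda item: item[1])
    elif sort == "desc":
        items.sort(key=lambda item: item[1], reverse=True)
    elif sort != "none":
        raise ValueError("sort must be 'asc', 'desc' or 'none'")

    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive whole number")
        # Assumption: the limit is applied after sorting.
        items = items[:limit]

    return items


values = "foo bar baz foo bar bar ohi".split()
print(distribute(values, sort="desc", limit=2))
```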

This example will get the countries the matching cities are located in,
count the occurrences of each, and display this as a pie chart. Note the
use of mainlabel=- in the query. Without it, the cities themselves will
also be put into the value distribution.

{{#ask: [[Category:Locations]] [[Has location type::City]]
| ?Located in
| format=jqplotpie
| distribution=on
| mainlabel=-
| limit=500
}}

This example will do the same query, but will only show the 10 countries
with the most matching cities, in descending order.

{{#ask: [[Category:Locations]] [[Has location type::City]]
| ?Located in
| format=jqplotpie
| distribution=on
| distributionsort=desc
| distributionlimit=10
| mainlabel=-
| limit=500
}}

You can see these examples and 2 others working on the mapping
documentation wiki, making use of the example semantic data there:
http://mapping.referata.com/wiki/Value_distribution_examples

== Implementation details (technical) ==

After looking into several options I decided to implement this as a result
printer class deriving from SMWResultPrinter. This requires changes to each
format that wants to support this behaviour, but makes those changes
relatively easy. This approach seems like a good balance between making
this functionality available as easily as possible and staying sane.
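
The shape of this architecture can be sketched as follows. This is a Python
illustration of the pattern only, not a port of the PHP classes: the names
ResultPrinter, DistributablePrinter, TextListPrinter, render and
format_output are all hypothetical, and only the distribution=on path is
shown.

```python
from abc import ABC, abstractmethod
from collections import Counter


class ResultPrinter(ABC):
    """Hypothetical stand-in for a generic result printer base class."""

    @abstractmethod
    def render(self, values):
        """Turn a list of result values into output text."""


class DistributablePrinter(ResultPrinter):
    """Shared counting logic; concrete formats only format the counts."""

    def render(self, values):
        # Build a value -> occurrence count mapping once, here, so that
        # individual formats do not each have to reimplement it.
        counts = dict(Counter(values))
        return self.format_output(counts)

    @abstractmethod
    def format_output(self, counts):
        """Format a value -> count mapping; implemented per format."""


class TextListPrinter(DistributablePrinter):
    """Example format: a plain bullet list instead of a chart."""

    def format_output(self, counts):
        return "\n".join(f"* {value} ({n})" for value, n in counts.items())


print(TextListPrinter().render("foo bar baz foo bar bar ohi".split()))
```

A format opts in by deriving from the shared class and implementing one
formatting method, which is the per-format change mentioned above.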

This class is called SMWDistributablePrinter and can be found here:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticMediaWiki/includes/queryprinters/SMW_QP_Distributable.php?view=markup

Example jqplotpie implementation:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/SemanticResultFormats/jqPlot/SRF_jqPlotPie.php?view=markup

== Request for comments ==

Feedback is welcome. The main question for users is what names the
parameters should use. Right now they all start with distribution, but
there might be a better (and shorter) name. From developers I'd like to
know if you agree with this architecture.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--


Re: [SMW-devel] RFC: Value distribution support in result formats

2011-11-08 Thread Dan Bolser
Great work! I really like this feature. It has been requested many
times, so it's really cool that it's been done. Thanks for doing this.

We briefly discussed how this could be implemented a while back and,
although the proposal back then may push the balance of implementation
in the direction of insanity, I'd like to mention that proposal
again here for comparison.

One way to do this would be to automatically give each property some
special properties, such that the property itself could be queried for
its set of unique values, and the number of times each value has been
used.

This is immediately tricky, because you need to link each value with
its occurrence count somehow (perhaps using sub-objects), but the
advantages are:

1) You don't have to modify results printers, you just pass them the
results of the property query.
2) It allows you to query on the number of uses, for example, querying
for all property-values that have more than 10 uses, or all values of
City that have exactly 5 locations, etc.
3) It opens the way for having more meaningful property pages,
automatically having unique values linked to searches, which I think
is what people tend to expect.
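
To illustrate what point 2 would allow, here is a small in-memory Python
sketch. It says nothing about how SMW would store or query such counts; the
function name values_by_use_count and the sample data are invented for the
example.

```python
from collections import Counter


def values_by_use_count(values, predicate):
    """Return the distinct values whose number of uses satisfies predicate."""
    counts = Counter(values)
    return [value for value, uses in counts.items() if predicate(uses)]


# Invented sample: the "Located in" value of six city pages.
located_in = ["Germany", "France", "Germany", "Belgium", "Germany", "France"]

# Values used more than once, and values used exactly once.
print(values_by_use_count(located_in, lambda uses: uses > 1))
print(values_by_use_count(located_in, lambda uses: uses == 1))
```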


As you mentioned previously, this could be difficult to implement, but
I prefer general approaches to getting at the data over increasingly
specific ones, although I suppose the most general solution of all
would be to implement aggregation queries.

Thanks again for this great work that I'll definitely use heavily. I
only mention the above as a point for discussion, and I expect you'll
take it in that light (coming from a dumb user with no implementation
experience).


Cheers,
Dan.
