GoranSMilovanovic has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/386323 )
Change subject: Semantics - t-SNE Maps ...................................................................... Semantics - t-SNE Maps Change-Id: I9d3c5ec541ceb4fd7041dfed75907b5de8398843 --- M WDCM_OverviewDashboard/ui.R M WDCM_SemanticsDashboard/server.R M WDCM_SemanticsDashboard/ui.R M WDCM_ShinyServerFrontPage/wdcm_ShinyFront.html M WDCM_TechDocumentation/WikidataConcepts_TechDocumentation.odt M WDCM_UsageDashboard/server.R M WDCM_UsageDashboard/ui.R 7 files changed, 277 insertions(+), 182 deletions(-) Approvals: GoranSMilovanovic: Verified; Looks good to me, approved diff --git a/WDCM_OverviewDashboard/ui.R b/WDCM_OverviewDashboard/ui.R index 25b0bdb..7ae997d 100644 --- a/WDCM_OverviewDashboard/ui.R +++ b/WDCM_OverviewDashboard/ui.R @@ -4,7 +4,7 @@ ### --------------------------------------------------------------------------- ### --- Setup -rm(list=ls()) +rm(list = ls()) ### --- general library(shiny) library(shinydashboard) @@ -238,7 +238,9 @@ <br> <p><font size = 2><b>N.B.</b> The current <b>Wikidata item usage statistic</b> definition is <i>the count of the number of pages in a particular client project where the respective Wikidata item is used</i>. Thus, the current definition ignores the usage aspects completely. This definition is motivated by the currently - present constraints in Wikidata usage tracking across the client projects. With more mature Wikidata usage tracking systems, the definition will become a subject + present constraints in Wikidata usage tracking across the client projects + (see <a href = "https://www.mediawiki.org/wiki/Wikibase/Schema/wbc_entity_usage" target = "_blank">Wikibase/Schema/wbc entity usage</a>). + With more mature Wikidata usage tracking systems, the definition will become a subject of change. The term <b>Wikidata usage volume</b> is reserved for total Wikidata usage (i.e. the sum of usage statistics) in a particular client project, group of client projects, or semantic categories. 
By a <b>Wikidata semantic category</b> we mean a selection of Wikidata items that is that is operationally defined by a respective SPARQL query returning a selection of items that intuitivelly match a human, natural semantic category. @@ -247,7 +249,10 @@ categories in WDCM is not necessarily exhaustive (i.e. they do not necessarily cover all Wikidata items), neither the categories are necessarily mutually exclusive. The Wikidata ontology is very complex and a product of work of many people, so there is an optimization price to be paid in every attempt to adapt or simplify its present structure to the needs of a statistical analytical system such as WDCM. The current set of WDCM semantic categories is thus not - normative in any sense and a subject of change in any moment, depending upon the analytical needs of the community.</font></p> + normative in any sense and can become a subject of change in any moment, depending upon the analytical needs of the community.</font></p> + <p><font size = 2>The currently used <b>WDCM Taxonomy</b> of Wikidata items encompasses the following 14 semantic categories: <i>Geographical Object</i>, <i>Organization</i>, <i>Architectural Structure</i>, + <i>Human</i>, <i>Wikimedia</i>, <i>Work of Art</i>, <i>Book</i>, <i>Gene</i>, <i>Scientific Article</i>, <i>Chemical Entities</i>, <i>Astronomical Object</i>, <i>Thoroughfare</i>, <i>Event</i>, + and <i>Taxon</i>.</font></p> <hr> <h4>Wikidata Usage Overview</h4> <br> @@ -258,10 +263,10 @@ of the client project pairwise Euclidean distances derived from the Projects x Categories contingency table. Given that the original higher-dimensional space from which the 2D map is derived is rather constrained by the choice of a small number of semantic categories, the similarity mapping is somewhat imprecise and should be taken as an attempt at an approximate big picture of the client projects similarity structure only. 
More precise 2D maps of - the similarity structures in client projects are found on the WDCM Semantics Dashboard, where each semantic category first receives an + the similarity structures in client projects are found on the <a href = "http://wdcm.wmflabs.org/WDCM_SemanticsDashboard/" target = "_blank">WDCM Semantics Dashboard</a>, where each semantic category first receives an <a href = "https://en.wikipedia.org/wiki/Topic_model" target = "_blank">LDA Topic Model</a>, and the similarity structure between the client projects is then derived from project topical distributions.<br> - While the <i>Explore</i> tab presents a dynamic <a href = "http://hafen.github.io/rbokeh/" target="_blank">{Rbokeh}</a> visualization alongised + While the <i>Explore</i> tab presents a dynamic <a href = "http://hafen.github.io/rbokeh/" target="_blank">{Rbokeh}</a> visualization alongside the tools to explore it in detail, the <i>Highlights</i> tab shows a static <a href = "http://ggplot2.org/" target="_blank">{ggplot2}</a> plot with the most important client projects marked (<b>NOTE.</b> Only top five projects (of each project type) in respect to Wikidata usage volume are labeled).</font></p> <hr> @@ -271,7 +276,7 @@ category. The size of the bubble reflects the volume of Wikidata usage from the respective category. If two categories are found in proximity, that means that the projects that tend to use the one also tend to use the another, and vice versa. Similarly to the Usage Overview, the 2D mapping is obtained by performing a <a href="https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding" target="_blank">t-SNE dimensionality reduction</a> - of the categories pairwise Euclidean distances derived from the Projects x Categories contingency table. </font></p> + of the pairwise category Euclidean distances derived from the Projects x Categories contingency table. 
</font></p> <hr> <h4>Wikidata Usage Distribution</h4> <br> @@ -315,23 +320,25 @@ fluidRow( column(width = 8, HTML('<h2>WDCM Navigate</h2> - <h4>Your orientation in the WDCM Dashboards System<h4> - <hr> - <ul> - <li><b>WDCM Overview</b> (current dashboard).<br> - <font size = "2">The big picture. Fundamental insights in how Wikidata is used across the client projects.</font></li><br> - <li><b>WDCM Semantics.</b><br> - <font size = "2">Detailed insights into the WDCM ontology (a selection of semantic categories from Wikidata), its distributional - semantics, and the way it is used across the client projects. If you are looking for Topic Models, yes that’s where - they live in WDCM.</font></li><br> - <li><b>WDCM Usage.</b><br> - <font size = "2">Fine-grained information on Wikidata usage across client projects and project types. Cross-tabulations and similar..</font></li><br> - <li><b>WDCM Items</b><br> - <font size = "2">Fine-grained information on particular Wikidata item usage across the client projects..</font></li><br> - <li><b>WDCM System Technical Documentation.</b><br> - <font size = "2">A document that will come to existence eventually. There are rumours of an existing draft.</font></li> - </ul>' - ) + <h4>Your orientation in the WDCM Dashboards System<h4> + <hr> + <ul> + <li><b><a href = "http://wdcm.wmflabs.org/">WDCM Portal</a></b>.<br> + <font size = "2">The entry point to WDCM Dashboards.</font></li><br> + <li><b><a href = "http://wdcm.wmflabs.org/WDCM_OverviewDashboard/">WDCM Overview</a> (current dashboard)</b><br> + <font size = "2">The big picture. Fundamental insights in how Wikidata is used across the client projects.</font></li><br> + <li><b><a href = "http://wdcm.wmflabs.org/WDCM_SemanticsDashboard/">WDCM Semantics</a></b><br> + <font size = "2">Detailed insights into the WDCM Taxonomy (a selection of semantic categories from Wikidata), its distributional + semantics, and the way it is used across the client projects. 
If you are looking for Topic Models - that’s where + they live.</font></li><br> + <li><b><a href = "http://wdcm.wmflabs.org/WDCM_UsageDashboard/">WDCM Usage</a></b><br> + <font size = "2">Fine-grained information on Wikidata usage across client projects and project types. Cross-tabulations and similar..</font></li><br> + <li><b>WDCM Items</b><br> + <font size = "2">Fine-grained information on particular Wikidata item usage across the client projects.<b> (Under development)</b></font></li><br> + <li><b>WDCM System Technical Documentation</b><br> + <font size = "2">A document that will come to existence eventually. There are rumours of an existing draft.</font></li> + </ul>' + ) ) ) ) # - tabPanel Structure END @@ -346,7 +353,7 @@ column(width = 12, hr(), HTML('<b>Wikidata Concepts Monitor :: WMDE 2017</b><br>Diffusion: <a href="https://phabricator.wikimedia.org/diffusion/AWCM/" target = "_blank">WDCM</a><br>'), - HTML('Contact: Goran S. Milovanovic, Data Analyst, WMDE<br>e-mail: goran.milovanovic_...@wikimedia.de + HTML('Contact: Goran S. 
Milovanovic, Data Scientist, WMDE<br>e-mail: goran.milovanovic_...@wikimedia.de <br>IRC: goransm'), br(), br() diff --git a/WDCM_SemanticsDashboard/server.R b/WDCM_SemanticsDashboard/server.R index f07ae5b..203de39 100644 --- a/WDCM_SemanticsDashboard/server.R +++ b/WDCM_SemanticsDashboard/server.R @@ -448,6 +448,55 @@ }, ignoreNULL = FALSE) + ### ------------------------------------------ + ### --- TAB: tabPanel Similarity + ### ------------------------------------------ + + ### --- SELECT: update select 'selectCategory2' + updateSelectizeInput(session, + 'selectCategory2', + "Select Semantic Category:", + choices = categories, + selected = categories[round(runif(1, 1, length(categories)))], + server = TRUE) + + ### --- OBSERVE: input$selectCategory2 + observeEvent(input$selectCategory2, { + + if (!is.null(input$selectCategory2)) { + + wdcmP <- wdcmProject %>% + select(Project, Usage) + + projCatFrame <- wdcm2_projects_2dmaps %>% + filter(Category %in% input$selectCategory2) %>% + left_join(wdcmP, by = 'Project') + + ### --- output$overviewPlotDynamic + output$overviewPlotDynamic <- renderRbokeh({ + outFig <- figure(width = 1400, height = 900, logo = NULL) %>% + ly_points(D1, D2, + data = projCatFrame, + size = log(Usage), + color = 'Project Type', + hover = list(Project, Usage)) %>% + x_axis(visible = F) %>% + y_axis(visible = F) %>% + theme_grid(which = c("x", "y"), + grid_line_color = "white") %>% + theme_plot(outline_line_alpha = 0) %>% + set_palette(discrete_color = pal_color(unname(projectTypeColor))) + outFig + }) %>% withProgress(message = 'Generating plot', + min = 0, + max = 1, + value = 1, {incProgress(amount = 1)}) + + } else {return(NULL)} + + }, ignoreNULL = FALSE) + + }) ### --- END shinyServer diff --git a/WDCM_SemanticsDashboard/ui.R b/WDCM_SemanticsDashboard/ui.R index e9df674..7eb4aaf 100644 --- a/WDCM_SemanticsDashboard/ui.R +++ b/WDCM_SemanticsDashboard/ui.R @@ -147,7 +147,7 @@ ) ), # - tabPanel Semantic Models END - tabPanel(title = 
"Projects", + tabPanel(title = "Project Semantics", id = "projects", fluidRow( column(width = 12, @@ -206,7 +206,50 @@ height = "1000px")) ) ) - ) # - tabPanel Projects END + ), # - tabPanel Projects END + tabPanel(title = "Similarity Maps", + id = "similarity", + fluidRow( + column(width = 12, + fluidRow( + column(width = 6, + br(), + HTML('<font size = 2>Select a semantic category of Wikidata items to take a look at. A 2D map will be generated where each + project is represented by a bubble, and where the distance between the projects corresponds with the similarity in their + usage of Wikidata items <i>from the selected category</i>. Think about semantic categories as perspectives from which you + can take a look at the structure of similarity that holds among the Wikimedia projects in respect to their usage of Wikidata items.</font>'), + br(), br() + ) + ) + ) + ), + # - fluidRow: Category Selection + fluidRow( + br(), + column(width = 3, + selectizeInput("selectCategory2", + "Select Category:", + multiple = F, + choices = NULL, + selected = NULL) + ) + ), + fluidRow( + hr(), + column(width = 12, + h4('Similarity Map'), + HTML('<font size = 2>Each bubble represents a client project. + The size of the bubble reflects the volume of Wikidata usage in the respective project; a logarithmic scale is used in this plot.<br> + Projects similar in respect to their usage of Wikidata items <i>from the selected category</i> are grouped together. + Use the tools next to the plot legend to explore the plot and hover over bubbles for details.</font>'), + hr(), + withSpinner(rbokeh::rbokehOutput('overviewPlotDynamic', + width = "1400px", + height = "900px") + ) + ) + ) + ) ) # - tabBox: Dashboard END ) @@ -224,18 +267,21 @@ <h4>Introduction<h4> <br> <p><font size = 2>This Dashboard is a part of the <b>Wikidata Concepts Monitor (WDMC)</b>. The WDCM system provides analytics on Wikidata usage - across the client projects. 
The WDCM Overview Dashboard presents the big picture of Wikidata usage; other WDCM dashboards go - into more detail. The Overview Dashboard provides insights into <b>(1)</b> the similarities between the client projects in respect to their use of - of Wikidata, as well as <b>(2)</b> the volume of Wikidata usage in every client project, <b>(3)</b> Wikidata usage tendencies, described by the volume of - Wikidata usage in each of the semantic categories of items that are encompassed by the current WDCM edition, <b>(4)</b> the similarities between the - Wikidata semantic categories of items in respect to their usage across the client projects, <b>(5)</b> ranking of client projects in respect to their - Wikidata usage volume, <b>(6)</b> the Wikidata usage breakdown across the types of client projects and Wikidata semantic categories.</font></p> + across the Wikimedia sister projects. The WDCM Semantics Dashboard is probably the central and the analytically most complicated of all WDCM Dashboards. + Here we provide only the necessary basics of distributional semantics needed in order to understand the results of semantic topic modeling presented on this + WDCM dashboard. A user who needs to dive deep into the similarity structures between the Wikimedia sister projects in respect to their Wikidata usage patterns + will most probably have to do some additional reading first. However, the Dashboard simplifies the presentation of the results as much as possible to make them + accessible to any Wikidata user or Wikipedia editor who is not necessarily involved in Data or Cognitive Science. Reading through the <b>WDCM Semantic Topic Models</b> + section in this page is <i>highly advised</i> to anyone who has never met semantic topic models or distributional semantics before. Before that, our next stop: Definitions. 
+ </font></p> <hr> <h4>Definitions</h4> <br> <p><font size = 2><b>N.B.</b> The current <b>Wikidata item usage statistic</b> definition is <i>the count of the number of pages in a particular client project where the respective Wikidata item is used</i>. Thus, the current definition ignores the usage aspects completely. This definition is motivated by the currently - present constraints in Wikidata usage tracking across the client projects. With more mature Wikidata usage tracking systems, the definition will become a subject + present constraints in Wikidata usage tracking across the client projects + (see <a href = "https://www.mediawiki.org/wiki/Wikibase/Schema/wbc_entity_usage" target = "_blank">Wikibase/Schema/wbc entity usage</a>). + With more mature Wikidata usage tracking systems, the definition will become a subject of change. The term <b>Wikidata usage volume</b> is reserved for total Wikidata usage (i.e. the sum of usage statistics) in a particular client project, group of client projects, or semantic categories. By a <b>Wikidata semantic category</b> we mean a selection of Wikidata items that is that is operationally defined by a respective SPARQL query returning a selection of items that intuitivelly match a human, natural semantic category. @@ -245,63 +291,59 @@ mutually exclusive. The Wikidata ontology is very complex and a product of work of many people, so there is an optimization price to be paid in every attempt to adapt or simplify its present structure to the needs of a statistical analytical system such as WDCM. 
The current set of WDCM semantic categories is thus not normative in any sense and a subject of change in any moment, depending upon the analytical needs of the community.</font></p> + <p>The currently used <b>WDCM Taxonomy</b> of Wikidata items encompasses the following 14 semantic categories: <i>Geographical Object</i>, <i>Organization</i>, <i>Architectural Structure</i>, + <i>Human</i>, <i>Wikimedia</i>, <i>Work of Art</i>, <i>Book</i>, <i>Gene</i>, <i>Scientific Article</i>, <i>Chemical Entities</i>, <i>Astronomical Object</i>, <i>Thoroughfare</i>, <i>Event</i>, + and <i>Taxon</i>.</p> <hr> - <h4>Wikidata Usage Overview</h4> + <h4>WDCM Semantic Topic Models</h4> <br> - <p><font size = 2>The similarity structure in Wikidata usage <i>across the client projects</i> is presented. Each bubble represents a client project. - The size of the bubble reflects the volume of Wikidata usage in the respective project. Projects similar in respect to the semantics of Wikidata - usage are grouped together.<br> - The bubble chart is produced by performing a <a href="https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding" target="_blank">t-SNE dimensionality reduction</a> - of the client project pairwise Euclidean distances derived from the Projects x Categories contingency table. Given that the original higher-dimensional space - from which the 2D map is derived is rather constrained by the choice of a small number of semantic categories, the similarity mapping is somewhat - imprecise and should be taken as an attempt at an approximate big picture of the client projects similarity structure only. 
More precise 2D maps of - the similarity structures in client projects are found on the WDCM Semantics Dashboard, where each semantic category first receives an - <a href = "https://en.wikipedia.org/wiki/Topic_model" target = "_blank">LDA Topic Model</a>, - and the similarity structure between the client projects is then derived from project topical distributions.<br> - While the <i>Explore</i> tab presents a dynamic <a href = "http://hafen.github.io/rbokeh/" target="_blank">{Rbokeh}</a> visualization alongised - the tools to explore it in detail, the <i>Highlights</i> tab shows a static <a href = "http://ggplot2.org/" target="_blank">{ggplot2}</a> plot with the most important client projects - marked (<b>NOTE.</b> Only top five projects (of each project type) in respect to Wikidata usage volume are labeled).</font></p> + <h5>Suggested Readings</h5> + <ul> + <li><b>Distributional Semantics.</b> In <i>Wikipedia</i>. Retrieved October 24, 2017, from <a href = "https://en.wikipedia.org/wiki/Distributional_semantics" target = "_blank">https://en.wikipedia.org/wiki/Distributional_semantics</a></li> + <li><b>Topic model.</b> In <i>Wikipedia</i>. Retrieved October 24, 2017, from <a href = "https://en.wikipedia.org/wiki/Topic_model" target = "_blank">https://en.wikipedia.org/wiki/Topic_model</a></li> + <li><b>Latent Dirichlet allocation.</b> In <i>Wikipedia</i>. Retrieved October 24, 2017, from <a href = "https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation" target = "_blank">https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation</a></li> + <li><b>Dimensionality reduction.</b> In <i>Wikipedia</i>. 
Retrieved October 24, 2017, from <a href = "https://en.wikipedia.org/wiki/Dimensionality_reduction" target = "_blank">https://en.wikipedia.org/wiki/Dimensionality_reduction</a></li> + </ul> + <p><font size = 2>While <a href = "https://www.wikidata.org/wiki/Wikidata:Main_Page" target = "_blank">Wikidata</a> itself is a <a href = "https://en.wikipedia.org/wiki/Ontology_(information_science)" + target = "_blank">semantic ontology</a> with pre-defined and evolving normative rules of description and inference, <b>Wikidata usage</b> is essentially a social, behavioral phenomenon, + suitable for study by means of <a href = "https://en.wikipedia.org/wiki/Machine_learning" target = "_blank">machine learning</a> in the field of <a href = "https://en.wikipedia.org/wiki/Distributional_semantics" + target = "_blank">distributional semantics</a>: the analysis and modeling of statistical patterns of occurrence and co-occurrence of Wikidata item and property + usage across the client projects (e.g. <i>enwiki</i>, <i>frwiki</i>, <i>ruwiki</i>, etc.). WDCM thus employs various statistical approaches in an attempt to describe and provide insights from the observable Wikidata + usage statistics (e.g. <a href = "https://en.wikipedia.org/wiki/Topic_model" target = "_blank">topic modeling</a>, <a href = "https://en.wikipedia.org/wiki/Cluster_analysis" target = "_blank">clustering</a>, + <a href = "https://en.wikipedia.org/wiki/Dimensionality_reduction" target = "_blank">dimensionality reduction</a>, all beyond providing elementary descriptive statistics of Wikidata usage, of course). + <br><br> + <b><i>Wikidata Usage Patterns.</i></b> The <i>“golden line”</i> that connects the reasoning behind all WDCM functions can be non-technically described in the following way. Imagine observing the number of times each item in a set of + <b>N</b> particular Wikidata items was used across some project (<i>enwiki</i>, for example).
Imagine having the same data for other projects as well: for example, if 200 projects are under analysis, then we + have <b>200</b> counts for each of the <b>N</b> items in the set, and the data can be described by an <b>N x 200</b> matrix (<i>items</i> x <i>projects</i>). Each column of counts, representing the frequency of occurrence of all Wikidata + entities under consideration across one of the 200 projects under discussion - a vector, obviously - represents a particular <i>Wikidata usage pattern</i>. By inspecting and modeling statistically the usage pattern matrix - + a matrix that encompasses all such usage patterns across the projects, or the derived covariance/correlation matrix - many insights into the similarities between Wikimedia projects (or, more precisely, + the similarities between their usage patterns) can be found. + <br>In essence, the technology and mathematics behind WDCM rely on the same set of practical tools and ideas that support the development of <a href = "https://en.wikipedia.org/wiki/Semantic_search" target = "_blank">semantic search engines</a> + and <a href = "https://en.wikipedia.org/wiki/Recommender_system" target = "_blank">recommendation systems</a>, only applied to a specific dataset that encompasses the usage patterns for tens of millions of Wikidata entities across its client projects.</font></p> <hr> - <h4>Wikidata Usage Tendency</h4> + <h4>Dashboard: Semantic Models</h4> <br> - <p><font size = 2>The similarity structure in Wikidata usage <i>across the semantic categories</i> is presented. Each bubble represents a Wikidata semantic - category. The size of the bubble reflects the volume of Wikidata usage from the respective category. If two categories are found in proximity, - that means that the projects that tend to use the one also tend to use the another, and vice versa.
Similarly to the Usage Overview, the 2D mapping is obtained by performing - a <a href="https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding" target="_blank">t-SNE dimensionality reduction</a> - of the categories pairwise Euclidean distances derived from the Projects x Categories contingency table. </font></p> + <p><font size = 2>Each of the 14 currently used semantic categories in the WDCM Taxonomy of Wikidata items receives a separate topic model. Each topic model encompasses two or more + topics, or semantic themes. Here you can select a semantic category (e.g. "Geographical Object", "Human") and a particular topic from its model. The page will produce three outputs: + (1) the <i>Top 50 items in this topic</i> chart, which presents the 50 most important items in the selected topic of the selected category\'s topic model, (2) the <i>Topic similarity network</i>, + which presents the similarity structure among the 50 most important items in the selected topic, and (3) the <i>Top 50 projects in this topic</i> chart, which presents the 50 Wikimedia projects in which the + selected topic plays the most prominent role within the selected semantic category. + </font></p> <hr> - <h4>Wikidata Usage Distribution</h4> + <h4>Dashboard: Project Semantics</h4> <br> - <p><font size = 2>The plots are helpful to build an understanding of the relative range of Wikidata usage across the client projects. - In the <i>Project Usage Rank-Frequency</i> plot, each point represents a client project; Wikidata usage is represented on the vertical and - the project usage rank on the horizontal axis, while only top project (per project type) are labeled. The highly-skewed, asymmetrical - distribution reveals that a small fraction of client projects only accounts for a huge proportion of Wikidata usage.<br> In the - <i>Project Usage log(Rank)-log(Frequency)</i> plot, the logarithms of both variables are represented.
- A <a href = "https://en.wikipedia.org/wiki/Power_law" target="_blank">power-law</a> relationship holds true if this - plot is linear. The plot includes the best linear fit, however, no attempts to estimate the underlying probability distribution were made. </font></p> + <p><font size = 2>Make a selection of Wikimedia projects here and hit <i>Apply Selection</i>. The Dashboard will produce a series of charts, one per each Wikidata semantic category that is + present in your selection of projects, and compute the relative importance (%) of each topic in the given selection and for each semantic category. Do not forget that category specific + semantic models do not necessarily encompass the same number of topics (in fact, they rarely do); also, <i>Topic n</i> in one category is obviously not the same thing as <i>Topic n</i> in + some other category. + </font></p> <hr> - <h4>Client Project Types</h4> + <h4>Dashboard: Similarity Maps</h4> <br> - <p><font size = 2>Project types are provided in the rows of this chart, while the semantic categories are given on the horizontal axis. - The height of the respective bar indicates Wikidata usage volume from the respective semantic category in a particular client project.</font></p> - <hr> - <h4>Client Projects Usage Volume</h4> - <br> - <p><font size = 2>Use the slider to select the percentile rank range of the Wikidata usage volume distribution across the client project to show. The - chart will automatically adjust to present the selected projects in increasing order of Wikidata usage, and presenting at most 30 top projects - from the selection. <b>NOTE.</b> The <a href="https://en.wikipedia.org/wiki/Percentile_rank" target="_blank">percentile rank</a> - of a score is the percentage of scores in its frequency distribution that are equal to or lower than it. 
- For example, a client project that has a Wikidata usage volume greater than or equal to 75% of all client projects under - consideration is said to be at the 75th percentile, where 75 is the percentile rank.<br> In effect, you can browse the whole - distribution of Wikidata usage across the client projects by selecting the lower and uppers limit in terms of usage percentile rank.</font></p> - <hr> - <h4>Wikidata Usage Browser</h4> - <br> - <p><font size = 2>A breakdown of Wikidata usage statistics across client projects and semantic categories. To the left, - a table that presents a Client Project vs. Semantic Category cross-tabulation. The Usage column in this table is the Wikidata - usage statistic for a particular Semantic Category x Client Project combination (e.g. The Wikidata usage in the category "Human" in - the dewiki project). To the right, the total Wikidata usage per client project is presented (i.e. the sum of Wikidata usage across - all semantic categories for a particular client project; e.g. the total Wikidata usage volume of enwiki).</font></p> + <p><font size = 2>Upon selection of a semantic category, the Dashboard will present a 2D map which represents the similarities between the Wikimedia projects computed from the selected category\'s + semantic model only. Here you can learn how similar or dissimilar the sister projects are in respect to their usage of Wikidata items from a single semantic category. + </font></p> + ') ) ) @@ -315,17 +357,19 @@ <h4>Your orientation in the WDCM Dashboards System<h4> <hr> <ul> - <li><b>WDCM Overview</b> (current dashboard).<br> + <li><b><a href = "http://wdcm.wmflabs.org/">WDCM Portal</a></b>.<br> + <font size = "2">The entry point to WDCM Dashboards.</font></li><br> + <li><b><a href = "http://wdcm.wmflabs.org/WDCM_OverviewDashboard/">WDCM Overview</a></b><br> <font size = "2">The big picture.
Fundamental insights in how Wikidata is used across the client projects.</font></li><br> - <li><b>WDCM Semantics.</b><br> - <font size = "2">Detailed insights into the WDCM ontology (a selection of semantic categories from Wikidata), its distributional - semantics, and the way it is used across the client projects. If you are looking for Topic Models, yes that’s where - they live in WDCM.</font></li><br> - <li><b>WDCM Usage.</b><br> + <li><b><a href = "http://wdcm.wmflabs.org/WDCM_SemanticsDashboard/">WDCM Semantics</a> (current dashboard)</b><br> + <font size = "2">Detailed insights into the WDCM Taxonomy (a selection of semantic categories from Wikidata), its distributional + semantics, and the way it is used across the client projects. If you are looking for Topic Models - that’s where + they live.</font></li><br> + <li><b><a href = "http://wdcm.wmflabs.org/WDCM_UsageDashboard/">WDCM Usage</a></b><br> <font size = "2">Fine-grained information on Wikidata usage across client projects and project types. Cross-tabulations and similar..</font></li><br> <li><b>WDCM Items</b><br> - <font size = "2">Fine-grained information on particular Wikidata item usage across the client projects..</font></li><br> - <li><b>WDCM System Technical Documentation.</b><br> + <font size = "2">Fine-grained information on particular Wikidata item usage across the client projects.<b> (Under development)</b></font></li><br> + <li><b>WDCM System Technical Documentation</b><br> <font size = "2">A document that will come to existence eventually. There are rumours of an existing draft.</font></li> </ul>' ) @@ -344,7 +388,7 @@ column(width = 12, hr(), HTML('<b>Wikidata Concepts Monitor :: WMDE 2017</b><br>Diffusion: <a href="https://phabricator.wikimedia.org/diffusion/AWCM/" target = "_blank">WDCM</a><br>'), - HTML('Contact: Goran S. Milovanovic, Data Analyst, WMDE<br>e-mail: goran.milovanovic_...@wikimedia.de + HTML('Contact: Goran S. 
Milovanovic, Data Scientist, WMDE<br>e-mail: goran.milovanovic_...@wikimedia.de <br>IRC: goransm'), br(), br() diff --git a/WDCM_ShinyServerFrontPage/wdcm_ShinyFront.html b/WDCM_ShinyServerFrontPage/wdcm_ShinyFront.html index fa93e8b..2e1387b 100644 --- a/WDCM_ShinyServerFrontPage/wdcm_ShinyFront.html +++ b/WDCM_ShinyServerFrontPage/wdcm_ShinyFront.html @@ -147,7 +147,7 @@ you probably need to get to learn about some important WDCM definitions (and the constraints that dictated them) first. You can do that by reading through the Definitions section of the WDCM Wikitech Technical -Documentation <span style="font-weight: bold;">[[LINK HERE!!!]]</span>. +Documentation <span style="font-weight: bold;"></span>. Do not panic, please: it is written in a language that a non-technical person who does not necessarily care about <a href="https://en.wikipedia.org/wiki/Data_science" target="_blank">Data @@ -177,22 +177,25 @@ style="color: rgb(34, 34, 34); font-family: "Helvetica Neue",Helvetica,"Lucida Grande",Tahoma,Verdana,sans-serif; font-size: 25.2px; font-style: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; background-color: rgb(255, 255, 255); display: inline ! important; float: none;"></span>Adam Shorland, Software Developer, Wikimedia Foundation Deutschland</a>. <a href="https://www.wikidata.org/wiki/Q18016466" target="_blank">Lydia -Pintcher, Product Manager of Wikidata, Wikimedia Foundation Deutschland</a>, +Pintcher, Product Manager of Wikidata, Wikimedia Deutschland</a>, supervised the development of the system and contributed the currently -used WDCM Semantic Taxonomy <span style="font-weight: bold;">[[LINK -HERE!!!]]</span> that the system relies on. The software development of +used WDCM Semantic Taxonomy <span style="font-weight: bold;"></span> that the system relies on. 
The software development of the WDCM system is supervised by <a href="https://www.wikidata.org/wiki/User:Tobias_Gritschacher_%28WMDE%29" target="_blank">Tobias Gritschacher, Engineering Manager, Wikimedia Foundation Deutschland</a>, while <a href="https://www.mediawiki.org/wiki/User:Jan_Dittrich_%28WMDE%29" target="_blank">Jan Dittrich, UX Design / Research, Wikimedia -Foundation Deutschland</a> supervises the UI/UX aspects.</small><br> +Foundation Deutschland</a> supervises the UI/UX aspects. The write-ups of the + previous experiences in managing Shiny Dashboards on behalf of + <a href = "https://wikimediafoundation.org/wiki/User:MPopov_(WMF)" target = "_blank">Mikhail Popov</a> and the team that built our + <a href = "https://discovery.wmflabs.org/" target = "_blank">Discovery Dashboards</a> + were very helpful in the development of the WDCM Dashboards.</small><br> <br> -<h2>3. How it works?<br> +<h2>3. How does it work?<br> </h2> <small>The WDCM Wikitech Technical -Documentation <span style="font-weight: bold;">[[LINK HERE!!!]]</span> +Documentation <span style="font-weight: bold;"></span> should be providing enough information in respect to how WDCM works. To put it in a nutshell, the current version of the WDCM system is fully developed in <a href="https://www.r-project.org/" target="_blank">R</a>, @@ -235,23 +238,23 @@ <div id="shiny"> <h2>WDCM Dashboards</h2> - <hr> + <a href = "http://wdcm.wmflabs.org/WDCM_OverviewDashboard/"><h4>WDCM Overview</h4></a> <a href = "http://wdcm.wmflabs.org/WDCM_OverviewDashboard/"><img src="OverviewDashboard.png" alt="WDCM Overview" style="width:300px;"></a> <br> <div class="caption"> The Overview Dashboard provided an introductory overview - the "big picture" of Wikidata usage. 
</div> - <hr> + <br> <a href = "http://wdcm.wmflabs.org/WDCM_UsageDashboard/"><h4>WDCM Usage</h4></a> <a href = "http://wdcm.wmflabs.org/WDCM_UsageDashboard/"><img src="UsageDashboard.png" alt="WDCM Usage" style="width:300px;"></a> <br> <div class="caption"> The Usage Dashboard provides a thorough insight into Wikidata usage across the sister projects and semantic categories. </div> - <hr> + <br> <a href = "http://wdcm.wmflabs.org/WDCM_SemanticsDashboard/"><h4>WDCM Semantics</h4></a> <a href = "http://wdcm.wmflabs.org/WDCM_SemanticsDashboard/"><img src="SemanticsDashboard.png" alt="WDCM Semantics" style="width:300px;"></a> <br> @@ -259,8 +262,8 @@ The Semantics Dashboard provides an insight into the distributional semantics of Wikidata usage. </div> - <hr> + <br> <a href = "https://www.wikidata.org/wiki/Wikidata:Main_Page" target = "blank"><img src="Wikidata-logo-en.png" style="width:300px;"></a> </div> diff --git a/WDCM_TechDocumentation/WikidataConcepts_TechDocumentation.odt b/WDCM_TechDocumentation/WikidataConcepts_TechDocumentation.odt index f3c6168..eb011f0 100644 --- a/WDCM_TechDocumentation/WikidataConcepts_TechDocumentation.odt +++ b/WDCM_TechDocumentation/WikidataConcepts_TechDocumentation.odt Binary files differ diff --git a/WDCM_UsageDashboard/server.R b/WDCM_UsageDashboard/server.R index e70a06b..5bdfdd6 100644 --- a/WDCM_UsageDashboard/server.R +++ b/WDCM_UsageDashboard/server.R @@ -547,7 +547,7 @@ ### --- SELECT: update select 'selectProject' updateSelectizeInput(session, 'selectProject', - choices = c(projects, paste("_", projectTypes, sep="")), + choices = c(projects, paste("_", projectTypes, sep = "")), selected = c("_Wikipedia", "_Wikinews", "_Wiktionary"), server = TRUE) diff --git a/WDCM_UsageDashboard/ui.R b/WDCM_UsageDashboard/ui.R index 56b7727..cc2c552 100644 --- a/WDCM_UsageDashboard/ui.R +++ b/WDCM_UsageDashboard/ui.R @@ -425,90 +425,80 @@ tabPanel("Description", fluidRow( column(width = 8, - HTML('<h2>WDCM Overview Dashboard</h2> + 
HTML('<h2>WDCM Usage Dashboard</h2> <h4>Description<h4> <hr> <h4>Introduction<h4> <br> <p><font size = 2>This Dashboard is a part of the <b>Wikidata Concepts Monitor (WDMC)</b>. The WDCM system provides analytics on Wikidata usage - across the client projects. The WDCM Overview Dashboard presents the big picture of Wikidata usage; other WDCM dashboards go - into more detail. The Overview Dashboard provides insights into <b>(1)</b> the similarities between the client projects in respect to their use of - of Wikidata, as well as <b>(2)</b> the volume of Wikidata usage in every client project, <b>(3)</b> Wikidata usage tendencies, described by the volume of - Wikidata usage in each of the semantic categories of items that are encompassed by the current WDCM edition, <b>(4)</b> the similarities between the - Wikidata semantic categories of items in respect to their usage across the client projects, <b>(5)</b> ranking of client projects in respect to their - Wikidata usage volume, <b>(6)</b> the Wikidata usage breakdown across the types of client projects and Wikidata semantic categories.</font></p> - <hr> + across the Wikimedia sister projects. The WDCM Usage Dashboard focuses on providing the detailed statistics on Wikidata usage in particular sister projects or + the selected subsets of them. Three pages that present analytical results in this Dashboard receive a description here: (1) <b><i>Usage</i></b>, (2) <b><i>Tabs/Crosstabs</i></b>, + and (3) <b><i>Tables</i></b>. But first, definitions.</font></p> + <hr> <h4>Definitions</h4> <br> <p><font size = 2><b>N.B.</b> The current <b>Wikidata item usage statistic</b> definition is <i>the count of the number of pages in a particular client project where the respective Wikidata item is used</i>. Thus, the current definition ignores the usage aspects completely. This definition is motivated by the currently - present constraints in Wikidata usage tracking across the client projects. 
With more mature Wikidata usage tracking systems, the definition will become a subject + present constraints in Wikidata usage tracking across the client projects + (see <a href = "https://www.mediawiki.org/wiki/Wikibase/Schema/wbc_entity_usage" target = "_blank">Wikibase/Schema/wbc entity usage</a>). + With more mature Wikidata usage tracking systems, the definition will become a subject of change. The term <b>Wikidata usage volume</b> is reserved for total Wikidata usage (i.e. the sum of usage statistics) in a particular client project, group of client projects, or semantic categories. By a <b>Wikidata semantic category</b> we mean a selection of Wikidata items that is - that is operationally defined by a respective SPARQL query returning a selection of items that intuitivelly match a human, natural semantic category. + operationally defined by a respective SPARQL query, returning a selection of items that intuitively match a human, natural semantic category. The structure of Wikidata does not necessarily match any intuitive human semantics. In WDCM, an effort is made to select the semantic categories so to match the intuitive, everyday semantics as much as possible, in order to assist anyone involved in analytical work with this system. However, the choice of semantic categories in WDCM is not necessarily exhaustive (i.e. they do not necessarily cover all Wikidata items), neither the categories are necessarily mutually exclusive. The Wikidata ontology is very complex and a product of work of many people, so there is an optimization price to be paid in every attempt to adapt or simplify its present structure to the needs of a statistical analytical system such as WDCM. 
The current set of WDCM semantic categories is thus not normative in any sense and a subject of change in any moment, depending upon the analytical needs of the community.</font></p> + <p><font size = 2>The currently used <b>WDCM Taxonomy</b> of Wikidata items encompasses the following 14 semantic categories: <i>Geographical Object</i>, <i>Organization</i>, <i>Architectural Structure</i>, + <i>Human</i>, <i>Wikimedia</i>, <i>Work of Art</i>, <i>Book</i>, <i>Gene</i>, <i>Scientific Article</i>, <i>Chemical Entities</i>, <i>Astronomical Object</i>, <i>Thoroughfare</i>, <i>Event</i>, + and <i>Taxon</i>.</font></p> <hr> - <h4>Wikidata Usage Overview</h4> + <h4>Usage</h4> <br> - <p><font size = 2>The similarity structure in Wikidata usage <i>across the client projects</i> is presented. Each bubble represents a client project. - The size of the bubble reflects the volume of Wikidata usage in the respective project. Projects similar in respect to the semantics of Wikidata - usage are grouped together.<br> - The bubble chart is produced by performing a <a href="https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding" target="_blank">t-SNE dimensionality reduction</a> - of the client project pairwise Euclidean distances derived from the Projects x Categories contingency table. Given that the original higher-dimensional space - from which the 2D map is derived is rather constrained by the choice of a small number of semantic categories, the similarity mapping is somewhat - imprecise and should be taken as an attempt at an approximate big picture of the client projects similarity structure only. 
More precise 2D maps of - the similarity structures in client projects are found on the WDCM Semantics Dashboard, where each semantic category first receives an - <a href = "https://en.wikipedia.org/wiki/Topic_model" target = "_blank">LDA Topic Model</a>, - and the similarity structure between the client projects is then derived from project topical distributions.<br> - While the <i>Explore</i> tab presents a dynamic <a href = "http://hafen.github.io/rbokeh/" target="_blank">{Rbokeh}</a> visualization alongised - the tools to explore it in detail, the <i>Highlights</i> tab shows a static <a href = "http://ggplot2.org/" target="_blank">{ggplot2}</a> plot with the most important client projects - marked (<b>NOTE.</b> Only top five projects (of each project type) in respect to Wikidata usage volume are labeled).</font></p> + <p><font size = 2>The Usage tab provides elementary statistics on Wikidata usage across the semantic categories (left column) and sister projects + (right column).<br> + <b><i>To the left</i></b>, we first encounter a general overview of <i>Basic Facts</i>: the number of Wikidata items that are encompassed by the current WDCM taxonomy (in effect, + this is the number of items that are encompassed by all WDCM analyses), the number of sister projects that have client-side Wikidata usage tracking enabled (currently, + that means that the <a href = "https://www.mediawiki.org/wiki/Wikibase/Schema/wbc_entity_usage" target = "_blank">Wikibase/Schema/wbc entity usage</a> table is present there), + the number of semantic categories in the current version of the WDCM Taxonomy, and the number of different sister project types (e.g. <i>Wikipedia</i>, <i>Wikinews</i>, etc.). + <br> + The <b>Category Report</b> subsection allows you to select a specific semantic category and generate two charts beneath the selection: (a) the category top 30 projects chart, and + (b) the category top 30 Wikidata items chart. 
The first chart will display the 30 sister projects that use Wikidata items from this semantic category the most, with the usage data + represented on the horizontal axis, and the project labels on the vertical axis. The percentages next to the data points in this chart refer to the proportion of total category usage + that takes place in the respective project. The next chart will display the 30 most popular items from the selected semantic category: item usage is again placed on the horizontal axis, + item labels are on the vertical axis, and item IDs are placed next to the data points themselves. + <br> + The <b>Categories General Overview</b> subsection is static and allows no selection; it introduces two concise overviews of Wikidata usage across the semantic categories of + Wikidata items. The <i>Wikidata Usage per Semantic Category</i> chart provides semantic categories on the vertical and item usage statistics on the horizontal axis; the percentages + tell us about the proportion of total Wikidata usage that the respective semantic category carries. Beneath, the <i>Wikidata item usage per semantic category in each project type</i> + provides a cross-tabulation of semantic categories vs. sister project types. The categories are color-coded and represented on the horizontal axes, while each chart represents one project + type. The usage scale, represented on the vertical axes, is logarithmic to ease the comparison and enable practical data visualization. + <br> + <b><i>To the right</i></b>, an opportunity to inspect Wikidata usage in a single Wikimedia project is provided. The <b>Project Report</b> section allows you to select a single Wikimedia + project and obtain results on it. The first section that will be generated upon making a selection provides a concise narrative summary of Wikidata usage in the selected project alongside + a chart presenting an overview of Wikidata usage per semantic category. 
The next chart, <i>Wikidata usage rank</i>, shows the rank position of the selected project among other sister projects + in respect to the Wikidata usage volume. Beneath, a more complex structure, <i>Semantic Neighbourhood</i>, is given. In this network, or a directed graph if you prefer, each project points + towards the one most similar to it. The selected project has a different color. The results are relevant only in the context of the current selection: only the selected project and its 20 nearest + semantic neighbours are presented. Once again: each project points to the one which utilizes Wikidata in a way most similar to it. The <i>top 30 Wikidata items</i> chart presents the top 30 + Wikidata items in the selected project: item labels are given on the vertical axis, Wikidata usage on the horizontal axis, and the item IDs are labeled close to the data points themselves. + </font></p> <hr> - <h4>Wikidata Usage Tendency</h4> + <h4>Tabs/Crosstabs</h4> <br> - <p><font size = 2>The similarity structure in Wikidata usage <i>across the semantic categories</i> is presented. Each bubble represents a Wikidata semantic - category. The size of the bubble reflects the volume of Wikidata usage from the respective category. If two categories are found in proximity, - that means that the projects that tend to use the one also tend to use the another, and vice versa. Similarly to the Usage Overview, the 2D mapping is obtained by performing - a <a href="https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding" target="_blank">t-SNE dimensionality reduction</a> - of the categories pairwise Euclidean distances derived from the Projects x Categories contingency table. </font></p> + <p><font size = 2> + Here we have the most direct opportunity to study the Wikidata usage statistics across the sister projects. A selection of projects and semantic categories will be intersected and only results in + the scope of the intersection will be returned. 
The charts should be self-explanatory: the usage statistic is always represented by the vertical axis, while the horizontal axis and sub-panels play + various roles in the context of whether a category vs project or a category vs project type crosstabulation is provided. Data points are labeled in millions (M) or thousands (K) of pages (see the Wikidata usage + definition above). While charts can display a limited number of data points only, relative to the size of the selection, each of them is accompanied by a <b>Data (csv)</b> button that will initiate a + download of the full respective data set as a comma-separated file. + </font></p> <hr> - <h4>Wikidata Usage Distribution</h4> + <h4>Tables</h4> <br> - <p><font size = 2>The plots are helpful to build an understanding of the relative range of Wikidata usage across the client projects. - In the <i>Project Usage Rank-Frequency</i> plot, each point represents a client project; Wikidata usage is represented on the vertical and - the project usage rank on the horizontal axis, while only top project (per project type) are labeled. The highly-skewed, asymmetrical - distribution reveals that a small fraction of client projects only accounts for a huge proportion of Wikidata usage.<br> In the - <i>Project Usage log(Rank)-log(Frequency)</i> plot, the logarithms of both variables are represented. - A <a href = "https://en.wikipedia.org/wiki/Power_law" target="_blank">power-law</a> relationship holds true if this - plot is linear. The plot includes the best linear fit, however, no attempts to estimate the underlying probability distribution were made. </font></p> - <hr> - <h4>Client Project Types</h4> - <br> - <p><font size = 2>Project types are provided in the rows of this chart, while the semantic categories are given on the horizontal axis. 
- The height of the respective bar indicates Wikidata usage volume from the respective semantic category in a particular client project.</font></p> - <hr> - <h4>Client Projects Usage Volume</h4> - <br> - <p><font size = 2>Use the slider to select the percentile rank range of the Wikidata usage volume distribution across the client project to show. The - chart will automatically adjust to present the selected projects in increasing order of Wikidata usage, and presenting at most 30 top projects - from the selection. <b>NOTE.</b> The <a href="https://en.wikipedia.org/wiki/Percentile_rank" target="_blank">percentile rank</a> - of a score is the percentage of scores in its frequency distribution that are equal to or lower than it. - For example, a client project that has a Wikidata usage volume greater than or equal to 75% of all client projects under - consideration is said to be at the 75th percentile, where 75 is the percentile rank.<br> In effect, you can browse the whole - distribution of Wikidata usage across the client projects by selecting the lower and uppers limit in terms of usage percentile rank.</font></p> - <hr> - <h4>Wikidata Usage Browser</h4> - <br> - <p><font size = 2>A breakdown of Wikidata usage statistics across client projects and semantic categories. To the left, - a table that presents a Client Project vs. Semantic Category cross-tabulation. The Usage column in this table is the Wikidata - usage statistic for a particular Semantic Category x Client Project combination (e.g. The Wikidata usage in the category "Human" in - the dewiki project). To the right, the total Wikidata usage per client project is presented (i.e. the sum of Wikidata usage across - all semantic categories for a particular client project; e.g. the total Wikidata usage volume of enwiki).</font></p> + <p><font size = 2>The section presents searchable and sortable tables and crosstabulations with self-explanatory semantics. 
Access full WDCM usage datasets from here.</font></p> + ') ) ) @@ -519,23 +509,25 @@ fluidRow( column(width = 8, HTML('<h2>WDCM Navigate</h2> - <h4>Your orientation in the WDCM Dashboards System<h4> - <hr> - <ul> - <li><b>WDCM Overview</b> (current dashboard).<br> - <font size = "2">The big picture. Fundamental insights in how Wikidata is used across the client projects.</font></li><br> - <li><b>WDCM Semantics.</b><br> - <font size = "2">Detailed insights into the WDCM ontology (a selection of semantic categories from Wikidata), its distributional - semantics, and the way it is used across the client projects. If you are looking for Topic Models, yes that’s where - they live in WDCM.</font></li><br> - <li><b>WDCM Usage.</b><br> - <font size = "2">Fine-grained information on Wikidata usage across client projects and project types. Cross-tabulations and similar..</font></li><br> - <li><b>WDCM Items</b><br> - <font size = "2">Fine-grained information on particular Wikidata item usage across the client projects..</font></li><br> - <li><b>WDCM System Technical Documentation.</b><br> - <font size = "2">A document that will come to existence eventually. There are rumours of an existing draft.</font></li> - </ul>' - ) + <h4>Your orientation in the WDCM Dashboards System<h4> + <hr> + <ul> + <li><b><a href = "http://wdcm.wmflabs.org/">WDCM Portal</a></b>.<br> + <font size = "2">The entry point to WDCM Dashboards.</font></li><br> + <li><b><a href = "http://wdcm.wmflabs.org/WDCM_OverviewDashboard/">WDCM Overview</a></b><br> + <font size = "2">The big picture. Fundamental insights in how Wikidata is used across the client projects.</font></li><br> + <li><b><a href = "http://wdcm.wmflabs.org/WDCM_SemanticsDashboard/">WDCM Semantics</a></b><br> + <font size = "2">Detailed insights into the WDCM Taxonomy (a selection of semantic categories from Wikidata), its distributional + semantics, and the way it is used across the client projects. 
If you are looking for Topic Models - that’s where + they live.</font></li><br> + <li><b><a href = "http://wdcm.wmflabs.org/WDCM_UsageDashboard/">WDCM Usage</a> (current dashboard)</b><br> + <font size = "2">Fine-grained information on Wikidata usage across client projects and project types. Cross-tabulations and similar..</font></li><br> + <li><b>WDCM Items</b><br> + <font size = "2">Fine-grained information on particular Wikidata item usage across the client projects.<b> (Under development)</b></font></li><br> + <li><b>WDCM System Technical Documentation</b><br> + <font size = "2">A document that will come to existence eventually. There are rumours of an existing draft.</font></li> + </ul>' + ) ) ) ) # - tabPanel Structure END @@ -551,7 +543,7 @@ column(width = 12, hr(), HTML('<b>Wikidata Concepts Monitor :: WMDE 2017</b><br>Diffusion: <a href="https://phabricator.wikimedia.org/diffusion/AWCM/" target = "_blank">WDCM</a><br>'), - HTML('Contact: Goran S. Milovanovic, Data Analyst, WMDE<br>e-mail: goran.milovanovic_...@wikimedia.de + HTML('Contact: Goran S. Milovanovic, Data Scientist, WMDE<br>e-mail: goran.milovanovic_...@wikimedia.de <br>IRC: goransm'), br(), br() -- To view, visit https://gerrit.wikimedia.org/r/386323 Gerrit-MessageType: merged Gerrit-Change-Id: I9d3c5ec541ceb4fd7041dfed75907b5de8398843 Gerrit-PatchSet: 1 Gerrit-Project: analytics/wmde/WDCM Gerrit-Branch: master Gerrit-Owner: GoranSMilovanovic <goran.milovanovic_...@wikimedia.de> Gerrit-Reviewer: GoranSMilovanovic <goran.milovanovic_...@wikimedia.de>