jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/380770 )
Change subject: Sep 26 2017 Add logo + Crosstabs
......................................................................

Sep 26 2017 Add logo + Crosstabs

Change-Id: Ib912ed9dfa5060bbe888714b648be77eb6f69734
---
M WDCM_OverviewDashboard/server.R
M WDCM_OverviewDashboard/ui.R
A WDCM_OverviewDashboard/www/Wikidata-logo-en.png
M WDCM_UsageDashboard/server.R
M WDCM_UsageDashboard/ui.R
A WDCM_UsageDashboard/www/Wikidata-logo-en.png
6 files changed, 330 insertions(+), 29 deletions(-)

Approvals:
  GoranSMilovanovic: Verified; Looks good to me, approved
  jenkins-bot: Verified


diff --git a/WDCM_OverviewDashboard/server.R b/WDCM_OverviewDashboard/server.R
index be744cd..fd1d5fe 100644
--- a/WDCM_OverviewDashboard/server.R
+++ b/WDCM_OverviewDashboard/server.R
@@ -1,4 +1,3 @@
-
 ### ---------------------------------------------------------------------------
 ### --- WDCM Dashboard Module, v. Beta 0.1
 ### --- Script: server.R, v. Beta 0.1
diff --git a/WDCM_OverviewDashboard/ui.R b/WDCM_OverviewDashboard/ui.R
index 7aa6d79..39e1f07 100644
--- a/WDCM_OverviewDashboard/ui.R
+++ b/WDCM_OverviewDashboard/ui.R
@@ -1,4 +1,3 @@
-
 ### ---------------------------------------------------------------------------
 ### --- WDCM Dashboard Module, v. Beta 0.1
 ### --- Script: ui.R, v. Beta 0.1
@@ -28,11 +27,25 @@
   # - fluidRow Title
   fluidRow(
     column(width = 12,
-           h2('WDCM Overview Dahsboard'),
-           HTML('<font size="3"><b>Wikidata Concepts Monitor</b></font>'),
-           hr()
+           h2('WDCM Overview Dashboard'),
+           HTML('<font size="3"><b>Wikidata Concepts Monitor</b></font>')
     )
   ), # - fluidRow Title END
+
+  # - fluidRow Logo
+  fluidRow(
+    column(width = 12,
+           img(src='Wikidata-logo-en.png',
+               align = "left")
+    )
+  ), # - fluidRow END
+
+  # - hr()
+  fluidRow(
+    column(width = 12,
+           hr()
+    )
+  ),
 
   fluidRow(
     column(width = 12,
@@ -204,39 +217,61 @@
     )
   ), # - tabPanel Overview END
 
-  # - tabPanel Usage
+  # - tabPanel Description
   tabPanel("Description",
     fluidRow(
-      column(width = 12,
+      column(width = 8,
         HTML('<h2>WDCM Overview Dashboard</h2>
         <h4>Description<h4>
         <hr>
         <h4>Introduction<h4>
         <br>
-        <p><font size = 2>This Dashboard is a part of the Wikidata Concepts Monitor (WDMC). The WDCM system provides analytics on Wikidata usage
+        <p><font size = 2>This Dashboard is a part of the <b>Wikidata Concepts Monitor (WDMC)</b>. The WDCM system provides analytics on Wikidata usage
         across the client projects. The WDCM Overview Dashboard presents the big picture of Wikidata usage; other WDCM dashboards go
-        into more detail.</font></p>
+        into more detail. The Overview Dashboard provides insights into <b>(1)</b> the similarities between the client projects in respect to their use of
+        of Wikidata, as well as <b>(2)</b> the volume of Wikidata usage in every client project, <b>(3)</b> Wikidata usage tendencies, described by the volume of
+        Wikidata usage in each of the semantic categories of items that are encompassed by the current WDCM edition, <b>(4)</b> the similarities between the
+        Wikidata semantic categories of items in respect to their usage across the client projects, <b>(5)</b> ranking of client projects in respect to their
+        Wikidata usage volume, <b>(6)</b> the Wikidata usage breakdown across the types of client projects and Wikidata semantic categories.</font></p>
         <hr>
-        <h4>Wikidata Item Usage Definition</h4>
+        <h4>Definitions</h4>
         <br>
-        <p><font size = 2><b>NOTE.</b> The current Wikidata item usage statistic definition is <i>the count of the number of pages in a particular client project
-        where the respective Wikidata item is used</i>. Thus, the current definition ignores the usage aspects completely</font></p>
+        <p><font size = 2><b>N.B.</b> The current <b>Wikidata item usage statistic</b> definition is <i>the count of the number of pages in a particular client project
+        where the respective Wikidata item is used</i>. Thus, the current definition ignores the usage aspects completely. This definition is motivated by the currently
+        present constraints in Wikidata usage tracking across the client projects. With more mature Wikidata usage tracking systems, the definition will become a subject
+        of change. The term <b>Wikidata usage volume</b> is reserved for total Wikidata usage (i.e. the sum of usage statistics) in a particular
+        client project, group of client projects, or semantic categories. By a <b>Wikidata semantic category</b> we mean a selection of Wikidata items that is
+        that is operationally defined by a respective SPARQL query returning a selection of items that intuitivelly match a human, natural semantic category.
+        The structure of Wikidata does not necessarily match any intuitive human semantics. In WDCM, an effort is made to select the semantic categories so to match
+        the intuitive, everyday semantics as much as possible, in order to assist anyone involved in analytical work with this system. However, the choice of semantic
+        categories in WDCM is not necessarily exhaustive (i.e. they do not necessarily cover all Wikidata items), neither the categories are necessarily
+        mutually exclusive. The Wikidata ontology is very complex and a product of work of many people, so there is an optimization price to be paid in every attempt to
+        adapt or simplify its present structure to the needs of a statistical analytical system such as WDCM. The current set of WDCM semantic categories is thus not
+        normative in any sense and a subject of change in any moment, depending upon the analytical needs of the community.</font></p>
         <hr>
         <h4>Wikidata Usage Overview</h4>
         <br>
         <p><font size = 2>The similarity structure in Wikidata usage <i>across the client projects</i> is presented. Each bubble represents a client project. The size of the bubble reflects the volume of Wikidata usage in the respective project. Projects similar in respect to the semantics of Wikidata
-        usage are grouped together. Only top five projects (of each project type) in respect to Wikidata usage volume are labeled.<br>
-        The bubble chart is produced by performing a t-SNE dimensionality reduction of the inter-Projects Euclidean distances derived from the
-        Projects x Categories contingency table.</font></p>
+        usage are grouped together.<br>
+        The bubble chart is produced by performing a <a href="https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding" target="_blank">t-SNE dimensionality reduction</a>
+        of the client project pairwise Euclidean distances derived from the Projects x Categories contingency table. Given that the original higher-dimensional space
+        from which the 2D map is derived is rather constrained by the choice of a small number of semantic categories, the similarity mapping is somewhat
+        imprecise and should be taken as an attempt at an approximate big picture of the client projects similarity structure only. More precise 2D maps of
+        the similarity structures in client projects are found on the WDCM Semantics Dashboard, where each semantic category first receives an
+        <a href = "https://en.wikipedia.org/wiki/Topic_model" target = "_blank">LDA Topic Model</a>,
+        and the similarity structure between the client projects is then derived from project topical distributions.<br>
+        While the <i>Explore</i> tab presents a dynamic <a href = "http://hafen.github.io/rbokeh/" target="_blank">{Rbokeh}</a> visualization alongised
+        the tools to explore it in detail, the <i>Highlights</i> tab shows a static <a href = "http://ggplot2.org/" target="_blank">{ggplot2}</a> plot with the most important client projects
+        marked (<b>NOTE.</b> Only top five projects (of each project type) in respect to Wikidata usage volume are labeled).</font></p>
         <hr>
         <h4>Wikidata Usage Tendency</h4>
         <br>
         <p><font size = 2>The similarity structure in Wikidata usage <i>across the semantic categories</i> is presented. Each bubble represents a Wikidata semantic category. The size of the bubble reflects the volume of Wikidata usage from the respective category. If two categories are found in proximity,
-        that means that the projects that tend to use the one also tend to use the another, and vice versa.<br>
-        The bubble chart is produced by performing a t-SNE dimensionality reduction of the inter-Projects Euclidean distances derived from the
-        Categories x Projects contingency table.</font></p>
+        that means that the projects that tend to use the one also tend to use the another, and vice versa. Similarly to the Usage Overview, the 2D mapping is obtained by performing
+        a <a href="https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding" target="_blank">t-SNE dimensionality reduction</a>
+        of the categories pairwise Euclidean distances derived from the Projects x Categories contingency table. </font></p>
         <hr>
         <h4>Wikidata Usage Distribution</h4>
         <br>
@@ -244,23 +279,32 @@
         In the <i>Project Usage Rank-Frequency</i> plot, each point represents a client project; Wikidata usage is represented on the vertical and the project usage rank on the horizontal axis, while only top project (per project type) are labeled. The highly-skewed, asymmetrical distribution reveals that a small fraction of client projects only accounts for a huge proportion of Wikidata usage.<br> In the
-        <i>Project Usage log(Rank)-log(Frequency)</i> plot, the logarithms of both variables are represented. A power-law relationship holds true if this
-        plot is linear; the plot includes the best fit linear model, however, no attempts to estimate the Zipf distribution were made. </font></p>
+        <i>Project Usage log(Rank)-log(Frequency)</i> plot, the logarithms of both variables are represented.
+        A <a href = "https://en.wikipedia.org/wiki/Power_law" target="_blank">power-law</a> relationship holds true if this
+        plot is linear. The plot includes the best linear fit, however, no attempts to estimate the underlying probability distribution were made. </font></p>
         <hr>
         <h4>Client Project Types</h4>
         <br>
         <p><font size = 2>Project types are provided in the rows of this chart, while the semantic categories are given on the horizontal axis.
-        The height of the respective bar indicates Wikidata usage from the respective semantic category in a particular client project.</font></p>
+        The height of the respective bar indicates Wikidata usage volume from the respective semantic category in a particular client project.</font></p>
         <hr>
         <h4>Client Projects Usage Volume</h4>
         <br>
-        <p><font size = 2>Use the slider to select the percent range of the Wikidata usage distribution across the client project to show. The
-        chart will automatically adjust presenting the selected projects in increasing order of Wikidata usage, presenting at most 30 top projects
-        from the selection.</font></p>
+        <p><font size = 2>Use the slider to select the percentile rank range of the Wikidata usage volume distribution across the client project to show. The
+        chart will automatically adjust to present the selected projects in increasing order of Wikidata usage, and presenting at most 30 top projects
+        from the selection. <b>NOTE.</b> The <a href="https://en.wikipedia.org/wiki/Percentile_rank" target="_blank">percentile rank</a>
+        of a score is the percentage of scores in its frequency distribution that are equal to or lower than it.
+        For example, a client project that has a Wikidata usage volume greater than or equal to 75% of all client projects under
+        consideration is said to be at the 75th percentile, where 75 is the percentile rank.<br> In effect, you can browse the whole
+        distribution of Wikidata usage across the client projects by selecting the lower and uppers limit in terms of usage percentile rank.</font></p>
         <hr>
         <h4>Wikidata Usage Browser</h4>
         <br>
-        <p><font size = 2>A breakdown of Wikidata usage statistics across client projects, project types, and semantic categories.</font></p>
+        <p><font size = 2>A breakdown of Wikidata usage statistics across client projects and semantic categories. To the left,
+        a table that presents a Client Project vs. Semantic Category cross-tabulation. The Usage column in this table is the Wikidata
+        usage statistic for a particular Semantic Category x Client Project combination (e.g. The Wikidata usage in the category "Human" in
+        the dewiki project). To the right, the total Wikidata usage per client project is presented (i.e. the sum of Wikidata usage across
+        all semantic categories for a particular client project; e.g. the total Wikidata usage volume of enwiki).</font></p>
         ')
       )
     )
diff --git a/WDCM_OverviewDashboard/www/Wikidata-logo-en.png b/WDCM_OverviewDashboard/www/Wikidata-logo-en.png
new file mode 100644
index 0000000..380ea29
--- /dev/null
+++ b/WDCM_OverviewDashboard/www/Wikidata-logo-en.png
Binary files differ
diff --git a/WDCM_UsageDashboard/server.R b/WDCM_UsageDashboard/server.R
index 57d4b41..62fe121 100644
--- a/WDCM_UsageDashboard/server.R
+++ b/WDCM_UsageDashboard/server.R
@@ -708,6 +708,208 @@
     contentType = "text/csv"
   )
 
+  #### --- Chart: tabulations_projectTypesChart
+  output$tabulations_projectTypesChart <- renderPlot({
+    # - Chart Frame for output$tabulations_projectTypesChart
+    plotFrame <- wdcmProjectCategory %>%
+      filter(Project %in% selectedProjects & Category %in% selectedCategories) %>%
+      group_by(`Project Type`) %>%
+      summarise(Usage = sum(Usage)) %>%
+      arrange(desc(Usage))
+    # - top 25 categories:
+    if (dim(plotFrame)[1] > 25) {
+      plotFrame <- plotFrame[1:25, ]
+    }
+    plotFrame$`Project Type` <- factor(plotFrame$`Project Type`,
+                                       levels = plotFrame$`Project Type`[order(-plotFrame$Usage)])
+    # - express labels as K, M:
+    plotFrame$Label <- sapply(plotFrame$Usage, function(x) {
+      if (x >= 1e+03 & x < 1e+06) {
+        out <- paste(round(x/1e+03, 1), "K", sep = "")
+      } else if (x > 1e+06) {
+        out <- paste(round(x/1e+06, 1), "M", sep = "")
+      } else {
+        out <- as.character(x)
+      }
+      return(out)
+    })
+    # - Plot
+    ggplot(plotFrame,
+           aes(x = `Project Type`, y = Usage, label = Label)) +
+      geom_bar(stat = "identity", width = .6, fill = "#4c8cff") +
+      xlab('Project Type') + ylab('Entity Usage') +
+      ylim(0, max(plotFrame$Usage) + .1*max(plotFrame$Usage)) +
+      scale_y_continuous(labels = comma) +
+      geom_label(size = 3, vjust = -.1) +
+      theme_minimal() +
+      theme(axis.text.x = element_text(angle = 90, size = 12, hjust = 1)) +
+      theme(axis.title.x = element_text(size = 12)) +
+      theme(axis.title.y = element_text(size = 12)) +
+      theme(plot.title = element_text(size = 15)) %>%
+      withProgress(message = 'Generating plot',
+                   min = 0,
+                   max = 1,
+                   value = 1, {incProgress(amount = 0)})
+  })
+  # - Download Frame: tabulations_projectTypesChart
+  tabulations_projectTypesChartDownload_Frame <- reactive({
+    plotFrame <- wdcmProjectCategory %>%
+      filter(Project %in% selectedProjects & Category %in% selectedCategories) %>%
+      group_by(`Project Type`) %>%
+      summarise(Usage = sum(Usage)) %>%
+      arrange(desc(Usage))
+    plotFrame
+  })
+  # - Download: tabulations_projectTypesChart
+  output$tabulations_projectTypesChart_Frame <- downloadHandler(
+    filename = function() {
+      'WDCM_Data.csv'},
+    content = function(file) {
+      write.csv(tabulations_projectTypesChartDownload_Frame(),
+                file,
+                quote = FALSE,
+                row.names = FALSE)
+    },
+    contentType = "text/csv"
+  )
+
+  #### --- Chart: crosstabulations_projectsCategoriesChart
+  output$crosstabulations_projectsCategoriesChart <- renderPlot({
+    # - Chart Frame for output$crosstabulations_projectsCategoriessChart
+    plotFrame <- wdcmProjectCategory %>%
+      filter(Project %in% selectedProjects & Category %in% selectedCategories) %>%
+      arrange(desc(Usage))
+    projectOrder <- plotFrame %>%
+      group_by(Project) %>%
+      summarise(Usage = sum(Usage)) %>%
+      arrange(desc(Usage))
+    selProj <- projectOrder$Project[1:25]
+    plotFrame <- plotFrame %>%
+      filter(Project %in% selProj)
+    plotFrame$Project <- factor(plotFrame$Project,
+                                levels = selProj)
+    # - express labels as K, M:
+    plotFrame$Label <- sapply(plotFrame$Usage, function(x) {
+      if (x >= 1e+03 & x < 1e+06) {
+        out <- paste(round(x/1e+03, 1), "K", sep = "")
+      } else if (x > 1e+06) {
+        out <- paste(round(x/1e+06, 1), "M", sep = "")
+      } else {
+        out <- as.character(x)
+      }
+      return(out)
+    })
+    # - Plot
+    ggplot(plotFrame,
+           aes(x = Project, y = Usage, label = Label)) +
+      geom_line(size = .25, color = "#4c8cff", group = 1) +
+      geom_point(size = 1.5, color = "#4c8cff") +
+      geom_point(size = 1, color = "white") +
+      geom_text_repel(aes(label = plotFrame$Label),
+                      size = 3) +
+      facet_wrap(~ Category, ncol = 3, scales = "free_y") +
+      xlab('Project') + ylab('Entity Usage') +
+      ylim(0, max(plotFrame$Usage) + .5*max(plotFrame$Usage)) +
+      scale_y_continuous(labels = comma) +
+      theme_minimal() +
+      theme(axis.text.x = element_text(angle = 90, size = 12, hjust = 1)) +
+      theme(axis.title.x = element_text(size = 12)) +
+      theme(axis.title.y = element_text(size = 12)) +
+      theme(plot.title = element_text(size = 15)) %>%
+      withProgress(message = 'Generating plot',
+                   min = 0,
+                   max = 1,
+                   value = 1, {incProgress(amount = 0)})
+  })
+  # - Download Frame: crosstabulations_projectsCategoriesChart
+  crosstabulations_projectsCategoriesChartDownload_Frame <- reactive({
+    plotFrame <- wdcmProjectCategory %>%
+      filter(Project %in% selectedProjects & Category %in% selectedCategories) %>%
+      arrange(desc(Usage))
+    plotFrame
+  })
+  # - Download: crosstabulations_projectsCategoriesFrame
+  output$crosstabulations_projectsCategoriesFrame <- downloadHandler(
+    filename = function() {
+      'WDCM_Data.csv'},
+    content = function(file) {
+      write.csv(crosstabulations_projectsCategoriesChartDownload_Frame(),
+                file,
+                quote = FALSE,
+                row.names = FALSE)
+    },
+    contentType = "text/csv"
+  )
+
+  #### --- Chart: crosstabulations_projectTypesCategoriesChart
+  output$crosstabulations_projectTypesCategoriesChart <- renderPlot({
+    # - Chart Frame for output$crosstabulations_projectTypesCategoriesChart
+    plotFrame <- wdcmProjectCategory %>%
+      filter(Project %in% selectedProjects & Category %in% selectedCategories) %>%
+      group_by(`Project Type`, Category) %>%
+      summarise(Usage = sum(Usage)) %>%
+      arrange(desc(Usage))
+    projectTypeOrder <- plotFrame %>%
+      group_by(`Project Type`) %>%
+      summarise(Usage = sum(Usage)) %>%
+      arrange(desc(Usage))
+    plotFrame$`Project Type` <- factor(plotFrame$`Project Type`,
+                                       levels = projectTypeOrder$`Project Type`)
+    # - express labels as K, M:
+    plotFrame$Label <- sapply(plotFrame$Usage, function(x) {
+      if (x >= 1e+03 & x < 1e+06) {
+        out <- paste(round(x/1e+03, 1), "K", sep = "")
+      } else if (x > 1e+06) {
+        out <- paste(round(x/1e+06, 1), "M", sep = "")
+      } else {
+        out <- as.character(x)
+      }
+      return(out)
+    })
+    # - Plot
+    ggplot(plotFrame,
+           aes(x = `Project Type`, y = Usage, label = Label)) +
+      geom_line(size = .25, color = "#4c8cff", group = 1) +
+      geom_point(size = 1.5, color = "#4c8cff") +
+      geom_point(size = 1, color = "white") +
+      geom_text_repel(aes(label = plotFrame$Label),
+                      size = 3) +
+      facet_wrap(~ Category, ncol = 3, scales = "free_y") +
+      xlab('Project Type') + ylab('Entity Usage') +
+      ylim(0, max(plotFrame$Usage) + .5*max(plotFrame$Usage)) +
+      scale_y_continuous(labels = comma) +
+      theme_minimal() +
+      theme(axis.text.x = element_text(angle = 90, size = 12, hjust = 1)) +
+      theme(axis.title.x = element_text(size = 12)) +
+      theme(axis.title.y = element_text(size = 12)) +
+      theme(plot.title = element_text(size = 15)) %>%
+      withProgress(message = 'Generating plot',
+                   min = 0,
+                   max = 1,
+                   value = 1, {incProgress(amount = 0)})
+  })
+  # - Download Frame: crosstabulations_projectTypeCategoriesChart
+  crosstabulations_projectTypeCategoriesChartDownload_Frame <- reactive({
+    plotFrame <- wdcmProjectCategory %>%
+      filter(Project %in% selectedProjects & Category %in% selectedCategories) %>%
+      group_by(`Project Type`, Category) %>%
+      summarise(Usage = sum(Usage)) %>%
+      arrange(desc(Usage))
+    plotFrame
+  })
+  # - Download: crosstabulations_projectTypeCategoriesChartFrame
+  output$crosstabulations_projectTypeCategoriesChartFrame <- downloadHandler(
+    filename = function() {
+      'WDCM_Data.csv'},
+    content = function(file) {
+      write.csv(crosstabulations_projectTypeCategoriesChartDownload_Frame(),
+                file,
+                quote = FALSE,
+                row.names = FALSE)
+    },
+    contentType = "text/csv"
+  )
+
 })
diff --git a/WDCM_UsageDashboard/ui.R b/WDCM_UsageDashboard/ui.R
index 000fdbd..2d249ae 100644
--- a/WDCM_UsageDashboard/ui.R
+++ b/WDCM_UsageDashboard/ui.R
@@ -28,10 +28,25 @@
   fluidRow(
     column(width = 12,
            h2('WDCM Usage Dashboard'),
-           HTML('<font size="3"><b>Wikidata Concepts Monitor</b></font>'),
-           hr()
+           HTML('<font size="3"><b>Wikidata Concepts Monitor</b></font>')
+    )
   ), # - fluidRow Title END
+
+  # - fluidRow Logo
+  fluidRow(
+    column(width = 12,
+           img(src='Wikidata-logo-en.png',
+               align = "left")
+    )
+  ), # - fluidRow END
+
+  # - hr()
+  fluidRow(
+    column(width = 12,
+           hr()
+    )
+  ),
 
   # - fluidRow Boxes
   fluidRow(
@@ -285,16 +300,57 @@
   fluidRow(
     column(width = 6,
            h4('Projects'),
-           plotOutput('tabulations_projectsChart'),
+           withSpinner(plotOutput('tabulations_projectsChart', height = "600px")),
            downloadButton('tabulations_projectsDownload_Frame',
                           'Data (csv)')
     ),
     column(width = 6,
            h4('Categories'),
-           plotOutput('tabulations_categoriesChart'),
+           withSpinner(plotOutput('tabulations_categoriesChart', height = "600px")),
            downloadButton('tabulations_categoriesDownload_Frame',
                           'Data (csv)')
     )
+  ),
+  fluidRow(
+    column(width = 12,
+           hr()
+    )
+  ),
+  fluidRow(
+    column(width = 6,
+           h4('Project Types'),
+           withSpinner(plotOutput('tabulations_projectTypesChart', height = "600px")),
+           downloadButton('tabulations_projectTypesChart_Frame',
+                          'Data (csv)')
+    ),
+    column(width = 6
+    )
+  ),
+  fluidRow(
+    column(width = 12,
+           hr()
+    )
+  ),
+  fluidRow(
+    column(width = 12,
+           h4('Project vs Categories'),
+           withSpinner(plotOutput('crosstabulations_projectsCategoriesChart', height = "850px")),
+           downloadButton('crosstabulations_projectsCategoriesFrame',
+                          'Data (csv)')
+    )
+  ),
+  fluidRow(
+    column(width = 12,
+           hr()
+    )
+  ),
+  fluidRow(
+    column(width = 12,
+           h4('Project Types vs Categories'),
+           withSpinner(plotOutput('crosstabulations_projectTypesCategoriesChart', height = "850px")),
+           downloadButton('crosstabulations_projectTypeCategoriesChartFrame',
+                          'Data (csv)')
+    )
   )
   )
   ),
diff --git a/WDCM_UsageDashboard/www/Wikidata-logo-en.png b/WDCM_UsageDashboard/www/Wikidata-logo-en.png
new file mode 100644
index 0000000..380ea29
--- /dev/null
+++ b/WDCM_UsageDashboard/www/Wikidata-logo-en.png
Binary files differ

--
To view, visit
https://gerrit.wikimedia.org/r/380770

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib912ed9dfa5060bbe888714b648be77eb6f69734
Gerrit-PatchSet: 1
Gerrit-Project: analytics/wmde/WDCM
Gerrit-Branch: master
Gerrit-Owner: GoranSMilovanovic <goran.milovanovic_...@wikimedia.de>
Gerrit-Reviewer: GoranSMilovanovic <goran.milovanovic_...@wikimedia.de>
Gerrit-Reviewer: jenkins-bot <>
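
The Overview Dashboard description added in this change says both bubble charts are t-SNE reductions of pairwise Euclidean distances derived from the Projects x Categories contingency table (projects as rows for the Usage Overview, categories for the Usage Tendency). The base-R sketch below covers only that distance step. It runs on made-up toy data, and it assumes raw (unnormalised) usage counts; the change does not say whether WDCM normalises the table first. The t-SNE embedding itself would come from a package such as {Rtsne} and is not shown.

```r
# Toy long-format usage data (one row per client project and semantic
# category). The numbers are made up for illustration, not WDCM data.
usage <- data.frame(
  Project  = c("enwiki", "enwiki", "dewiki", "dewiki", "frwiki", "frwiki"),
  Category = c("Human", "Geographical Object", "Human", "Geographical Object",
               "Human", "Geographical Object"),
  Usage    = c(900, 300, 500, 200, 100, 400),
  stringsAsFactors = FALSE
)

# Projects x Categories contingency table: each cell sums Usage for one
# (Project, Category) pair. unclass() turns the xtabs object into a plain matrix.
projCat <- unclass(xtabs(Usage ~ Project + Category, data = usage))

# Pairwise Euclidean distances between client projects (rows of the table) ...
projDist <- dist(projCat, method = "euclidean")
# ... and between semantic categories (columns, via the transpose).
catDist <- dist(t(projCat), method = "euclidean")

# These distance objects are the input a t-SNE implementation would embed in 2D.
print(round(as.matrix(projDist), 1))
print(round(as.matrix(catDist), 1))
```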
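
The log(Rank)-log(Frequency) text in this change says a power-law relationship is suggested when that plot is linear, and that the plot includes the best linear fit. Here is a minimal base-R sketch of that fit. It assumes a hypothetical vector of per-project usage volumes generated from an exact power law, so the recovered slope is known in advance.

```r
# Hypothetical usage volumes for 20 client projects, generated from an exact
# power law (Usage = 1e6 * Rank^-1.5) and listed in scrambled order.
usageVolume <- 1e6 * (1:20)^(-1.5)
usageVolume <- usageVolume[c(seq(2, 20, by = 2), seq(1, 19, by = 2))]

# Rank-frequency table: rank 1 is the project with the highest usage.
rankFreq <- data.frame(Usage = sort(usageVolume, decreasing = TRUE))
rankFreq$Rank <- seq_len(nrow(rankFreq))

# Best linear fit in log-log space; a straight line here is consistent with,
# but does not prove, a power-law relationship.
fit <- lm(log(Usage) ~ log(Rank), data = rankFreq)
slope <- unname(coef(fit)["log(Rank)"])
print(round(slope, 3))
```

With real usage data the points would only approximately follow a line. As the dashboard text notes, the fitted slope is not an estimate of the underlying probability distribution.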
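
The Client Projects Usage Volume text defines the percentile rank of a score as the percentage of scores that are equal to or lower than it. That definition maps directly onto base R's `ecdf()`. The sketch below computes percentile ranks, applies a slider-style [lower, upper] filter, sorts by increasing usage, and keeps at most 30 of the highest-usage projects. The server code for this chart is not part of this change, so `percentileRank()`, `selectByPercentileRank()` and the toy data are hypothetical illustrations of the described behaviour, not the dashboard's implementation.

```r
# Percentile rank of each value: the percentage of values in x that are
# equal to or lower than it. ecdf(x)(v) gives the proportion of x <= v.
percentileRank <- function(x) {
  100 * ecdf(x)(x)
}

# Hypothetical slider filter: keep projects whose usage percentile rank lies
# in [lower, upper], sort them by increasing usage, and keep at most
# maxShown of the highest-usage projects from that selection.
selectByPercentileRank <- function(df, lower, upper, maxShown = 30) {
  pr <- percentileRank(df$Usage)
  sel <- df[pr >= lower & pr <= upper, , drop = FALSE]
  sel <- sel[order(sel$Usage), , drop = FALSE]
  tail(sel, maxShown)
}

# Toy data; not WDCM figures.
projects <- data.frame(
  Project = c("enwiki", "dewiki", "frwiki", "itwiki"),
  Usage   = c(40, 20, 30, 10),
  stringsAsFactors = FALSE
)

print(percentileRank(projects$Usage))
print(selectByPercentileRank(projects, lower = 50, upper = 75))
```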
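
Each of the three new charts in `WDCM_UsageDashboard/server.R` repeats an inline `sapply()` helper that formats usage as K/M labels. In the merged version, the second branch tests `x > 1e+06`, so a usage of exactly 1,000,000 matches neither branch. It falls through to `as.character()`, which renders it as "1e+06". Below is a standalone sketch of the same formatting with `>=` boundaries. The function name `formatUsageLabel` is hypothetical.

```r
# Format a usage count as a short label: "1.2K", "3.4M", or the raw number.
# Both boundaries use >= so that exactly 1e3 and 1e6 get a suffix.
formatUsageLabel <- function(x) {
  if (x >= 1e+06) {
    paste0(round(x / 1e+06, 1), "M")
  } else if (x >= 1e+03) {
    paste0(round(x / 1e+03, 1), "K")
  } else {
    as.character(x)
  }
}

usageValues <- c(999, 1000, 1234, 1e+06, 2.5e+06)
usageLabels <- vapply(usageValues, formatUsageLabel, character(1))
print(usageLabels)
```

Values just below a threshold can still round up (for example, 999,999 becomes "1000K"). The merged helper behaves the same way.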
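
The new Project vs Categories chart selects its projects with `projectOrder$Project[1:25]`. In R, indexing past the end of a vector returns `NA`. If fewer than 25 projects are selected, `selProj` is therefore padded with `NA` values before being used as factor levels. The base-R sketch below does the top-N selection with `head()` and also shows the Project Type x Category aggregation behind the Project Types vs Categories chart. It uses hypothetical toy data. The `ProjectType` column stands in for the backtick-quoted `Project Type` column used with {dplyr} in the dashboard code.

```r
# Hypothetical long-format data; values and category names are illustrative.
usage <- data.frame(
  Project     = c("enwiki", "enwiki", "dewiki", "dewiki", "commonswiki"),
  ProjectType = c("Wikipedia", "Wikipedia", "Wikipedia", "Wikipedia", "Commons"),
  Category    = c("Human", "Geographical Object", "Human", "Geographical Object", "Human"),
  Usage       = c(900, 50, 500, 70, 300),
  stringsAsFactors = FALSE
)

# Total usage per project, in decreasing order.
projectOrder <- aggregate(Usage ~ Project, data = usage, FUN = sum)
projectOrder <- projectOrder[order(-projectOrder$Usage), ]

# head() returns at most 25 projects and never pads with NA.
selProj <- head(projectOrder$Project, 25)
plotFrame <- usage[usage$Project %in% selProj, ]
plotFrame$Project <- factor(plotFrame$Project, levels = selProj)

# Project Type x Category usage volume, largest first.
typeCat <- aggregate(Usage ~ ProjectType + Category, data = usage, FUN = sum)
typeCat <- typeCat[order(-typeCat$Usage), ]

print(selProj)
print(typeCat)
```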