[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-10-19 Thread So9q
So9q added a comment. In T281854#7062631 , @Fnielsen wrote: > "percentage, number of scientific papers that are connected to non-scientific paper items in WD" > Quite a lot of scholarly papers are connected to a journal item, to one

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF removed a parent task: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure. TASK DETAIL https://phabricator.wikimedia.org/T281854 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To:

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-10-18 Thread AKhatun_WMF
AKhatun_WMF added a parent task: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-09-27 Thread Gehel
Gehel closed this task as "Resolved". Gehel added a comment. I'm closing this as the statistics have been collected and published. The larger discussion should probably continue on this talk page: https://www.wikidata.org/wiki/Wikidata:Query_Service_scaling_update_Aug_2021

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-09-24 Thread AKhatun_WMF
AKhatun_WMF added a comment. Here is the analysis done on scholarly articles in Wikidata and WDQS queries related to them: https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-09-24 Thread AKhatun_WMF
AKhatun_WMF updated the task description.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-08 Thread Esc3300
Esc3300 added a comment. We now have author names as detailed strings, so queries to P50 won't necessarily need to be considered. The overall situation is comparable to Commons, where "depicts" statements link to Wikidata items.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-07 Thread EgonWillighagen
EgonWillighagen added a comment. 1,939,738 authors -> https://w.wiki/3o2i. Trying to get all unique properties of these authors times out. Sampling 50k authors for properties with an author as subject, https://w.wiki/3o3C, results: - 96% are linked to a profession (P106

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread Daniel_Mietchen
Daniel_Mietchen added a comment. In T281854#7063597 , @Lydia_Pintscher wrote: > Thanks everyone. For context: this is just one of many options we are currently investigating to create an overview of our options. We think it is

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread AKhatun_WMF
AKhatun_WMF updated the task description.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread AKhatun_WMF
AKhatun_WMF added a comment. In T281854#7266495 , @EgonWillighagen wrote: > @AKhatun_WMF, when you write "authors connected to other subgraphs", do you mean subgraphs within Wikidata (so, excluding external identifiers), or also graphs

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread EgonWillighagen
EgonWillighagen added a comment. @AKhatun_WMF, when you write "authors connected to other subgraphs", do you mean subgraphs within Wikidata (so, excluding external identifiers), or also graphs from other resources that are part of, for example, the Linked Open Data Cloud?

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-06 Thread AKhatun_WMF
AKhatun_WMF updated the task description.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread Harej
Harej added a comment. Wikicite.org uses an extremely broad definition of publication that includes far more than scholarly sources. There are some thousands of classes that are counted as subclasses of “publication”.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread Fnielsen
Fnielsen added a comment. Wikicite.org (Jakob Voß) http://wikicite.org/statistics.html states 39 994 937 = 43% for 2021-06-28. The Scholia statistics are only for the "scholarly article" item. I think Voß counts instances of scholarly + non-scholarly publications.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread Fnielsen
Fnielsen added a comment. >> "percentage, number of Wikidata entities that are scholarly article": >> 37.246.721 Scholarly articles, so 37/97 ~ 40% are scholarly articles. > > Could I get an idea of what the 97 was and where the number was listed maybe? Hmmm... Maybe I meant 94.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread CBogen
CBogen assigned this task to AKhatun_WMF.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread CBogen
CBogen added a project: Discovery-Search (Current work).

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-08-05 Thread CBogen
CBogen renamed this task from "[EPIC] Get baseline measurements/expectations for splitting scholarly articles from Wikidata" to "Get baseline measurements/expectations for splitting scholarly articles from Wikidata". CBogen removed a project: Epic.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-07-24 Thread AKhatun_WMF
AKhatun_WMF added a comment. In T281854#7062631, @Fnielsen wrote: > Some of the statistics that are wanted are listed on Scholia, currently on the frontpage: https://scholia.toolforge.org/ (UPDATE: now here:

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-07-22 Thread Esc3300
Esc3300 added a comment. There is a recent request to make items for scholarly articles more stand-alone, i.e. - Wikidata:Property proposal/Author last names - Wikidata:Property proposal/Author first names

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-07-12 Thread MPhamWMF
MPhamWMF added a comment. @Sj This is primarily being evaluated as a last resort mitigation in the case of catastrophic failure, specifically having to do with max size limitations of Blazegraph. The primary aim is to determine the best way of keeping WD/QS minimally functional in the event

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-07-07 Thread EgonWillighagen
EgonWillighagen added a comment. In T281854#7185253 , @Multichill wrote: > No it's not, please have a look at the task description. This is about getting metrics. Can you elaborate on the "this plan" in that description? What do

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-07-07 Thread Sj
Sj added a comment. @Multichill the opening says "so that I can decide whether to move ahead with this plan and how to communicate it." -- it would help if that linked to a separate task, whose implementation depended on the outcome of this one. In the absence of that, this seems like the

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-29 Thread Multichill
Multichill added a subscriber: Harej. Multichill added a comment. In T281854#7184875, @Harej wrote: > In T281854#7184854, @Multichill wrote: > >> This is not the place to discuss

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-29 Thread Fnielsen
Fnielsen added a comment. "percentage, number of WDQS queries per month that involve scholarly articles (including authors and publications)" It is unclear for us Scholia people how much load we are putting on WDQS. We have a tendency to do multiple SPARQL queries on each page and that

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-29 Thread Fnielsen
Fnielsen added a comment. Going back to the quantifiable: "percentage, number of scientific papers that are connected to non-scientific paper items in WD (not including authors and publications)" We would hope that every scientific paper has a topic annotation with one or more of the

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-29 Thread Harej
Harej added a comment. In T281854#7184854, @Multichill wrote: > This is not the place to discuss if these items should be moved out or not. This is a confusing statement seeing as the task is explicitly about "splitting scholarly

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-29 Thread Multichill
Multichill added a comment. Hi folks, please stick to the Phabricator etiquette as described at https://www.mediawiki.org/wiki/Bug_management/Phabricator_etiquette . This is not the place to discuss if these items should be moved out or not. @MPhamWMF don't see these comments as any

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-20 Thread Fnielsen
Fnielsen added a comment. @Andrawaag "it is becoming more difficult to see other topics (sometimes unrelated to scholarly articles)" Do you have concrete examples of this? It may sometimes be difficult to find out what is a topic and what is a scientific article, but once a few scientific

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-19 Thread EgonWillighagen
EgonWillighagen added a comment. Regarding the question of the "growth of scientific literature", there is a good bit of literature on this, and sometimes conflated with the topic of "growth of science". I started collecting some knowledge about this:

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-19 Thread Andrawaag
Andrawaag added a comment. I would not call it evicting scholarly articles. Scholarly articles are currently a major driving force for Wikidata; however, their size is problematic because it is becoming more difficult to see other topics (sometimes unrelated to scholarly articles). I have

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-19 Thread EgonWillighagen
EgonWillighagen added a comment. I am with @Harej here. Focusing on the largest data set is not the right approach. As I have indicated in similar discussions elsewhere, there will be a next large subset and this one will also be large. From the field of chemistry, 60M items is nothing. The

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-18 Thread Harej
Harej added a comment. I'm interested in others' opinions as well because I am far from the only perspective in the room. First: at what levels would this graph division take place? Would this be something largely behind the scenes, not visible to the Wikidata community unless you're

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-18 Thread MPhamWMF
MPhamWMF added a comment. Thanks for your thoughts here @Harej; it's really helpful to have these insights from someone closer to the (Wiki)data content itself. You are correct that this specific ticket is identifying the largest subset of data to split off from the Wikidata graph. The

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-18 Thread Harej
Harej added a comment. This seems like an arbitrary way to cut up Wikidata. It very much smacks of "let's take the largest subset of our dataset and evict it," without consideration to why the dataset should be cut up this way. What are the boundaries of these new projects? Is Wikidata

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-10 Thread MPhamWMF
MPhamWMF added a parent task: T282790: Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-06-10 Thread MPhamWMF
MPhamWMF triaged this task as "High" priority.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-07 Thread waldyrious
waldyrious updated the task description.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. Thanks everyone. For context: this is just one of many options we are currently investigating to create an overview of our options. We think it is important to have a larger discussion about how to move forward with the Query Service but we need to know more

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread Fnielsen
Fnielsen added a subscriber: nichtich. Fnielsen added a comment. "rate of growth of scholarly articles" wikicite.org updates these statistics: http://wikicite.org/statistics.html I suppose that it is Jakob Voß (@nichtich) who updates these numbers? The graph shows a bit of plateauing

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread Fnielsen
Fnielsen added a comment. Some of the statistics that are wanted are listed on Scholia, currently on the frontpage: https://scholia.toolforge.org/ "percentage, number of Wikidata entities that are scholarly article": 37.246.721 Scholarly articles, so 37/97 ~ 40% are scholarly

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread Daniel_Mietchen
Daniel_Mietchen added a comment. In T281854#7061460 , @MPhamWMF wrote: > For larger context, this is not to say we're committed to this split yet, but we are exploring strategies for scaling Wikidata (and mitigating catastrophic

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread MPhamWMF
MPhamWMF added a comment. @LWyatt , "splitting scholarly articles out" here refers to separating out the subgraph of scholarly articles -- possibly copying over directly relevant items like authors -- from the larger Wikidata graph so that they would be independent graphs. They would exist

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-05 Thread LWyatt
LWyatt added a comment. Can I get clarification about what is meant, practically, by "splitting scholarly articles out"? Does this mean something in the backend that is about how that content is stored/accessed by the query system (but is otherwise invisible to the general reader of

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-04 Thread Maintenance_bot
Maintenance_bot added a project: Wikidata.

[Wikidata-bugs] [Maniphest] T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata

2021-05-04 Thread MPhamWMF
MPhamWMF created this task. MPhamWMF added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. TASK DESCRIPTION As a product manager for Wikidata and WDQS, I want to know what quantifiable benefits to service reliability and quality I might expect to gain