[Wikidata-bugs] [Maniphest] [Commented On] T95779: Broken output from Munger tool
Manybubbles added a comment.

In https://phabricator.wikimedia.org/T95779#1201191, @Smalyshev wrote:
> I think in this case we shouldn't mess with the data. Rather, we'd have something like function bestLabel(item, languages) e.g. bestLabel(wd:Q123, 'en', 'de', 'ru', 'es') which would try to find labels in any language but if not just return something like 'Q123'. The thing is not all queries even need labels... and for those that do, we cannot predict what people would actually want there - simple lookup, hierarchy lookup, which languages, etc.

That'd be nice but the reality right now is that bestLabel would be super slow without some work in Blazegraph. The singleLabel option is optional. You could just not specify it and we'd leave the data alone.

> I think throwing out label data that we don't want is ok, but adding is not good as it may be confused with actual data.

Yeah. +1. I'll fix it in a few minutes.

TASK DETAIL
  https://phabricator.wikimedia.org/T95779

REPLY HANDLER ACTIONS
  Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username.

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Manybubbles
Cc: Manybubbles, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, GWicke, daniel, JanZerebecki
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T95779: Broken output from Munger tool
Smalyshev added a comment.

I think in this case we shouldn't mess with the data. Rather, we'd have something like function bestLabel(item, languages) e.g. bestLabel(wd:Q123, 'en', 'de', 'ru', 'es') which would try to find labels in any language but if not just return something like 'Q123'. The thing is not all queries even need labels... and for those that do, we cannot predict what people would actually want there - simple lookup, hierarchy lookup, which languages, etc.

I think throwing out label data that we don't want is ok, but adding is not good as it may be confused with actual data.
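The bestLabel idea above can be sketched in plain Java. This is only an illustration of the lookup order, not code from the query service: the class name, the map of labels keyed by language code, and the reading of "languages" as an ordered preference list are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names do not come from the Munger tool or Blazegraph.
final class BestLabelSketch {
    private BestLabelSketch() {}

    /**
     * Returns the first label found among the preferred languages, in order.
     * Falls back to the bare entity ID (e.g. "Q123") when none of them has a label.
     */
    static String bestLabel(String entityId,
                            Map<String, String> labelsByLanguage,
                            List<String> languages) {
        for (String language : languages) {
            String label = labelsByLanguage.get(language);
            if (label != null) {
                return label;
            }
        }
        // Nothing matched: return the ID rather than inventing label data.
        return entityId;
    }

    public static void main(String[] args) {
        Map<String, String> labels = Map.of("de", "Beispiel", "es", "ejemplo");
        // "en" is missing, so the "de" label is the first match.
        System.out.println(bestLabel("Q123", labels, List.of("en", "de", "ru", "es")));
        // No labels at all: the entity ID comes back.
        System.out.println(bestLabel("Q123", Map.of(), List.of("en", "de")));
    }
}
```

Because the fallback happens at query time, the stored data is never changed, which is the point of the proposal.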
[Wikidata-bugs] [Maniphest] [Commented On] T95779: Broken output from Munger tool
Manybubbles added a comment.

It's how I designed single label mode, but I now think it's stupid. The point of single label was that you could always get a single label for a thing, and it's in one of the languages you ask for. Instead, I think it shouldn't ever add a label or description if there isn't one in the language.
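A minimal sketch of the revised behaviour described above, assuming the labels (or descriptions) of one entity are available as a map from language code to text. The class and method names are hypothetical and are not taken from the Munger source; the sketch only shows "keep at most one existing value, never add one".

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of single label mode without any synthetic fallback.
final class SingleLabelSketch {
    private SingleLabelSketch() {}

    /**
     * Picks the value for the first requested language that actually has one.
     * Returns an empty Optional when none of the requested languages is present,
     * so the caller emits no label (or description) at all for that entity.
     */
    static Optional<Map.Entry<String, String>> pickSingle(
            Map<String, String> valuesByLanguage, List<String> singleLabelLanguages) {
        for (String language : singleLabelLanguages) {
            String value = valuesByLanguage.get(language);
            if (value != null) {
                return Optional.of(Map.entry(language, value));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> wanted = List.of("en", "de");
        // Only a German label exists: it is kept.
        System.out.println(pickSingle(Map.of("de", "Berlin"), wanted));
        // Only a French label exists: nothing is kept and nothing is made up.
        System.out.println(pickSingle(Map.of("fr", "Berlin"), wanted));
    }
}
```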
[Wikidata-bugs] [Maniphest] [Commented On] T95779: Broken output from Munger tool
Smalyshev added a comment. Reproducible with input F83: in.ttl https://phabricator.wikimedia.org/F83 and this command line: `java -cp target/wikidata-query-tools-0.0.1-SNAPSHOT-jar-with-dependencies.jar org.wikidata.query.rdf.tool.Munge --from in.ttl --to out --labelLanguage en --labelLanguage de --singleLabel en --singleLabel de --skipSiteLinks --chunkSize 10`
[Wikidata-bugs] [Maniphest] [Commented On] T95779: Broken output from Munger tool
Smalyshev added a comment.

I think it happens because singleLabelModeWorkForDescription and singleLabelModeWorkForLabel both generate this:

    return new StatementImpl(entityUriImpl, new URIImpl(RDFS.LABEL), entityUriImpl);

Which is obviously wrong for description, and I suspect also wrong for label. So if a label or description is missing, this is what happens.

Also, I note that the German label and description are dropped, even though skos:altLabel is preserved for both en and de. Looks fishy.
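To make the symptom concrete, here is a small, hypothetical checker for munged output. It flags statements of the shape produced by the fallback quoted above: the predicate is rdfs:label and the object is the subject itself. The `Triple` class and the method name are assumptions for illustration only; a real check would read the Turtle output with an RDF parser rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: models triples as plain strings to stay self-contained.
final class SelfLabelCheck {
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    private SelfLabelCheck() {}

    /** Returns every rdfs:label statement whose object is its own subject. */
    static List<Triple> findSelfLabels(List<Triple> triples) {
        List<Triple> suspicious = new ArrayList<>();
        for (Triple triple : triples) {
            if (RDFS_LABEL.equals(triple.predicate) && triple.subject.equals(triple.object)) {
                suspicious.add(triple);
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        String entity = "http://www.wikidata.org/entity/Q123";
        List<Triple> output = List.of(
                new Triple(entity, RDFS_LABEL, "\"Example\"@en"), // a normal label literal
                new Triple(entity, RDFS_LABEL, entity));          // the broken fallback shape
        System.out.println(findSelfLabels(output).size() + " suspicious statement(s)");
    }
}
```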