[Wikidata-bugs] [Maniphest] [Commented On] T95779: Broken output from Munger tool

2015-04-13 Thread Manybubbles
Manybubbles added a comment.

In https://phabricator.wikimedia.org/T95779#1201191, @Smalyshev wrote:

 I think in this case we shouldn't mess with the data. Rather, we'd have
 something like a function bestLabel(item, languages), e.g. bestLabel(wd:Q123,
 'en', 'de', 'ru', 'es'), which would try to find a label in any of those
 languages and, if none is found, just return something like 'Q123'. The thing
 is, not all queries even need labels... and for those that do, we cannot
 predict what people would actually want there: simple lookup, hierarchy
 lookup, which languages, etc.


That'd be nice, but the reality right now is that bestLabel would be super slow
without some work in Blazegraph. The singleLabel option is optional: you
could just not specify it and we'd leave the data alone.

 I think throwing out label data that we don't want is ok, but adding is not 
 good as it may be confused with actual data.


Yeah.  +1.  I'll fix it in a few minutes.


TASK DETAIL
  https://phabricator.wikimedia.org/T95779

REPLY HANDLER ACTIONS
  Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign username.

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Manybubbles
Cc: Manybubbles, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, 
GWicke, daniel, JanZerebecki



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T95779: Broken output from Munger tool

2015-04-11 Thread Smalyshev
Smalyshev added a comment.

I think in this case we shouldn't mess with the data. Rather, we'd have
something like a function bestLabel(item, languages), e.g. bestLabel(wd:Q123,
'en', 'de', 'ru', 'es'), which would try to find a label in any of those
languages and, if none is found, just return something like 'Q123'. The thing
is, not all queries even need labels... and for those that do, we cannot
predict what people would actually want there: simple lookup, hierarchy
lookup, which languages, etc.
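A minimal sketch of the proposed bestLabel(item, languages) fallback, in Java to match the tool's code. It assumes the item's labels have already been fetched into a map from language code to label text (an assumption; in practice this would be a Blazegraph lookup at query time), and the class and method names are hypothetical, not part of wikidata-query-tools. It tries each requested language in order and otherwise returns the bare item ID, without writing anything back into the data.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the proposed bestLabel(item, languages)
// fallback; this is a sketch, not code from wikidata-query-tools.
final class BestLabel {
    private BestLabel() {}

    /**
     * Returns the label in the first requested language that has one,
     * otherwise the bare item ID (e.g. "Q123"). The label map is only read,
     * never modified, so no placeholder labels end up in the data.
     *
     * @param itemId    entity ID such as "Q123" (assumed format)
     * @param labels    language code -> label text (assumed to be pre-fetched)
     * @param languages languages in order of preference
     */
    static String bestLabel(String itemId, Map<String, String> labels, List<String> languages) {
        for (String lang : languages) {
            String label = labels.get(lang);
            if (label != null && !label.isEmpty()) {
                return label;
            }
        }
        return itemId; // no label in any requested language
    }

    public static void main(String[] args) {
        Map<String, String> labels = Map.of("de", "Beispiel", "es", "Ejemplo");
        // "en" is missing, so the German label is used.
        System.out.println(bestLabel("Q123", labels, List.of("en", "de", "ru", "es")));
        // No labels at all: fall back to the item ID.
        System.out.println(bestLabel("Q123", Map.of(), List.of("en", "de")));
    }
}
```

Because the fallback happens at read time, the stored data stays untouched, which is the point being made here; the cost, as noted in the reply, is that such a lookup may be slow without support in Blazegraph.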

I think throwing out label data that we don't want is ok, but adding is not 
good as it may be confused with actual data.



To: Manybubbles, Smalyshev
Cc: Manybubbles, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, 
GWicke, daniel, JanZerebecki





[Wikidata-bugs] [Maniphest] [Commented On] T95779: Broken output from Munger tool

2015-04-11 Thread Manybubbles
Manybubbles added a comment.

It's how I designed single-label mode, but I now think it's stupid. The point
of single label was that you could always get a single label for a thing,
and it's in one of the languages you ask for.

Instead, I think it shouldn't ever add a label or description if there isn't
one in the language.
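A sketch of the revised single-label behaviour described above: pick the label in the first preferred language that actually has one, and return nothing when none of them do, instead of synthesizing a placeholder. The `LangLiteral` record and the `SingleLabel.pick` method are hypothetical stand-ins, not the Munger's real types; the sketch uses Java 16+ records for brevity.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of "never add a label that isn't there" single-label mode.
final class SingleLabel {
    private SingleLabel() {}

    /** Hypothetical stand-in for a language-tagged RDF literal. */
    record LangLiteral(String text, String lang) {}

    /**
     * Chooses at most one label: the candidate whose language comes earliest
     * in the preference list. Returns Optional.empty() when no candidate is in
     * a preferred language, so the caller emits no statement at all.
     */
    static Optional<LangLiteral> pick(List<LangLiteral> candidates, List<String> preferred) {
        for (String lang : preferred) {
            for (LangLiteral candidate : candidates) {
                if (candidate.lang().equals(lang)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> prefs = List.of("en", "de");
        List<LangLiteral> labels = List.of(new LangLiteral("Haus", "de"), new LangLiteral("maison", "fr"));
        // No English label, so the German one is kept.
        System.out.println(pick(labels, prefs).map(LangLiteral::text).orElse("(no label)"));
        // Only French is available: nothing is added.
        System.out.println(pick(List.of(new LangLiteral("maison", "fr")), prefs)
                .map(LangLiteral::text).orElse("(no label)"));
    }
}
```

Returning an `Optional` forces the caller to handle the "no label" case explicitly, which is where the old code fell back to a made-up statement.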



To: Manybubbles
Cc: Manybubbles, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, 
GWicke, daniel, JanZerebecki





[Wikidata-bugs] [Maniphest] [Commented On] T95779: Broken output from Munger tool

2015-04-10 Thread Smalyshev
Smalyshev added a comment.

Reproducible with input F83 (in.ttl, https://phabricator.wikimedia.org/F83)
and this command line:

`java -cp target/wikidata-query-tools-0.0.1-SNAPSHOT-jar-with-dependencies.jar 
org.wikidata.query.rdf.tool.Munge --from in.ttl --to out --labelLanguage en 
--labelLanguage de --singleLabel en --singleLabel de --skipSiteLinks 
--chunkSize 10`



To: Manybubbles, Smalyshev
Cc: Manybubbles, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, 
GWicke, daniel, JanZerebecki





[Wikidata-bugs] [Maniphest] [Commented On] T95779: Broken output from Munger tool

2015-04-10 Thread Smalyshev
Smalyshev added a comment.

I think it happens because singleLabelModeWorkForDescription and 
singleLabelModeWorkForLabel both generate this:

  return new StatementImpl(entityUriImpl, new URIImpl(RDFS.LABEL), entityUriImpl);

This is obviously wrong for descriptions, and I suspect it is also wrong for
labels. So if a label or description is missing, this is what happens.
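To check munged output for this symptom, one could scan for `rdfs:label` statements whose object is a URI (such as the entity pointing at itself) rather than a literal. This is a self-contained illustration: the `Triple` record and `SelfLabelCheck` class are hypothetical and stand in for the Sesame `Statement` API, and the entity URI in the example is only illustrative. The predicate is the standard `rdfs:label` URI.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical checker for the placeholder statements described above.
final class SelfLabelCheck {
    private SelfLabelCheck() {}

    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

    /** Hypothetical triple: object holds either a URI or a literal's lexical form. */
    record Triple(String subject, String predicate, String object, boolean objectIsLiteral) {}

    /** Returns rdfs:label statements whose object is a URI instead of a literal. */
    static List<Triple> findBroken(List<Triple> triples) {
        return triples.stream()
                .filter(t -> t.predicate().equals(RDFS_LABEL) && !t.objectIsLiteral())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String q = "http://www.wikidata.org/entity/Q123";
        List<Triple> out = List.of(
                new Triple(q, RDFS_LABEL, "Example", true),
                new Triple(q, RDFS_LABEL, q, false)); // entity labelled with itself
        System.out.println(findBroken(out).size() + " broken label statement(s)");
    }
}
```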

Also, I note that the German label and description are dropped, even though
skos:altLabel is preserved for both en and de. Looks fishy.



To: Manybubbles, Smalyshev
Cc: Manybubbles, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, aude, 
GWicke, daniel, JanZerebecki


