[Wikidata-bugs] [Maniphest] [Commented On] T233204: Mixup of unicode characters in Query Service

2019-09-30 Thread Smalyshev
Smalyshev added a comment. The issue is that, by default, Blazegraph uses the tertiary ICU collation level, IIRC (I can check the specific one), so it ignores some differences such as this one and generates the same term key for both characters. It can be switched to

2019-09-30 Thread Igorkim78
Igorkim78 added a comment. These characters are indeed mapped to the same term in the DB:

    SELECT
      ( ConstantNode(TermId(1415304733L)[⓬]) AS VarNode(negativeCircled) )
      ( ConstantNode(TermId(1415304733L)[⑫]) AS VarNode(circled) )

Blazegraph uses ICU collation as default key builder
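To make the collision concrete, here is a minimal Python sketch. It is only an analogy, not Blazegraph's or ICU's actual algorithm: `toy_term_key` is a hypothetical lossy key builder that keeps just the numeric value of a character, while `exact_term_key` is a hypothetical key that keeps every code point. It shows that the two characters are distinct code points which still collide once the key discards the feature that tells them apart.

```python
import unicodedata

CIRCLED = "\u246b"           # ⑫
NEGATIVE_CIRCLED = "\u24ec"  # ⓬


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def toy_term_key(ch: str) -> float:
    """Hypothetical lossy key: keeps only the numeric value.

    Stands in for any key builder that ignores the difference between
    the two characters. This is NOT how ICU computes collation keys.
    """
    return unicodedata.numeric(ch)


def exact_term_key(ch: str) -> bytes:
    """Hypothetical exact key: the UTF-8 bytes, so every code point counts."""
    return ch.encode("utf-8")


if __name__ == "__main__":
    for ch in (CIRCLED, NEGATIVE_CIRCLED):
        print(describe(ch), toy_term_key(ch), exact_term_key(ch))
```

With the lossy key both characters land on the same entry, which mirrors the single TermId seen above; with the exact key they stay separate.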

2019-09-23 Thread Lea_Lacroix_WMDE
Lea_Lacroix_WMDE added a comment. Comment and more testing added here. TASK DETAIL https://phabricator.wikimedia.org/T233204

2019-09-18 Thread Smalyshev
Smalyshev added a comment. This is probably related to other Unicode issues and to the ICU collation level. I presume the collation level currently enabled in Blazegraph conflates these two characters.

2019-09-18 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment. This can also be reproduced without reference to any particular item: SELECT ("⑫" AS ?circled) ("⓬"

2019-09-18 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment. This version of the query