MadBob created this task.
MadBob added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION
  **Steps to replicate the issue** (include links if applicable):
  
  - fresh install of Debian and OpenJDK
  - follow the instructions found here:
    https://github.com/wikimedia/wikidata-query-rdf/blob/master/docs/getting-started.md
  - after the munge.sh step, examine the generated wikidump-000000*.ttl.gz files
  - find many � (U+FFFD, the Unicode replacement character) in place of
    non-ASCII characters
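  The symptom can be checked without reading the dumps by eye, using a small
  scanner like the one below. This is a hypothetical helper (FindReplacementChars
  is not part of WQS), a minimal sketch assuming the munged files are
  gzip-compressed text that is meant to be UTF-8. It counts lines containing
  U+FFFD. Because it decodes with an explicit UTF-8 reader, any invalid byte
  sequences are also mapped to U+FFFD and therefore counted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical diagnostic helper, not part of the WQS code base.
public class FindReplacementChars {
    // U+FFFD, the Unicode replacement character displayed as a "?" in a diamond.
    // Written as an escape so this source file is pure ASCII.
    private static final char REPLACEMENT = '\uFFFD';

    /** Counts lines containing U+FFFD in a gzip-compressed stream decoded as UTF-8. */
    static long countBadLines(InputStream gzipped) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            return reader.lines()
                    .filter(line -> line.indexOf(REPLACEMENT) >= 0)
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is assumed to be a path to a .ttl.gz file.
        for (String arg : args) {
            Path path = Paths.get(arg);
            try (InputStream in = Files.newInputStream(path)) {
                System.out.println(path + ": " + countBadLines(in) + " line(s) containing U+FFFD");
            }
        }
    }
}
```

  Usage (JDK 11+ single-file launch): java FindReplacementChars.java <file>...
  The source deliberately contains only ASCII characters, so it compiles the same
  way even when the JVM default encoding is ANSI_X3.4-1968.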
  
  **What happens?**:
  
  By default, OpenJDK (at least on Debian) has file.encoding set to
  ANSI_X3.4-1968 (i.e. US-ASCII). This corrupts every non-ASCII UTF-8 character.
  Perhaps the WQS programs should force UTF-8 explicitly, or the documentation
  should tell users to adjust their own system configuration (e.g. the locale)
  accordingly.
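  The mechanism can be illustrated with a minimal, hypothetical sketch
  (CharsetDemo is not WQS code). It prints the JVM's file.encoding property and
  Charset.defaultCharset(), then decodes the UTF-8 bytes of a non-ASCII label
  twice: once as US-ASCII, which is what a reader relying on the platform default
  would do under ANSI_X3.4-1968, and once with an explicit StandardCharsets.UTF_8.
  Only the first decode produces U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class, not part of the WQS code base.
public class CharsetDemo {
    /**
     * Simulates reading UTF-8 data with {@code readCharset}. With US-ASCII, every
     * byte >= 0x80 is malformed input and is replaced with U+FFFD.
     */
    static String decodeUtf8BytesAs(String original, Charset readCharset) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, readCharset);
    }

    public static void main(String[] args) {
        // What the JVM derived from the environment (the report saw ANSI_X3.4-1968).
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());

        String label = "Z\u00fcrich"; // "Zurich" with u-umlaut, kept as an escape
        String broken = decodeUtf8BytesAs(label, StandardCharsets.US_ASCII);
        String intact = decodeUtf8BytesAs(label, StandardCharsets.UTF_8);

        System.out.println("US-ASCII decode contains U+FFFD: " + (broken.indexOf('\uFFFD') >= 0));
        System.out.println("UTF-8 decode matches original:   " + intact.equals(label));
    }
}
```

  The code-level fix this points to is passing StandardCharsets.UTF_8 explicitly
  to every Reader, Writer and String/byte conversion instead of relying on the
  platform default. As a system-level workaround, starting the JVM with
  -Dfile.encoding=UTF-8 or under a UTF-8 locale (e.g. LANG=C.UTF-8) usually
  changes the default; whether the WQS scripts pass such options through has not
  been checked here.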
  
  **What should have happened instead?**:
  
  Munged output should contain properly encoded UTF-8 strings.
  
  **Software version** (skip for WMF-hosted wikis like Wikipedia):
  
  service-0.3.118-SNAPSHOT

TASK DETAIL
  https://phabricator.wikimedia.org/T323575


To: MadBob
Cc: MadBob, Aklapper, AWesterinen, MPhamWMF, CBogen, Namenlos314, Gq86, 
Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles