Re: XML::Simple Umlaute
Take a look at the -C argument for perl and the PERL_UNICODE environment variable in http://perldoc.perl.org/perlrun.html

Examine the difference between

    perl -E 'say "\x{df}"'

and

    PERL_UNICODE=O perl -E 'say "\x{df}"'

That said, if you are working with the web, why in the world are you sending UTF-8? HTML has entities for a reason. I would suggest using HTML::Entities instead of trying to send non-ASCII characters through who knows how many layers of things that can screw up UTF-8:

    perl -MHTML::Entities -E 'say encode_entities "\x{df}"'

On Tue, Aug 9, 2016 at 7:34 AM hw wrote:
> Chas. Owens wrote:
> > On Thu, Jul 28, 2016 at 10:55 AM Paul Johnson <p...@pjcj.net> wrote:
> > > On Thu, Jul 28, 2016 at 10:23:19AM -0400, Chas. Owens wrote:
> > > snip
> > > > Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
> > > > would say knows the most about the intersection of Perl and Unicode)
> > > > is a good resource: http://stackoverflow.com/a/6163129/78259
> > >
> > > Quite. And utf8::all tries to encapsulate as much of that boilerplate
> > > as it can.
> >
> > I have always read that answer as a bit of an indictment of the idea of
> > "you should be able to load this module and everything will be fine".
> > Unicode is complex, and trying to treat it like just another list of
> > characters is doomed to teeth gnashing and crying. Of course, even
> > treating it the way it should be treated leads to teeth gnashing and
> > crying, but at least that will be over the fact that humans suck (we
> > can't even agree on where þ should be sorted) as opposed to Perl sucking.
>
> When I have something like
>
>     print $cgi->p('Gebäudefläche:');
>
> in my source, which is displayed correctly everywhere else, I also
> need it displayed correctly in the web browser, especially there,
> because that is what the users are looking at.
>
> And that's all there is to it. It's really that simple.
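With HTML::Entities, `encode_entities "\x{df}"` would typically print the named entity `&szlig;`. For setups where that module isn't installed, the following is a hedged, core-only sketch of the same idea. It is a swapped-in technique, not HTML::Entities itself: it emits numeric character references instead of named ones. The helper name `to_ascii_html` is hypothetical. Numeric references name Unicode code points, so a browser turns `&#xDF;` back into ß regardless of the charset the page is served with.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape the HTML-special characters, then replace
# every remaining non-ASCII character with a numeric character reference
# (for example ß, U+00DF, becomes &#xDF;). The result is pure ASCII.
sub to_ascii_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later '&' are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $text;
}

# "Geb\x{e4}udefl\x{e4}che:" is the string 'Gebäudefläche:' from the thread.
print to_ascii_html("Geb\x{e4}udefl\x{e4}che:"), "\n";   # Geb&#xE4;udefl&#xE4;che:
```

Because the output contains only ASCII, it no longer matters which encoding layers sit between the script and the browser.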
Re: XML::Simple Umlaute
Chas. Owens wrote:
> On Thu, Jul 28, 2016 at 10:05 AM, hw wrote:
> snip
> > So which character encoding on STDOUT does perl use by default? That should
> > be utf-8 without any further ado, shouldn't it? When I add
> >
> >     binmode STDOUT, ":encoding(utf-8)";
> >
> > the characters are displayed correctly in the terminal. Why would perl use
> > something else than utf-8 by default?
>
> Take the following with a grain of salt. My knowledge is mostly hearsay and
> supposition with a dash of cargo cultism on this matter.
>
> Perl predates even Unicode (they both came out in '87). Unicode did not get
> much traction until the mid-nineties when people started realizing that
> UTF-8 (created in '92) was a good thing. So, for most of its early history,
> Perl used Latin1. It still does to a large extent for backwards
> compatibility reasons. To make Perl 5 a proper UTF-8 environment there are
> a number of knobs to pull and buttons to poke.
>
> You may find this video from YAPC NA 2016 enlightening:
> https://www.youtube.com/watch?v=TmTeXcEixEg
>
> Others that may be helpful (I haven't watched them, but I trust the speaker):
> https://www.youtube.com/watch?v=iZgqhVu72zc
> https://www.youtube.com/watch?v=X2FQHUHjo8M
>
> Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I would
> say knows the most about the intersection of Perl and Unicode) is a good
> resource: http://stackoverflow.com/a/6163129/78259
>
> Hope this helps.

Thanks!

That makes it really complicated to write applications which display data from a database via a web browser, yet people have been doing this for a long time now.

But no matter what I do, umlauts are not displayed correctly throughout the whole web page: they are wrong either in the data from the database, in print statements, or in the output of CGI::FormBuilder. There's probably no way to get it right :(

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
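Umlauts that come out wrong either in the database data or in the literal strings, but never right in both at once, are a typical symptom of mixing undecoded bytes with decoded characters. The usual remedy, described in perlunitut, is to decode everything at the input boundary and encode everything at the output boundary. Below is a minimal sketch with the core Encode module. The database bytes are simulated by a hypothetical string, since the actual driver and its settings are not in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical bytes as they might arrive from a database driver that
# returns raw UTF-8 (not yet decoded): "Gro" followed by C3 9F (ß).
my $db_bytes = "Gro\xC3\x9F";

# Decode once at the input boundary: bytes -> Perl characters.
my $name = decode('UTF-8', $db_bytes);

# Inside the program, work only with characters ("\x{f6}" is ö, "\x{df}" is ß).
my $label = "Gr\x{f6}\x{df}e: $name";

# Encode once at the output boundary: characters -> UTF-8 bytes.
my $out = encode('UTF-8', $label);

# The common bug: encoding data that was never decoded treats each
# UTF-8 byte as a separate Latin-1 character and produces mojibake.
my $double = encode('UTF-8', $db_bytes);

printf "%d characters, %d bytes\n", length $label, length $out;   # 11 characters, 14 bytes
```

The design choice is that only the two boundary lines know about encodings. Everything in between compares, concatenates and measures characters.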
Re: XML::Simple Umlaute
Chas. Owens wrote:
> On Thu, Jul 28, 2016 at 10:55 AM Paul Johnson <p...@pjcj.net> wrote:
> > On Thu, Jul 28, 2016 at 10:23:19AM -0400, Chas. Owens wrote:
> > snip
> > > Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
> > > would say knows the most about the intersection of Perl and Unicode)
> > > is a good resource: http://stackoverflow.com/a/6163129/78259
> >
> > Quite. And utf8::all tries to encapsulate as much of that boilerplate
> > as it can.
>
> I have always read that answer as a bit of an indictment of the idea of
> "you should be able to load this module and everything will be fine".
> Unicode is complex, and trying to treat it like just another list of
> characters is doomed to teeth gnashing and crying. Of course, even
> treating it the way it should be treated leads to teeth gnashing and
> crying, but at least that will be over the fact that humans suck (we
> can't even agree on where þ should be sorted) as opposed to Perl sucking.

When I have something like

    print $cgi->p('Gebäudefläche:');

in my source, which is displayed correctly everywhere else, I also need it displayed correctly in the web browser, especially there, because that is what the users are looking at.

And that's all there is to it. It's really that simple.
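Output from a line like `print $cgi->p('Gebäudefläche:');` ends up in a browser, and the browser needs two things to agree: the bytes actually sent and the charset the response declares. CGI.pm's `header` method takes a `-charset` argument for the declaration, and its default has long been ISO-8859-1. CGI.pm is no longer bundled with Perl, so here is a hedged, core-only sketch that builds a whole response as bytes. `html_response` is a hypothetical helper. It assumes the body arrives as decoded characters, which is what string literals become under `use utf8` in a UTF-8 source file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a complete CGI response as bytes. The header
# declares UTF-8 and the body is encoded to UTF-8, so the two always agree.
sub html_response {
    my ($body_chars) = @_;
    my $header = "Content-Type: text/html; charset=UTF-8\r\n\r\n";
    return $header . encode('UTF-8', $body_chars);
}

# "Geb\x{e4}udefl\x{e4}che:" is 'Gebäudefläche:' as characters.
my $response = html_response("<p>Geb\x{e4}udefl\x{e4}che:</p>\n");

binmode STDOUT;    # the response is already bytes; write it unchanged
print $response;
```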
Re: XML::Simple Umlaute
Paul Johnson wrote:
> On Thu, Jul 28, 2016 at 10:23:19AM -0400, Chas. Owens wrote:
> > On Thu, Jul 28, 2016 at 10:05 AM, hw wrote:
> > snip
> > > So which character encoding on STDOUT does perl use by default? That should
> > > be utf-8 without any further ado, shouldn't it? When I add
> > >
> > >     binmode STDOUT, ":encoding(utf-8)";
> > >
> > > the characters are displayed correctly in the terminal. Why would perl use
> > > something else than utf-8 by default?
>
> As a general rule, use "utf8::all" instead of just "utf8" and a lot of the
> problems go away.
>
> > Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
> > would say knows the most about the intersection of Perl and Unicode)
> > is a good resource: http://stackoverflow.com/a/6163129/78259
>
> Quite. And utf8::all tries to encapsulate as much of that boilerplate
> as it can.

Maybe that would work, but I can't very well go through all the programs, adjust them and experiment every time there is a problem like this. I need some sort of general switch to make perl use utf8 by default, as it should to begin with ...
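The closest built-in thing to such a general switch is the -C option / PERL_UNICODE environment variable from perlrun. For example, `PERL_UNICODE=SDA` in the environment, or `perl -CSDA` on the command line, makes the standard streams UTF-8, makes UTF-8 the default layer for `open`, and decodes @ARGV. Per perlrun, -C on a `#!` line must also be given on the command line. The setting does not make source literals UTF-8 (that still needs `use utf8`), and it does not decode data handed over by modules or databases. The current value is visible in `${^UNICODE}`. The following sketch decodes that number back into letters. The helper `unicode_letters` is hypothetical; the bit values come from perlrun.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bit values of the -C switch / PERL_UNICODE letters, as listed in perlrun.
my @flags = (
    [ I => 1,  'STDIN is UTF-8' ],
    [ O => 2,  'STDOUT is UTF-8' ],
    [ E => 4,  'STDERR is UTF-8' ],
    [ i => 8,  'UTF-8 is the default layer for input streams' ],
    [ o => 16, 'UTF-8 is the default layer for output streams' ],
    [ A => 32, '@ARGV elements are decoded from UTF-8' ],
    [ L => 64, 'the above apply only if the locale says UTF-8' ],
);

# Hypothetical helper: turn a numeric ${^UNICODE} value into its letters,
# e.g. 63 (what -CSDA sets) becomes "IOEioA".
sub unicode_letters {
    my ($value) = @_;
    return join '', map { $_->[0] } grep { $value & $_->[1] } @flags;
}

my $current = ${^UNICODE};
printf "\${^UNICODE} = %d (%s)\n", $current, unicode_letters($current) || 'none';
```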
Re: XML::Simple Umlaute
I'm not sure whether it is possible to use umlauts in XML files or not. Maybe this post will help you:
http://stackoverflow.com/questions/11772468/reading-xml-files-with-umlaut-chars

Is there a way to change the encoding to "iso-8859-1"?

Mike

On 7/28/2016 8:03 AM, beginners-digest-h...@perl.org wrote:
> Hi,
>
> I would like to read XML files which look like this:
>
>     uuid:ee1bd852-37ee-4965-a097-50130cf6dac7 Infostand 5449000134264 groß 5449000134264 5449000134264 10.0 20
>
> There is an umlaut, ß, supposed to be at
>
>     groß
>
> which is apparently impossible to read. The following program ...
>
>     #!/usr/bin/perl
>
>     use strict;
>     use warnings;
>
>     use feature 'say';
>
>     use XML::Simple;
>     use Data::Dumper;
>
>     my $xml = new XML::Simple;
>     my $data = $xml->XMLin("test.xml");
>
>     open my $fh, ">", 'pout';
>     print $fh Dumper($data);
>     close $fh;
>
>     print Dumper($data);
>
>     exit 0;
>
> ... gives me this output:
>
>     $VAR1 = {
>               'Bezeichnung1' => {},
>               'id' => 'build_Inventur_1469705446',
>               'Stationsnummer' => 'Infostand',
>               'meta' => {
>                           'content' => 'text/html; charset=UTF-8',
>                           'http-equiv' => 'content-type',
>                           'instanceID' => 'uuid:ee1bd852-37ee-4965-a097-50130cf6dac7'
>                         },
>               'Mitarbeiter_inv' => '5449000134264',
>               'Regaletikett_ausgeben' => "gro\x{df}",
>               'Erfassung' => {
>                                'Artikelstapel' => {
>                                                     'Menge' => '20',
>                                                     'Preis' => '10.0',
>                                                     'EAN_Artikel' => '5449000134264',
>                                                     'Etikettentyp' => {}
>                                                   },
>                                'Artikel_erfassen' => {},
>                                'Lagerstaette' => '5449000134264'
>                              }
>             };
>
> I'm not getting any better results when adding an encoding declaration to
> the XML file or when writing the Dumper output to a file.
>
> Is it impossible to use umlauts in XML files?
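On the iso-8859-1 question: the dump above already shows XML::Simple returning `"gro\x{df}"`, which is the correct character ß (U+00DF). The encoding therefore only matters when the string is written out again. The following is a minimal sketch with the core Encode module showing the same string as Latin-1 bytes and as UTF-8 bytes. `binmode STDOUT, ':encoding(iso-8859-1)'` would apply the first choice to a whole handle instead of a single string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# The value XML::Simple returned for Regaletikett_ausgeben ("groß" as characters).
my $word = "gro\x{df}";

# Latin-1 (ISO-8859-1) output: ß becomes the single byte 0xDF.
my $latin1 = encode('iso-8859-1', $word);

# UTF-8 output: ß becomes the two bytes 0xC3 0x9F.
my $utf8 = encode('UTF-8', $word);

printf "latin-1: %d bytes, UTF-8: %d bytes\n", length $latin1, length $utf8;   # latin-1: 4 bytes, UTF-8: 5 bytes
```

Whichever encoding is chosen, the terminal or browser has to be told the same one, or ß will show up as garbage.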
Re: XML::Simple Umlaute
On Thu, Jul 28, 2016 at 10:55 AM Paul Johnson wrote:
> On Thu, Jul 28, 2016 at 10:23:19AM -0400, Chas. Owens wrote:
> snip
> > Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
> > would say knows the most about the intersection of Perl and Unicode)
> > is a good resource: http://stackoverflow.com/a/6163129/78259
>
> Quite. And utf8::all tries to encapsulate as much of that boilerplate
> as it can.

I have always read that answer as a bit of an indictment of the idea of "you should be able to load this module and everything will be fine". Unicode is complex, and trying to treat it like just another list of characters is doomed to teeth gnashing and crying. Of course, even treating it the way it should be treated leads to teeth gnashing and crying, but at least that will be over the fact that humans suck (we can't even agree on where þ should be sorted) as opposed to Perl sucking.
Re: XML::Simple Umlaute
On Thu, Jul 28, 2016 at 10:23:19AM -0400, Chas. Owens wrote:
> On Thu, Jul 28, 2016 at 10:05 AM, hw wrote:
> snip
> > So which character encoding on STDOUT does perl use by default? That should
> > be utf-8 without any further ado, shouldn't it? When I add
> >
> >     binmode STDOUT, ":encoding(utf-8)";
> >
> > the characters are displayed correctly in the terminal. Why would perl use
> > something else than utf-8 by default?

As a general rule, use "utf8::all" instead of just "utf8" and a lot of the problems go away.

> Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
> would say knows the most about the intersection of Perl and Unicode)
> is a good resource: http://stackoverflow.com/a/6163129/78259

Quite. And utf8::all tries to encapsulate as much of that boilerplate as it can.

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net
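utf8::all bundles several settings into one `use` line. The following is a hedged, core-only sketch of a few of the pieces it is documented to turn on: UTF-8 source code, UTF-8 standard handles and default `open` layers, and decoded @ARGV. It is not the module itself, which covers more than this. The point of the sketch is to show which individual knobs are involved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # string literals in this file are UTF-8
use open qw(:std :encoding(UTF-8));    # STDIN/STDOUT/STDERR and open() default to UTF-8
use Encode qw(decode);

# Command-line arguments arrive as bytes; decode them explicitly.
@ARGV = map { decode('UTF-8', $_) } @ARGV;

my $word = "gro\x{df}";                # same as the literal 'groß' under 'use utf8'
print "$word\n";                       # written to STDOUT as UTF-8, no garbage

# Show which PerlIO layers ended up on STDOUT.
my @layers = PerlIO::get_layers(*STDOUT);
print join(',', @layers), "\n";
```

These lines still have to appear in every script, which is the objection raised in this thread. They do not help with data that arrives already mis-decoded from elsewhere.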
Re: XML::Simple Umlaute
On Thu, Jul 28, 2016 at 10:05 AM, hw wrote:
snip
> So which character encoding on STDOUT does perl use by default? That should
> be utf-8 without any further ado, shouldn't it? When I add
>
>     binmode STDOUT, ":encoding(utf-8)";
>
> the characters are displayed correctly in the terminal. Why would perl use
> something else than utf-8 by default?

Take the following with a grain of salt. My knowledge is mostly hearsay and supposition with a dash of cargo cultism on this matter.

Perl predates even Unicode (they both came out in '87). Unicode did not get much traction until the mid-nineties when people started realizing that UTF-8 (created in '92) was a good thing. So, for most of its early history, Perl used Latin1. It still does to a large extent for backwards compatibility reasons. To make Perl 5 a proper UTF-8 environment there are a number of knobs to pull and buttons to poke.

You may find this video from YAPC NA 2016 enlightening:
https://www.youtube.com/watch?v=TmTeXcEixEg

Others that may be helpful (I haven't watched them, but I trust the speaker):
https://www.youtube.com/watch?v=iZgqhVu72zc
https://www.youtube.com/watch?v=X2FQHUHjo8M

Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I would say knows the most about the intersection of Perl and Unicode) is a good resource: http://stackoverflow.com/a/6163129/78259

Hope this helps.

-- 
Chas. Owens
http://github.com/cowens
The most important skill a programmer can have is the ability to read.
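The Latin-1 heritage Chas describes can be observed directly. With no encoding layer, Perl writes a character below 256 as a single byte, so ß goes out as the lone byte 0xDF, which is not valid UTF-8 and shows up as garbage in a UTF-8 terminal. The following is a minimal sketch that writes to an in-memory filehandle to compare the bytes produced with and without `:encoding(UTF-8)`. The helper `bytes_written` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write a string to an in-memory filehandle and
# return the raw bytes. $layer is an optional PerlIO layer string.
sub bytes_written {
    my ($text, $layer) = @_;
    open my $fh, '>', \my $buffer or die "cannot open in-memory handle: $!";
    binmode $fh, $layer if defined $layer;
    print {$fh} $text;
    close $fh or die "close failed: $!";    # flushes the encoding layer
    return $buffer;
}

my $sharp_s = "\x{df}";    # LATIN SMALL LETTER SHARP S (ß)

# No layer: characters below 256 go out as single Latin-1 bytes.
my $plain = bytes_written($sharp_s);

# Explicit encoding layer: ß becomes the UTF-8 byte pair C3 9F.
my $utf8 = bytes_written($sharp_s, ':encoding(UTF-8)');

printf "default: %vX   encoded: %vX\n", $plain, $utf8;   # default: DF   encoded: C3.9F
```

This is the same difference that `binmode STDOUT, ":encoding(utf-8)"` makes on the terminal.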
Re: XML::Simple Umlaute
Chas. Owens wrote:
> Data::Dumper is dumping the internal format. To ensure compatibility, it is
> using the \x{df} escape to represent LATIN SMALL LETTER SHARP S. To see it
> rendered as a character, just print it:

Thanks! That kinda works:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use feature 'say';
    use utf8;

    use XML::Simple;
    use Data::Dumper;

    my $xml = new XML::Simple;
    my $data = $xml->XMLin("test.xml");

    open my $fh, ">", 'pout';
    binmode $fh, ":encoding(utf-8)";
    print $fh Dumper($data);
    print Dumper($data);
    print $fh $data->{'Regaletikett_ausgeben'};
    close $fh;

    if($data->{'Regaletikett_ausgeben'} eq 'groß') {
        say 'ist groß';
    }
    else {
        say 'nicht groß';
    }

    say 'ok';

    say 'test-1: äöüÄÖÜß';
    say "test-2: äöüÄÖÜß";
    print "test-3: äöüÄÖÜß\n";

    exit 0;

Output is:

    $VAR1 = {
              'Regaletikett_ausgeben' => "gro\x{df}",
              'Mitarbeiter_inv' => '5449000134264',
              'Bezeichnung1' => {},
              'Stationsnummer' => 'Infostand',
              'Erfassung' => {
                               'Lagerstaette' => '5449000134264',
                               'Artikel_erfassen' => {},
                               'Artikelstapel' => {
                                                    'Etikettentyp' => {},
                                                    'EAN_Artikel' => '5449000134264',
                                                    'Menge' => '20',
                                                    'Preis' => '10.0'
                                                  }
                             },
              'meta' => {
                          'instanceID' => 'uuid:ee1bd852-37ee-4965-a097-50130cf6dac7',
                          'http-equiv' => 'content-type',
                          'content' => 'text/html; charset=UTF-8'
                        },
              'id' => 'build_Inventur_1469705446'
            };
    ist gro
    ok
    test-1: �
    test-2: �
    test-3: �

In case you can't see it: the test printing shows a single unknown character instead of äöüÄÖÜß.

Now 'env' says:

    [...]
    LANG=de_DE.utf8
    [...]

I'm looking at an xterm window which is connected via ssh to a remote host on which an instance of tmux is running, to which I'm attached. I can type all the above letters on the command line just fine.

'file' says:

    xmlread-4.pl: Perl script, UTF-8 Unicode text executable
    pout:         UTF-8 Unicode text

When I load pout into emacs, the ß shows up correctly. When I 'cat pout', the ß is displayed correctly in the terminal.

So which character encoding on STDOUT does perl use by default? That should be utf-8 without any further ado, shouldn't it? When I add

    binmode STDOUT, ":encoding(utf-8)";

the characters are displayed correctly in the terminal. Why would perl use something else than utf-8 by default?

>     #!/usr/bin/perl
>
>     use strict;
>     use feature 'say';
>     use XML::Simple;
>
>     #warnings should come last to handle any registered warnings in previous modules
>     use warnings;
>
>     binmode STDOUT, ":encoding(UTF-8)";
>
>     my $xml = XML::Simple->new;
>     my $data = $xml->XMLin("test.xml");
>
>     say $data->{Regaletikett_ausgeben};
>
> On Thu, Jul 28, 2016 at 9:05 AM hw <h...@gc-24.de> wrote:
> > Hi,
> >
> > I would like to read XML files which look like this:
> >
> >     uuid:ee1bd852-37ee-4965-a097-50130cf6dac7 Infostand 5449000134264 groß 5449000134264 5449000134264 10.0 20
> >
> > There is an umlaut, ß, supposed to be at
> >
> >     groß
> >
> > which is apparently impossible to read. The following program ...
> >
> >     #!/usr/bin/perl
> >
> >     use strict;
> >     use warnings;
> >
> >     use feature 'say';
> >
> >     use XML::Simple;
> >     use Data::Dumper;
> >
> >     my $xml = new XML::Simple;
> >     my $data = $xml->XMLin("test.xml");
> >
> >     open my $fh, ">", 'pout';
> >     print $fh Dumper($data);
> >     close $fh;
> >
> >     print Dumper($data);
> >
> >     exit 0;
> >
> > ... gives me this output:
> >
> >     $VAR1 = {
> >               'Bezeichnung1' => {},
> >               'id' => 'build_Inventur_1469705446',
> >               'Stationsnummer' => 'Infostand',
> >               'meta' => {
> >                           'content' => 'text/html; charset=UTF-8',
> >                           'http-equiv' => 'content-type',
> >                           'instanceID' => 'uuid:ee1bd852-37ee-4965-a097-50130cf6dac7'
> >                         },
> >               'Mitarbeiter_inv' => '5449000134264',
> >               'Regaletikett_ausgeben' => "gro\x{df}",
> >               'Erfassung' => {
> >                                'Artikelstapel' => {
> >                                                     'Menge' => '20',
> >                                                     'Preis' => '10.0',
> >                                                     'EAN_Artikel' => '5449000134264',
> >                                                     'Etikettentyp' => {}
> >                                                   },
> >                                'Artikel_erfassen' => {},
> >                                'Lagerstaette' => '5449000134264'
Re: XML::Simple Umlaute
Data::Dumper is dumping the internal format. To ensure compatibility, it is using the \x{df} escape to represent LATIN SMALL LETTER SHARP S. To see it rendered as a character, just print it:

    #!/usr/bin/perl

    use strict;
    use feature 'say';
    use XML::Simple;

    #warnings should come last to handle any registered warnings in previous modules
    use warnings;

    binmode STDOUT, ":encoding(UTF-8)";

    my $xml = XML::Simple->new;
    my $data = $xml->XMLin("test.xml");

    say $data->{Regaletikett_ausgeben};

On Thu, Jul 28, 2016 at 9:05 AM hw wrote:
> Hi,
>
> I would like to read XML files which look like this:
>
>     http-equiv="content-type" content="text/html; charset=UTF-8">
>     uuid:ee1bd852-37ee-4965-a097-50130cf6dac7
>     Infostand
>     5449000134264
>     groß
>     5449000134264
>     5449000134264
>     10.0
>     20
>
> There is an umlaut, ß, supposed to be at
>
>     groß
>
> which is apparently impossible to read. The following program ...
>
>     #!/usr/bin/perl
>
>     use strict;
>     use warnings;
>
>     use feature 'say';
>
>     use XML::Simple;
>     use Data::Dumper;
>
>     my $xml = new XML::Simple;
>     my $data = $xml->XMLin("test.xml");
>
>     open my $fh, ">", 'pout';
>     print $fh Dumper($data);
>     close $fh;
>
>     print Dumper($data);
>
>     exit 0;
>
> ... gives me this output:
>
>     $VAR1 = {
>               'Bezeichnung1' => {},
>               'id' => 'build_Inventur_1469705446',
>               'Stationsnummer' => 'Infostand',
>               'meta' => {
>                           'content' => 'text/html; charset=UTF-8',
>                           'http-equiv' => 'content-type',
>                           'instanceID' => 'uuid:ee1bd852-37ee-4965-a097-50130cf6dac7'
>                         },
>               'Mitarbeiter_inv' => '5449000134264',
>               'Regaletikett_ausgeben' => "gro\x{df}",
>               'Erfassung' => {
>                                'Artikelstapel' => {
>                                                     'Menge' => '20',
>                                                     'Preis' => '10.0',
>                                                     'EAN_Artikel' => '5449000134264',
>                                                     'Etikettentyp' => {}
>                                                   },
>                                'Artikel_erfassen' => {},
>                                'Lagerstaette' => '5449000134264'
>                              }
>             };
>
> I'm not getting any better results when adding an encoding declaration to
> the XML file or when writing the Dumper output to a file.
>
> Is it impossible to use umlauts in XML files?
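Chas's point that `\x{df}` is only Dumper's way of writing ß can be checked in a few lines. This also explains why the `eq 'groß'` comparison in the thread succeeds once `use utf8` is in effect: both sides are the same four characters. The following is a minimal sketch, assuming the file is saved as UTF-8 so that the literal 'groß' is decoded by `use utf8`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # treat string literals in this (UTF-8) file as characters

binmode STDOUT, ':encoding(UTF-8)';

my $from_xml = "gro\x{df}";    # how Data::Dumper displayed Regaletikett_ausgeben
my $literal  = 'groß';         # the same word typed into a UTF-8 source file

# Same four characters: \x{df} is just Dumper's escape for ß (U+00DF).
my $same = $from_xml eq $literal;
printf "%s: %d characters, last is U+%04X\n",
    ($same ? 'same string' : 'different'), length $from_xml, ord substr($from_xml, -1);

print "$from_xml\n";    # printed as ß through the UTF-8 layer
```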