Re: XML::Simple Umlaute

2016-08-09 Thread Chas. Owens
Take a look at the -C argument for perl and the PERL_UNICODE environment
variable in http://perldoc.perl.org/perlrun.html

Examine the difference between

perl -E 'say "\x{df}"'

and

PERL_UNICODE=O perl -E 'say "\x{df}"'
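
The difference between the two commands comes down to whether STDOUT has a UTF-8 output layer. A minimal sketch of the same effect, writing to in-memory filehandles (an illustration choice, not part of the commands above) so that the resulting bytes can be inspected; the helper name bytes_written is made up for this example:

```perl
use strict;
use warnings;

# Write U+00DF to an in-memory filehandle opened with the given layer
# and return the bytes that ended up in the buffer.
sub bytes_written {
    my ($layer) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer or die "cannot open in-memory handle: $!";
    print {$fh} "\x{df}";
    close $fh or die "close failed: $!";
    return $buffer;
}

my $raw  = bytes_written(':raw');              # no encoding layer
my $utf8 = bytes_written(':encoding(UTF-8)');  # roughly what PERL_UNICODE=O gives STDOUT

printf "no layer:    %s\n", join ' ', map { sprintf '%02X', ord } split //, $raw;
printf "UTF-8 layer: %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8;
```

Without a layer the character goes out as the single Latin-1 byte DF, which a UTF-8 terminal cannot display; with the layer it becomes the two-byte sequence C3 9F.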

That said, if you are working with the web, why in the world are you
sending UTF-8?  HTML has entities for a reason.  I would suggest using
HTML::Entities instead of trying to send non-ASCII characters through who
knows how many layers of things that can screw up UTF-8:

perl -MHTML::Entities -E 'say encode_entities "\x{df}"'



On Tue, Aug 9, 2016 at 7:34 AM hw  wrote:

> Chas. Owens schrieb:
> >
> > On Thu, Jul 28, 2016 at 10:55 AM Paul Johnson <p...@pjcj.net> wrote:
> >
> > On Thu, Jul 28, 2016 at 10:23:19AM -0400, Chas. Owens wrote:
> >
> > snip
> >
> >  > Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
> >  > would say knows the most about the intersection of Perl and Unicode)
> >  > is a good resource: http://stackoverflow.com/a/6163129/78259
> >
> > Quite.  And utf8::all tries to encapsulate as much of that boilerplate
> > as it can.
> >
> >
> > I have always read that answer as a bit of an indictment of the idea of
> > "you should be able to load this module and everything will be fine".
> > Unicode is complex and trying to treat it like just another list of
> > characters is doomed to teeth gnashing and crying.  Of course, even
> > treating it the way it should be leads to teeth gnashing and crying, but at
> > least that will be over the fact that humans suck (we can't even agree on
> > where þ should be sorted) as opposed to Perl sucking.
>
> When I have something like
>
>
> print $cgi->p('Gebäudefläche:');
>
>
> in my source, which is correctly displayed everywhere else, I also
> need it correctly displayed in the web browser --- even particularly
> there because that is what the users are looking at.
>
> And that's all there is to it.  It's really that simple.
>
>


Re: XML::Simple Umlaute

2016-08-09 Thread hw

Chas. Owens schrieb:

On Thu, Jul 28, 2016 at 10:05 AM, hw  wrote:
snip

So which character encoding on STDOUT does perl use by default?  That should
be utf-8 without any further ado, shouldn't it?  When I add


binmode STDOUT, ":encoding(utf-8)";


the characters are displayed correctly in the terminal.  Why would perl use
something else than utf-8 by default?


Take the following with a grain of salt.  My knowledge is mostly
hearsay and supposition with a dash of cargo cultism on this matter.

Perl is roughly as old as Unicode: Perl 1.0 was released in 1987, the
same year work on Unicode began (Unicode 1.0 followed in 1991).  Unicode
did not get much traction until the mid-nineties when people started
realizing that UTF-8 (created in '92) was a good thing.  So, for most
of its early history, Perl used Latin1.  It still does to a large
extent for backwards compatibility reasons.  To make Perl 5 a proper
UTF-8 environment there are a number of knobs to pull and buttons to
poke.

You may find this video from YAPC NA 2016 enlightening:
https://www.youtube.com/watch?v=TmTeXcEixEg

Others that may be helpful (I haven't watched them, but I trust the speaker):

https://www.youtube.com/watch?v=iZgqhVu72zc
https://www.youtube.com/watch?v=X2FQHUHjo8M

Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
would say knows the most about the intersection of Perl and Unicode)
is a good resource: http://stackoverflow.com/a/6163129/78259

Hope this helps.


Thanks!  That makes it really complicated to write applications which display
data from a database via a web browser --- yet people have been doing this for
a pretty long time now.  But no matter what I do, Umlaute are not displayed correctly
throughout the whole web page: they are either wrong in the data from the
database or in print statements or in the output of the CGI::FormBuilder.
There's probably no way to get it right :(


--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: XML::Simple Umlaute

2016-08-09 Thread hw

Chas. Owens schrieb:


On Thu, Jul 28, 2016 at 10:55 AM Paul Johnson <p...@pjcj.net> wrote:

On Thu, Jul 28, 2016 at 10:23:19AM -0400, Chas. Owens wrote:

snip

 > Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
 > would say knows the most about the intersection of Perl and Unicode)
 > is a good resource: http://stackoverflow.com/a/6163129/78259

Quite.  And utf8::all tries to encapsulate as much of that boilerplate
as it can.


I have always read that answer as a bit of an indictment of the idea of "you should
be able to load this module and everything will be fine".  Unicode is complex and
trying to treat it like just another list of characters is doomed to teeth gnashing and
crying.  Of course, even treating it the way it should be leads to teeth gnashing and
crying, but at least that will be over the fact that humans suck (we can't even agree on
where þ should be sorted) as opposed to Perl sucking.


When I have something like


print $cgi->p('Gebäudefläche:');


in my source, which is correctly displayed everywhere else, I also
need it correctly displayed in the web browser --- even particularly
there because that is what the users are looking at.

And that's all there is to it.  It's really that simple.
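
For a literal such as 'Gebäudefläche:' to reach the browser intact, three things usually have to agree: the source has to be declared as UTF-8 (use utf8), STDOUT has to encode to UTF-8, and the HTTP header has to announce that charset. A minimal sketch, assuming the script file itself is saved as UTF-8, and printing the header by hand rather than through the $cgi object used above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                            # string literals in this file are UTF-8

binmode STDOUT, ':encoding(UTF-8)';  # encode characters to UTF-8 on output

my $label = 'Gebäudefläche:';

# Tell the browser which encoding the body uses.
print "Content-Type: text/html; charset=UTF-8\r\n\r\n";
print "<p>$label</p>\n";
```

With CGI.pm the header would more typically come from $cgi->header(-type => 'text/html', -charset => 'UTF-8'). Data read from a database or produced by other modules still has to be decoded on the way in, which this sketch does not cover.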






Re: XML::Simple Umlaute

2016-08-09 Thread hw

Paul Johnson schrieb:

On Thu, Jul 28, 2016 at 10:23:19AM -0400, Chas. Owens wrote:

On Thu, Jul 28, 2016 at 10:05 AM, hw  wrote:
snip

So which character encoding on STDOUT does perl use by default?  That should
be utf-8 without any further ado, shouldn't it?  When I add


binmode STDOUT, ":encoding(utf-8)";


the characters are displayed correctly in the terminal.  Why would perl use
something else than utf-8 by default?


As a general rule, use "utf8::all" instead of just "utf8" and a lot of
the problems go away.


Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
would say knows the most about the intersection of Perl and Unicode)
is a good resource: http://stackoverflow.com/a/6163129/78259


Quite.  And utf8::all tries to encapsulate as much of that boilerplate
as it can.



Maybe that would work, but I can't very well go through all the programs
and adjust them and experiment every time there is a problem like this.
I need some sort of general switch to make perl use utf8 by default, as
it should to begin with ...
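
The closest thing to a general switch is probably the -C option and the PERL_UNICODE environment variable described in perlrun. A small sketch for checking what a given environment actually applies; it only reports the settings and changes nothing:

```perl
use strict;
use warnings;

# ${^UNICODE} holds the numeric value of the -C / PERL_UNICODE flags
# (0 when neither is set).
my $flags = ${^UNICODE};

# List the PerlIO layers currently pushed on STDOUT.
my @layers = PerlIO::get_layers(*STDOUT);

printf "-C / PERL_UNICODE flags: %d\n", $flags;
printf "STDOUT layers: %s\n", join ', ', @layers;
print "STDOUT encodes to UTF-8\n" if grep { $_ eq 'utf8' } @layers;
```

Saved under a hypothetical name such as check.pl and run as "PERL_UNICODE=SDA perl check.pl", it should list utf8 among the STDOUT layers. The setting only affects the standard streams, default open() layers and @ARGV; data coming from a database or a module still needs its own decoding.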






Re: XML::Simple Umlaute

2016-07-30 Thread Mike Flannigan


I'm not sure if it is possible to use Umlaute in XML Files
or not.  Maybe this post will help you:
http://stackoverflow.com/questions/11772468/reading-xml-files-with-umlaut-chars

Is there a way to change encoding to "iso-8859-1"?
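
Re-encoding is possible with the core Encode module. A minimal sketch, assuming the input bytes are UTF-8 (as the charset=UTF-8 in the sample file suggests) and that the consumer really wants ISO-8859-1:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "groß" as it would arrive from a UTF-8 encoded file: ß is two bytes.
my $utf8_bytes = "gro\xC3\x9F";

# Decode once on input to get a character string ...
my $text = decode('UTF-8', $utf8_bytes);

# ... and encode to whatever the consumer expects on output.
my $latin1_bytes = encode('ISO-8859-1', $text);

printf "characters: %d, UTF-8 bytes: %d, Latin-1 bytes: %d\n",
    length($text), length($utf8_bytes), length($latin1_bytes);
```

The character string has four characters either way; only the byte representation differs (five bytes in UTF-8, four in ISO-8859-1).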


Mike


On 7/28/2016 8:03 AM, beginners-digest-h...@perl.org wrote:







Hi,

I would like to read XML files which look like this:




  
uuid:ee1bd852-37ee-4965-a097-50130cf6dac7
  
  Infostand
  5449000134264
  
groß
  

5449000134264

  5449000134264
  10.0
  20
  

  



There is an Umlaut, ß, supposed to be at


groß



which is apparently impossible to read.  The following program ...


#!/usr/bin/perl

use strict;
use warnings;

use feature 'say';

use XML::Simple;
use Data::Dumper;


my $xml = new XML::Simple;
my $data = $xml->XMLin("test.xml");

open my $fh, ">", 'pout';
print $fh Dumper($data);
close $fh;

print Dumper($data);


exit 0;


... gives me this output:


$VAR1 = {
  'Bezeichnung1' => {},
  'id' => 'build_Inventur_1469705446',
  'Stationsnummer' => 'Infostand',
  'meta' => {
'content' => 'text/html; charset=UTF-8',
'http-equiv' => 'content-type',
'instanceID' => 
'uuid:ee1bd852-37ee-4965-a097-50130cf6dac7'

  },
  'Mitarbeiter_inv' => '5449000134264',
  'Regaletikett_ausgeben' => "gro\x{df}",
  'Erfassung' => {
 'Artikelstapel' => {
'Menge' => '20',
'Preis' => '10.0',
'EAN_Artikel' => 
'5449000134264',

'Etikettentyp' => {}
  },
 'Artikel_erfassen' => {},
 'Lagerstaette' => '5449000134264'
   }
};


I'm not getting any better results when adding an encoding tag to the
XML file and when writing the Dumper output to a file.

Is it impossible to use Umlaute in XML Files?




Re: XML::Simple Umlaute

2016-07-28 Thread Chas. Owens
On Thu, Jul 28, 2016 at 10:55 AM Paul Johnson  wrote:

> On Thu, Jul 28, 2016 at 10:23:19AM -0400, Chas. Owens wrote:

 snip

> > Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
> > would say knows the most about the intersection of Perl and Unicode)
> > is a good resource: http://stackoverflow.com/a/6163129/78259
>
> Quite.  And utf8::all tries to encapsulate as much of that boilerplate
> as it can.
>

I have always read that answer as a bit of an indictment of the idea of
"you should be able to load this module and everything will be fine".
Unicode is complex and trying to treat it like just another list of
characters is doomed to teeth gnashing and crying.  Of course, even
treating it the way it should be leads to teeth gnashing and crying, but at
least that will be over the fact that humans suck (we can't even agree on
where þ should be sorted) as opposed to Perl sucking.


Re: XML::Simple Umlaute

2016-07-28 Thread Paul Johnson
On Thu, Jul 28, 2016 at 10:23:19AM -0400, Chas. Owens wrote:
> On Thu, Jul 28, 2016 at 10:05 AM, hw  wrote:
> snip
> > So which character encoding on STDOUT does perl use by default?  That should
> > be utf-8 without any further ado, shouldn't it?  When I add
> >
> >
> > binmode STDOUT, ":encoding(utf-8)";
> >
> >
> > the characters are displayed correctly in the terminal.  Why would perl use
> > something else than utf-8 by default?

As a general rule, use "utf8::all" instead of just "utf8" and a lot of
the problems go away.

> Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
> would say knows the most about the intersection of Perl and Unicode)
> is a good resource: http://stackoverflow.com/a/6163129/78259

Quite.  And utf8::all tries to encapsulate as much of that boilerplate
as it can.

-- 
Paul Johnson - p...@pjcj.net
http://www.pjcj.net





Re: XML::Simple Umlaute

2016-07-28 Thread Chas. Owens
On Thu, Jul 28, 2016 at 10:05 AM, hw  wrote:
snip
> So which character encoding on STDOUT does perl use by default?  That should
> be utf-8 without any further ado, shouldn't it?  When I add
>
>
> binmode STDOUT, ":encoding(utf-8)";
>
>
> the characters are displayed correctly in the terminal.  Why would perl use
> something else than utf-8 by default?

Take the following with a grain of salt.  My knowledge is mostly
hearsay and supposition with a dash of cargo cultism on this matter.

Perl is roughly as old as Unicode: Perl 1.0 was released in 1987, the
same year work on Unicode began (Unicode 1.0 followed in 1991).  Unicode
did not get much traction until the mid-nineties when people started
realizing that UTF-8 (created in '92) was a good thing.  So, for most
of its early history, Perl used Latin1.  It still does to a large
extent for backwards compatibility reasons.  To make Perl 5 a proper
UTF-8 environment there are a number of knobs to pull and buttons to
poke.
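
As a rough illustration of those knobs, here is a minimal sketch of the boilerplate a script commonly carries. It is not a complete list (the StackOverflow answer linked further down goes much further), and it assumes the source file is saved as UTF-8 and that command-line arguments arrive as UTF-8 bytes:

```perl
use strict;
use warnings;
use utf8;                               # source code literals are UTF-8
use open qw(:std :encoding(UTF-8));     # STDIN/STDOUT/STDERR and default open() layers
use Encode qw(decode);

# Command-line arguments arrive as bytes; decode them explicitly.
my @args = map { decode('UTF-8', $_) } @ARGV;

my $word = 'groß';
print "$word has ", length($word), " characters\n";
```

Each line addresses a different boundary: the source file, the I/O handles, and the command line. Data from databases, sockets or environment variables needs the same decode-on-input, encode-on-output treatment.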

You may find this video from YAPC NA 2016 enlightening:
https://www.youtube.com/watch?v=TmTeXcEixEg

Others that may be helpful (I haven't watched them, but I trust the speaker):

https://www.youtube.com/watch?v=iZgqhVu72zc
https://www.youtube.com/watch?v=X2FQHUHjo8M

Also, this answer on StackOverflow by tchrist (Tom Christiansen, who I
would say knows the most about the intersection of Perl and Unicode)
is a good resource: http://stackoverflow.com/a/6163129/78259

Hope this helps.


-- 
Chas. Owens
http://github.com/cowens
The most important skill a programmer can have is the ability to read.





Re: XML::Simple Umlaute

2016-07-28 Thread hw

Chas. Owens schrieb:

Data::Dumper is dumping the internal format.  To ensure compatibility, it is 
using the \x{df} escape to represent LATIN SMALL LETTER SHARP S. To see it 
rendered as a character, just print it:


Thanks!  That kinda works:


#!/usr/bin/perl

use strict;
use warnings;

use feature 'say';
use utf8;

use XML::Simple;
use Data::Dumper;


my $xml = new XML::Simple;
my $data = $xml->XMLin("test.xml");

open my $fh, ">", 'pout';
binmode $fh, ":encoding(utf-8)";

print $fh Dumper($data);

print Dumper($data);
print $fh $data->{'Regaletikett_ausgeben'};
close $fh;


if($data->{'Regaletikett_ausgeben'} eq 'groß') {
  say 'ist groß';
} else {
  say 'nicht groß';
}

say 'ok';

say 'test-1: äöüÄÖÜß';
say "test-2: äöüÄÖÜß";
print "test-3: äöüÄÖÜß\n";


exit 0;


Output is:


$VAR1 = {
  'Regaletikett_ausgeben' => "gro\x{df}",
  'Mitarbeiter_inv' => '5449000134264',
  'Bezeichnung1' => {},
  'Stationsnummer' => 'Infostand',
  'Erfassung' => {
 'Lagerstaette' => '5449000134264',
 'Artikel_erfassen' => {},
 'Artikelstapel' => {
'Etikettentyp' => {},
'EAN_Artikel' => '5449000134264',
'Menge' => '20',
'Preis' => '10.0'
  }
   },
  'meta' => {
'instanceID' => 'uuid:ee1bd852-37ee-4965-a097-50130cf6dac7',
'http-equiv' => 'content-type',
'content' => 'text/html; charset=UTF-8'
  },
  'id' => 'build_Inventur_1469705446'
};
ist gro
   ok
test-1: �
test-2: �
test-3: �


In case you can't see it:  The test-printing shows a single unknown
character instead of äöüÄÖÜß.  Now 'env' says:


[...]
LANG=de_DE.utf8
[...]


I'm looking at an xterm window which is connected via ssh to a remote host
on which an instance of tmux is running to which I'm attached.  I can type
all the above letters on the command line just fine.  'File' says:


xmlread-4.pl: Perl script, UTF-8 Unicode text executable
pout: UTF-8 Unicode text


When I load pout into emacs, the ß shows up correctly.  When I 'cat pout',
the ß is displayed correctly in the terminal.

So which character encoding on STDOUT does perl use by default?  That should
be utf-8 without any further ado, shouldn't it?  When I add


binmode STDOUT, ":encoding(utf-8)";


the characters are displayed correctly in the terminal.  Why would perl use
something else than utf-8 by default?




#!/usr/bin/perl

use strict;
use feature 'say';

use XML::Simple;

#warnings should come last to handle any registered warnings in previous modules
use warnings;

binmode STDOUT, ":encoding(UTF-8)";

my $xml = XML::Simple->new;
my $data = $xml->XMLin("test.xml");

say $data->{Regaletikett_ausgeben};


On Thu, Jul 28, 2016 at 9:05 AM hw <h...@gc-24.de> wrote:


Hi,

I would like to read XML files which look like this:





  uuid:ee1bd852-37ee-4965-a097-50130cf6dac7

Infostand
5449000134264

groß

  
  5449000134264
  
5449000134264
10.0
20

  




There is an Umlaut, ß, supposed to be at


groß



which is apparently impossible to read.  The following program ...


#!/usr/bin/perl

use strict;
use warnings;

use feature 'say';

use XML::Simple;
use Data::Dumper;


my $xml = new XML::Simple;
my $data = $xml->XMLin("test.xml");

open my $fh, ">", 'pout';
print $fh Dumper($data);
close $fh;

print Dumper($data);


exit 0;


... gives me this output:


$VAR1 = {
'Bezeichnung1' => {},
'id' => 'build_Inventur_1469705446',
'Stationsnummer' => 'Infostand',
'meta' => {
  'content' => 'text/html; charset=UTF-8',
  'http-equiv' => 'content-type',
  'instanceID' => 
'uuid:ee1bd852-37ee-4965-a097-50130cf6dac7'
},
'Mitarbeiter_inv' => '5449000134264',
'Regaletikett_ausgeben' => "gro\x{df}",
'Erfassung' => {
   'Artikelstapel' => {
  'Menge' => '20',
  'Preis' => '10.0',
  'EAN_Artikel' => 
'5449000134264',
  'Etikettentyp' => {}
},
   'Artikel_erfassen' => {},
   'Lagerstaette' => '5449000134264'

Re: XML::Simple Umlaute

2016-07-28 Thread Chas. Owens
Data::Dumper is dumping the internal format.  To ensure compatibility, it
is using the \x{df} escape to represent LATIN SMALL LETTER SHARP S. To see
it rendered as a character, just print it:

#!/usr/bin/perl

use strict;
use feature 'say';

use XML::Simple;

# warnings should come last to handle any registered warnings in previous modules
use warnings;

binmode STDOUT, ":encoding(UTF-8)";

my $xml = XML::Simple->new;
my $data = $xml->XMLin("test.xml");

say $data->{Regaletikett_ausgeben};


On Thu, Jul 28, 2016 at 9:05 AM hw  wrote:

>
> Hi,
>
> I would like to read XML files which look like this:
>
>
> 
> 
>http-equiv="content-type" content="text/html; charset=UTF-8">
>  uuid:ee1bd852-37ee-4965-a097-50130cf6dac7
>
>Infostand
>5449000134264
>
>groß
>
>  
>  5449000134264
>  
>5449000134264
>10.0
>20
>
>  
>
> 
>
>
> There is an Umlaut, ß, supposed to be at
>
>
> groß
>
>
>
> which is apparently impossible to read.  The following program ...
>
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> use feature 'say';
>
> use XML::Simple;
> use Data::Dumper;
>
>
> my $xml = new XML::Simple;
> my $data = $xml->XMLin("test.xml");
>
> open my $fh, ">", 'pout';
> print $fh Dumper($data);
> close $fh;
>
> print Dumper($data);
>
>
> exit 0;
>
>
> ... gives me this output:
>
>
> $VAR1 = {
>'Bezeichnung1' => {},
>'id' => 'build_Inventur_1469705446',
>'Stationsnummer' => 'Infostand',
>'meta' => {
>  'content' => 'text/html; charset=UTF-8',
>  'http-equiv' => 'content-type',
>  'instanceID' =>
> 'uuid:ee1bd852-37ee-4965-a097-50130cf6dac7'
>},
>'Mitarbeiter_inv' => '5449000134264',
>'Regaletikett_ausgeben' => "gro\x{df}",
>'Erfassung' => {
>   'Artikelstapel' => {
>  'Menge' => '20',
>  'Preis' => '10.0',
>  'EAN_Artikel' =>
> '5449000134264',
>  'Etikettentyp' => {}
>},
>   'Artikel_erfassen' => {},
>   'Lagerstaette' => '5449000134264'
> }
>  };
>
>
> I'm not getting any better results when adding an encoding tag to the
> XML file and when writing the Dumper output to a file.
>
> Is it impossible to use Umlaute in XML Files?
>
>
>
>