Hello List - 

Just wanted some advice fixing an issue with JMX file encoding that has been 
affecting me for some time (extended thread below). 

Some test scripts written way back in JMeter 1.9 posted extended ASCII 
characters (including 'ý' and 'ü') to a web server. 

When migrating to JMeter 2.1.1, the scripts broke - JMeter would now send 
incorrect data to the server. With Sebb's help, I found that setting the LANG 
environment variable to 'en_AU' made them work properly again. 

When migrating to JMeter 2.3.1, this workaround no longer worked. I then 
changed the following property in <JMeter>/bin/saveservice.properties from 
UTF-8 to ISO-8859-1: 
--------------------------------
# Character set encoding used to read and write JMeter XML files
#
_file_encoding=ISO-8859-1
--------------------------------

The scripts now work properly again. 

However, any new JMX scripts created now have this header: 
        <?xml version="1.0" encoding="ISO-8859-1"?>

Some of the scripts I created earlier have this in the header: 
        <?xml version="1.0" encoding="UTF-8"?>

Some scripts (the oldest) have no XML headers at all. 

The 'UTF-8' scripts work fine (even with '_file_encoding=ISO-8859-1') -- 
probably since those tests only handle 'normal' ASCII data (no extended 
characters).
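
The guess above can be checked directly: characters in the 7-bit ASCII range encode to the same bytes in UTF-8 and ISO-8859-1, so a file with no extended characters reads the same under either setting. A minimal Python sketch (the helper name and the sample strings are made up for illustration; only 'ýNNüV' is taken from the thread):

```python
def same_bytes_in_both(text: str) -> bool:
    """Return True if text encodes to identical bytes in UTF-8 and ISO-8859-1."""
    return text.encode("utf-8") == text.encode("iso-8859-1")


# Hypothetical ASCII-only element, similar in shape to a JMX property.
ascii_only = '<stringProp name="Argument.value">plain data</stringProp>'
# 'ýNNüV', the kind of value quoted further down the thread.
extended = "\u00fdNN\u00fcV"

print(same_bytes_in_both(ascii_only))          # True
print(same_bytes_in_both(extended))            # False
print(extended.encode("iso-8859-1").hex(" "))  # fd 4e 4e fc 56
print(extended.encode("utf-8").hex(" "))       # c3 bd 4e 4e c3 bc 56
```

Only the two extended characters differ: one byte each in ISO-8859-1, two bytes each in UTF-8.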

For uniformity and flexibility in future testing, I want to bulk-convert all 
existing JMeter scripts to use the 'UTF-8' encoding only. Here's what I plan to 
do:

THE CUNNING PLAN! 
==================

1.      Convert all JMX scripts to UTF-8
I plan to use the following Perl one-liner as a Perl UTF-8 conversion utility:
--------------------------------
[ After installing the 'Unicode::MapUTF8' CPAN module ]

cat ISO-8859-1.jmx | perl -lne ' use Unicode::MapUTF8 qw(to_utf8); print 
to_utf8({ -string => $_, -charset => "ISO-8859-1" });' > UTF-8.jmx
--------------------------------

2.      Change the _file_encoding property back to UTF-8

3.      Manually change any 'ISO-8859-1' XML headers in JMX scripts to 'UTF-8'

==================
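
For comparison, steps 1 and 3 of the plan could also be combined in one pass. The following Python sketch is only an illustration under stated assumptions, not a tested migration tool: the function name `convert_jmx_to_utf8` is made up, and it assumes the input really is ISO-8859-1 and that any XML declaration sits at the very start of the file.

```python
import re
from pathlib import Path

# Matches the encoding="..." attribute of an XML declaration at the very
# start of the file. Files without a declaration simply do not match.
_DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')


def convert_jmx_to_utf8(src: Path, dst: Path,
                        source_encoding: str = "iso-8859-1") -> None:
    """Re-encode one file to UTF-8 and update its XML declaration, if any."""
    # newline="" leaves the existing line endings untouched.
    with open(src, encoding=source_encoding, newline="") as f:
        text = f.read()
    # Rewrite only the declared encoding; the rest of the header is kept.
    text = _DECL_ENCODING.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

It would be called as convert_jmx_to_utf8(Path("ISO-8859-1.jmx"), Path("UTF-8.jmx")). Writing to a separate output file keeps the original as a backup. One caveat: a file that is already UTF-8 and contains extended characters must not be passed through this, since reading it as ISO-8859-1 would double-encode those characters. ASCII-only files are unaffected either way.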



Does anyone foresee any problems or have any advice? 


Regards,
Sonam Chauhan
-- 
Corporate Express Australia Ltd. 
Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
-----Original Message-----
From: Sonam Chauhan 
Sent: Thursday, 25 October 2007 4:57 PM
To: 'JMeter Users List'
Subject: RE: extended ASCII character handling in JMeter/Java

Thanks Sebb. You had asked: 

> > ------------------------
> > ý ---> %C3%BD
> > ü ---> %C3%BC
> > ------------------------
>
> Are these the correct conversions?

I don't really know. According to this page:
        http://www.albionresearch.com/misc/urlencode.php
... _these_ are the correct conversions: 
        ý ---> %FD
        ü ---> %FC

A Java application we use (webMethods) treats these conversions as 
interchangeable. It decodes both %FD and %C3%BD as ý.

However, the web page above decodes %FD to ý ('y' with an acute accent), but 
decodes %C3%BD as Ã½ ('A' with a tilde accent and a "1/2" sign). Perhaps this 
is a UTF-8 multi-byte issue? I wish I had a UTF-8 editor that showed the hex 
representation of what one typed.
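
The two sets of conversions correspond to two different character encodings applied before percent-escaping: UTF-8 gives two bytes per character, ISO-8859-1 gives one. A short Python sketch using the standard urllib.parse module (the variable names are illustrative) reproduces both, and also shows what happens when UTF-8 escapes are decoded as ISO-8859-1:

```python
from urllib.parse import quote, unquote

y_acute = "\u00fd"  # 'ý'
u_umlaut = "\u00fc"  # 'ü'

# Percent-encode each character under both charsets.
print(quote(y_acute, encoding="utf-8"), quote(y_acute, encoding="iso-8859-1"))
# %C3%BD %FD
print(quote(u_umlaut, encoding="utf-8"), quote(u_umlaut, encoding="iso-8859-1"))
# %C3%BC %FC

# Decoding the UTF-8 escapes as ISO-8859-1 yields two characters,
# U+00C3 ('A' with tilde) and U+00BD (the "1/2" sign), instead of one.
misread = unquote("%C3%BD", encoding="iso-8859-1")
print([hex(ord(c)) for c in misread])  # ['0xc3', '0xbd']

# Decoding the same escapes as UTF-8 gives back the single original character.
print(unquote("%C3%BD", encoding="utf-8") == y_acute)  # True
```

So %C3%BD is a correct encoding of 'ý' only if the receiving side decodes the request as UTF-8, and %FD is correct only if it decodes as ISO-8859-1 (or a compatible single-byte charset).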

Kind regards,
Sonam Chauhan
-- 
Corporate Express Australia Ltd. 
Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
-----Original Message-----
From: sebb [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 24 October 2007 9:22 PM
To: JMeter Users List
Subject: Re: extended ASCII character handling in JMeter/Java

On 24/10/2007, Sonam Chauhan <[EMAIL PROTECTED]> wrote:
> Thanks Sebb - we're still using JMeter 2.1.1 (our test harness is source 
> controlled as well). I'll push for an update to 2.3.
>
>
> [ All the info below is from 2.1.1. ]
>
> 2.1.1 does not seem to have a file encoding property in 
> saveservice.properties.

It was added later.

> The post data is setup through parameters.

OK.

> I don't know how to obtain info about the HTTP request encoding used. 
> However, if you meant the 'Encode?' tickbox setting in the HTTP Sampler, it's 
> ticked -- hence data posted should be URL-encoded.

The encoding field is another new field.

> Here's how the special character data is stored in the .jmx (copy/pasted by 
> opening the JMX in Windows XP Notepad ... hope it's not mangled):
> ------------------------
> <jmeterTestPlan version="1.1" properties="1.2">
>        ...
>   <stringProp name="Argument.value">... ýNNüV... </stringProp>
> ------------------------
>
> Here's how these characters are represented in the actual POST URL (viewed 
> through 'View Results Tree')
> ------------------------
> ý ---> %C3%BD
> ü ---> %C3%BC
> ------------------------

Are these the correct conversions?

> I couldn't find the value for file.encoding in jmeter.log. However, I did get 
> this on both Windows and AIX:

OK, that's a new log item - but one can get the value using the __P()
function (or a trivial Java program).

> ------------------------
> 2007/10/24 11:02:14 INFO  - jmeter.samplers.SampleResult: 
> sampleresult.default.encoding is set to ISO-8859-1
> ------------------------
> This occurs when I set LANG to either en_AU or en_AU.utf8.

Yes, that is a property you can set to override the default sample
result encoding - which is used if the response content-type does not
specify an encoding (charset). If not set, it defaults to ISO-8859-1,
which is what the log message is showing.

> Kind regards,
> Sonam Chauhan
> --
> Corporate Express Australia Ltd.
> Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
> -----Original Message-----
> From: sebb [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 23 October 2007 10:01 PM
> To: JMeter Users List
> Subject: Re: extended ASCII character handling in JMeter/Java
>
> Which version of JMeter?
>
> There have been quite a lot of recent changes in the handling of JMX files.
>
> They now use UTF-8 by default - _file_encoding property in
> saveservice.properties.
>
> Originally JMeter used the platform default encoding, which would tend
> to cause problems when the JMX files are used on different systems.
>
> This was addressed in bug 36755, which was fixed in 2.3RC3.
>
> There have also been various other encoding fixes - see the changes file.
>
> How are you setting up the POST data?
> Using files, or parameters?
> What encoding are you using for the HTTP request?
>
> What is the value of the Java property "file.encoding" on the various systems?
> This is now shown in the jmeter log file.
>
> On 23/10/2007, Sonam Chauhan <[EMAIL PROTECTED]> wrote:
> > Hello  - I got hit by an old issue again this year, so wanted to ask about 
> > JMeter/Java handling of extended ASCII characters.
> >
> > I have some testcases that use extended ASCII characters 253 and 252 ('ý' 
> > and 'ü') as record separators in text data posted by the JMeter HTTP 
> > sampler. The testcases were created on Windows XP - the data was simply 
> > copy/pasted into the JMeter GUI.
> >
> > When these tests ran on Linux, I found the LANG environment variable had to 
> > be set as follows to make the tests work  (email below from last year)
> > LANG=en_AU
> > This year, I moved to AIX and the tests failed again - the cause 
> > was that the 'ý' and 'ü' characters in the data were now posted as a "?". I found 
> > the LANG variable on AIX was 'en_AU.utf8'. When set back to 'en_AU', the 
> > tests began posting the correct values.
> >
> > My question is what is causing this behavior - does Java or JMeter use data 
> > in JMX files differently depending on the LANG variable in UNIX?
> >
> > Kind regards,
> > Sonam Chauhan
> > --
> > Corporate Express Australia Ltd.
> > Phone: +61-2-93350725, Email: [EMAIL PROTECTED]
> >
> > _____________________________________________
> > From: Sonam Chauhan
> > Sent: Wednesday, 20 December 2006 2:27 PM
> > To: 'JMeter Users List'
> > Subject: JMeter under cron
> >
> > Just a cautionary tale of running JMeter through a cron job on a Linux 
> > system.
> >
> > We have a JMeter-based regression-test suite at work. This has run nightly 
> > for several years as a cron job. Recently, we added tests that post in 
> > extended ASCII data (which has 'ý' and 'ü' record separators) which 
> > sometimes passed, and sometimes failed. After much debugging I found the 
> > new tests failed when automatically run by cron, but passed when run by an 
> > interactive terminal session.
> >
> > When executed in an interactive terminal session, LANG is set to:
> >        LANG=en_AU
> > However, cron sets the Unix LANG environment variable to POSIX. Ie:
> > LANG=POSIX
> > This seems to be causing the problem.
> >
> > I got the tests running by prefixing the test suite crontab entry with 
> > "export LANG=en_AU ;"
> > ie: The entry is now:
> > 30 20 * * * export LANG=en_AU ; $HOME/runsuite.sh >> $HOME/tmp.out 2>&1
> > This got these tests running.
> >
> > Regards,
> > Sonam Chauhan
> >
> > PS: 'locale -a' on the system shows that UTF-8 encoded English is also a 
> > supported LANG attribute:
> > en_AU.utf8
> > I guess this may be more pertinent for those whose testcases post in binary 
> > data.
> >
> >
> >
>
