https://issues.apache.org/bugzilla/show_bug.cgi?id=51645

             Bug #: 51645
           Summary: CSVDataSet does not read UTF-8 files when
                    file.encoding is UTF-8
           Product: JMeter
           Version: 2.4
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: Main
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


Created attachment 27366
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27366
Patch to fix the issue.  The variable was not renamed, to show that the fix is
just a matter of replacing the class.

CSV Data Sets which are encoded in UTF-8 do not work on platforms where the
default file.encoding is UTF-8.

UTF-8 is used to illustrate here, but this would presumably apply to other
non-8bit character sets as well.

Reason:  The use of ByteArrayOutputStream in the CSVSaveService.csvReadFile()
method.  Specifically, the baos.write(ch) call is implemented (internally in
ByteArrayOutputStream) with a cast to the byte primitive type ( buf[count] =
(byte)b; in my JVM), so only the low 8 bits of each character are kept.

Later, the contents of the ByteArrayOutputStream are converted to a String via
baos.toString(), which decodes the bytes according to the platform's default
charset.  If that charset (e.g. ISO-8859-1) is 8-bit, everything is fine.
However, other charsets (like UTF-8) produce unpredictable results or unmapped
chars.

For example, take the character \u00e7 (LATIN SMALL LETTER C WITH CEDILLA),
decimal code point 231.  When put into baos, it becomes the (8-bit signed)
byte -25.  When converted via toString() with UTF-8 as the default charset,
that lone byte is not a valid UTF-8 sequence, so the value \ufffd (decimal code
point 65533 == Unicode's "REPLACEMENT CHARACTER" placeholder) is placed in the
returned string instead.
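
The effect can be reproduced outside JMeter.  The following is only a minimal
sketch, not the CSVSaveService code: the class and method names are
hypothetical, and the charset is passed explicitly (instead of relying on
file.encoding) so the result does not depend on the machine it runs on.  It
pushes the character with decimal code point 231 through
ByteArrayOutputStream.write(int) and decodes the resulting byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics the pattern described in this report.
class ByteTruncationDemo {

    // Writes one char through write(int), which keeps only the low 8 bits,
    // then decodes the buffered byte with the given charset.
    static String roundTrip(char ch, Charset decodeWith) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(ch);
        return new String(baos.toByteArray(), decodeWith);
    }

    public static void main(String[] args) {
        char cedilla = (char) 231; // U+00E7

        // The stored byte is the signed 8-bit value of 231.
        System.out.println((byte) cedilla); // prints -25

        // Decoded as ISO-8859-1 the byte maps back to code point 231.
        String latin1 = roundTrip(cedilla, StandardCharsets.ISO_8859_1);
        System.out.println((int) latin1.charAt(0)); // prints 231

        // Decoded as UTF-8 the lone byte is malformed and gets replaced.
        String utf8 = roundTrip(cedilla, StandardCharsets.UTF_8);
        System.out.println((int) utf8.charAt(0)); // prints 65533
    }
}
```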

Fix: patch attached.  Simply replace ByteArrayOutputStream with CharArrayWriter
and the UTF-8 files work regardless of the value for file.encoding.
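
For illustration, the idea behind the replacement might look like the sketch
below.  This is not the attached patch: the class name, the readAll helper and
the StringReader input are assumptions made up for the example.  The point it
shows is that CharArrayWriter buffers chars rather than bytes, so no narrowing
cast and no later charset decoding takes place.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; readAll is not a JMeter method.
class CharBufferDemo {

    // Copies every char from the reader into a char buffer and returns the
    // buffered text. No bytes are involved, so file.encoding plays no role.
    static String readAll(Reader in) {
        CharArrayWriter caw = new CharArrayWriter();
        try {
            int ch;
            while ((ch = in.read()) != -1) {
                caw.write(ch); // stores the full 16-bit char value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return caw.toString();
    }

    public static void main(String[] args) {
        // Input containing U+00E7 (code point 231) at index 4.
        String input = "fran" + (char) 231 + "ais";
        String result = readAll(new StringReader(input));
        System.out.println(result.equals(input));      // prints true
        System.out.println((int) result.charAt(4));    // prints 231
    }
}
```

The reader that feeds such a buffer still has to be opened with the file's
actual encoding; the char buffer only guarantees that characters are not
damaged after they have been decoded.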
