If I understand you correctly, you suggest using IRequestCycleSettings#getResponseRequestEncoding() for:
- String.getBytes(HERE)
- new InputStreamReader(stream, HERE) - this is in XmlReader
- (the XML prolog may be ignored totally)
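
To make the two HERE spots above concrete, here is a minimal JDK-only sketch (no Wicket classes involved). The class and method names are hypothetical, and UTF-8 merely stands in for whatever getResponseRequestEncoding() would return. It only illustrates that the charset used for getBytes() and the one given to InputStreamReader have to agree:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Wicket.
class EncodingRoundTrip {

    // Encodes the text with writeCharset, then reads the bytes back with readCharset,
    // mirroring the getBytes(...) / new InputStreamReader(stream, ...) pair.
    static String roundTrip(String text, Charset writeCharset, Charset readCharset) {
        byte[] bytes = text.getBytes(writeCharset);
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), readCharset)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file itself independent of the platform encoding.
        String expected = "\u00e4\u00f6\u00fc\u20ac";
        // Assumption: stands in for the configured response/request encoding.
        Charset responseEncoding = StandardCharsets.UTF_8;

        // Same charset on both sides: the text survives.
        System.out.println(expected.equals(roundTrip(expected, responseEncoding, responseEncoding)));
        // Written as UTF-8 but read as ISO-8859-1 (latin1): the text is mangled.
        System.out.println(expected.equals(
            roundTrip(expected, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
    }
}
```

Run as is, this prints true and then false. The second case is what the test hits when the reading side falls back to a latin1 platform default.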
I think this should work.

On Thu, Jun 7, 2012 at 10:14 AM, Juergen Donnerstag <[email protected]> wrote:
> And there is no stable solution unless we create an artificial one
> (via the XML prolog and the encoding parameter), since the default
> charset differs by OS and by country. We could make that easier
> for testing purposes, though, e.g. by allowing a test-specific
> default, different from the OS default, for the XML prolog and the
> encoding parameters.
>
> Juergen
>
> On Wed, Jun 6, 2012 at 12:32 PM, Martin Grigorov <[email protected]> wrote:
>> Hi Juergen,
>>
>> Thanks for the explanation!
>>
>> I've tried all combinations of the following variables:
>> - -Dfile.encoding=latin1
>> - with and without <?xml encoding="utf-8"?> in the String to parse
>> - parse(new ByteArrayInputStream(string.toString().getBytes("UTF-8")), null);
>> - parse(new ByteArrayInputStream(string.toString().getBytes()), "UTF-8");
>> - parse(new ByteArrayInputStream(string.toString().getBytes("UTF-8")), "UTF-8");
>>
>> The test passes only when the String has the prolog with the
>> encoding and
>> "parse(new ByteArrayInputStream(string.toString().getBytes("UTF-8")), "UTF-8");"
>> is used. Any other combination produces mangled characters and the
>> assertion fails.
>>
>> So I cannot find a stable solution that will work in any environment.
>> We can use IRequestCycleSettings#getResponseRequestEncoding() for the
>> charset, but if there is no XML prolog or it has no encoding attribute,
>> the test fails.
>>
>> On Tue, Jun 5, 2012 at 11:53 PM, Juergen Donnerstag
>> <[email protected]> wrote:
>>> Hi Martin,
>>>
>>> XmlReader reads the markup file, interprets <?xml encoding ..> if
>>> present, and converts the markup into a String, which in Java is
>>> always UTF-16 encoded. XmlPullParser uses the data provided by XmlReader.
>>>
>>> To support unit testing, XPP provides a parse(String) method which
>>> wraps the string in an InputStream, so that tests do not bypass
>>> XmlReader.
>>>
>>> No XML declaration (or no encoding) results in XmlReader using the JVM
>>> default, which is the OS default if not provided via -Dfile.encoding=
>>>
>>> And since you never know on which OS and in which country devs are
>>> building or testing, providing the UTF encoded value is the safe way
>>> of doing it.
>>>
>>> We may replace parse(string) with parse(string, "encoding"), which
>>> seems to be supported by all underlying methods, but they are preset
>>> with null (JVM default) right now. That may help you solve your
>>> problem, and make other devs aware that the encoding might need to
>>> change.
>>>
>>> Does that make sense?
>>>
>>> Juergen
>>>
>>> On Tue, Jun 5, 2012 at 9:54 AM, Juergen Donnerstag
>>> <[email protected]> wrote:
>>>> I'll have a look later today.
>>>>
>>>> Juergen
>>>>
>>>> On Mon, Jun 4, 2012 at 3:37 PM, Martin Grigorov
>>>> <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm not quite sure, but I think there is a bug in
>>>>> org.apache.wicket.markup.parser.XmlPullParser#parse(CharSequence)
>>>>> because it uses
>>>>> string.toString().getBytes() to create a ByteArrayInputStream.
>>>>>
>>>>> org.apache.wicket.util.tester.BaseWicketTester#getTagById(String) uses
>>>>> lastResponseAsString to feed XmlPullParser, but lastResponseAsString's
>>>>> encoding depends on
>>>>> org.apache.wicket.settings.IRequestCycleSettings#getResponseRequestEncoding().
>>>>> I.e. the string may be encoded in UTF-8, but later XmlPullParser will
>>>>> try to process its bytes as Windows-1252, for example.
>>>>>
>>>>>
>>>>> Here is a small patch that exposes the problem:
>>>>> diff --git
>>>>> a/wicket-core/src/test/java/org/apache/wicket/markup/parser/XmlPullParserTest.java
>>>>> b/wicket-core/src/test/java/org/apache/wicket/markup/p
>>>>> index 2e26d05..15fb496 100644
>>>>> --- a/wicket-core/src/test/java/org/apache/wicket/markup/parser/XmlPullParserTest.java
>>>>> +++ b/wicket-core/src/test/java/org/apache/wicket/markup/parser/XmlPullParserTest.java
>>>>> @@ -191,6 +191,13 @@ public class XmlPullParserTest extends Assert
>>>>>                 assertNull(parser.getEncoding());
>>>>>                 tag = parser.nextTag();
>>>>>                 assertNull(tag);
>>>>> +
>>>>> +               String expected = "äöü€";
>>>>> +               parser.parse("<dummy>"+expected+"</dummy>");
>>>>> +               XmlTag openTag = parser.nextTag();
>>>>> +               XmlTag closeTag = parser.nextTag();
>>>>> +               String actual = parser.getInput(openTag.getPos() + openTag.getLength(), closeTag.getPos()).toString();
>>>>> +               assertEquals(expected, actual);
>>>>>         }
>>>>>
>>>>>         /**
>>>>>
>>>>> Apply this patch and run the test with -Dfile.encoding=latin1. It will
>>>>> fail in the comparison. Run it with UTF-8 and it will pass.
>>>>>
>>>>> I remember Juergen had a similar problem with one of the Wicket core
>>>>> tests that uses the Euro sign in an assertion, and he fixed it by using
>>>>> a Unicode-escaped value (\uabcd).
>>>>> But in this case the response is encoded with whatever is configured
>>>>> at IRequestCycleSettings#getResponseRequestEncoding() and
>>>>> XmlPullParser tries to read it with the platform default encoding.
>>>>>
>>>>> Is this a bug, and how can we solve it?
>>>>>
>>>>> --
>>>>> Martin Grigorov
>>>>> jWeekend
>>>>> Training, Consulting, Development
>>>>> http://jWeekend.com
>>
>>
>>
>> --
>> Martin Grigorov
>> jWeekend
>> Training, Consulting, Development
>> http://jWeekend.com

--
Martin Grigorov
jWeekend
Training, Consulting, Development
http://jWeekend.com
