The problem is that the headers still carry the original encoding. For example, say the site example.org uses the GBK encoding. We first copy over the response headers, so our HttpResponse also has Content-Type: text/html; charset=GBK
We then send this response for rewriting, during which the DOM is parsed and then serialized. Once it is serialized, its byte representation is in UTF-8, but the original Content-Type header is still present. So the poor browser is told GBK but receives UTF-8 bytes, and ends up showing garbled characters.

On Thu, Jul 22, 2010 at 1:58 AM, John Hjelmstad <fa...@google.com> wrote:

> AFAIK this is a feature, not a bug. It standardizes output as UTF8, which
> should be able to represent any character data.
>
> On Wed, Jul 21, 2010 at 12:36 PM, Gagandeep Singh (JIRA) <j...@apache.org> wrote:
>
> > MutableContent causing lossy content encoding
> > ---------------------------------------------
> >
> >                 Key: SHINDIG-1395
> >                 URL: https://issues.apache.org/jira/browse/SHINDIG-1395
> >             Project: Shindig
> >          Issue Type: Bug
> >          Components: Java
> >            Reporter: Gagandeep Singh
> >            Assignee: Gagandeep Singh
> >            Priority: Critical
> >
> >
> > MutableContent.getRawContentBytes and MutableContent.getContent are buggy
> > because they serialize the Document into a UTF-8 string, disregarding the
> > original encoding of the page that is known to the HttpResponse object.
> >
> > Here is how it goes wrong for the accel servlet:
> >
> > AccelServlet.doFetch ->
> > DefaultResponseRewriterRegistry.rewriteHttpResponse ->
> > HttpResponseBuilder.create ->
> > new HttpResponse ->
> > HttpResponseBuilder.getResponse ->
> > MutableContent.getRawContentBytes()
> >
> > NOTE: This could also be a problem with gadgets. Need to verify.
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
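
To make the header/body mismatch described at the top of this message concrete, here is a minimal standalone sketch. It does not use any Shindig classes: CharsetMismatchSketch, decodeAs and withCharset are hypothetical names made up for illustration, and it assumes the JDK in use ships the GBK charset (full JDK builds normally do). It only shows that UTF-8 bytes labelled as GBK do not decode back to the original text, and that relabelling the header as UTF-8 is one possible way to keep the two consistent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Shindig.
public class CharsetMismatchSketch {

  /** Decodes bytes the way a client would if it trusted the given charset label. */
  public static String decodeAs(byte[] body, String charsetName) {
    return new String(body, Charset.forName(charsetName));
  }

  /** Returns the Content-Type value with its charset parameter replaced (or added). */
  public static String withCharset(String contentType, String charset) {
    // Remove any existing charset parameter (case-insensitive), then append the new one.
    String stripped = contentType.replaceAll("(?i);\\s*charset\\s*=\\s*[^;]*", "").trim();
    return stripped + "; charset=" + charset;
  }

  public static void main(String[] args) {
    String original = "\u4f60\u597d"; // two Chinese characters, as might appear on a GBK page
    String staleHeader = "text/html; charset=GBK"; // header copied from the origin response

    // After rewriting, the serialized DOM is UTF-8 regardless of the origin encoding.
    byte[] serialized = original.getBytes(StandardCharsets.UTF_8);

    // A client trusting the stale header decodes UTF-8 bytes as GBK and gets different text.
    System.out.println("round-trips with stale header: "
        + original.equals(decodeAs(serialized, "GBK")));

    // Assumed remedy: make the header describe the bytes that are actually sent.
    String fixedHeader = withCharset(staleHeader, "UTF-8");
    System.out.println("fixed header: " + fixedHeader);
    System.out.println("round-trips with fixed header: "
        + original.equals(decodeAs(serialized, "UTF-8")));
  }
}
```

Whether the header should be rewritten to UTF-8, or the content re-encoded to the original charset instead, is a design choice this sketch does not settle.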