Ah! That could well be it - thanks Mark! I’ll test this out.
Chris

> On 29 Jan 2016, at 3:26 AM, Mark Hung <mark...@gmail.com> wrote:
>
> http://opengrok.libreoffice.org/xref/core/ucb/source/ucp/webdav-neon/ContentProperties.cxx#454
>
> -        else if ( rName == "Content-Type" )
> +        else if ( rName.equalsIgnoreAsciiCaseAscii("Content-Type"))
>
> 2016-01-28 9:16 GMT+08:00 Chris Sherlock <chris.sherloc...@gmail.com>:
> Hi guys,
>
> I'm afraid I'm still a bit stuck on this - any other ideas what might be
> causing the problem?
>
> Chris
>
>> On 6 Jan 2016, at 4:27 AM, Chris Sherlock <chris.sherloc...@gmail.com> wrote:
>>
>> Thanks Mark, appreciate these code pointers!
>>
>> (I'm cc'ing in the mailing list so others can comment.)
>>
>> Chris
>>
>>> On 4 Jan 2016, at 8:21 PM, Mark Hung <mark...@gmail.com> wrote:
>>>
>>> I meant there is a chance for SvParser::GetNextChar() to switch encoding,
>>> but yes, it is less relevant.
>>>
>>> Grepping for content-type under ucb, there is some suspicious code:
>>> http://opengrok.libreoffice.org/xref/core/ucb/source/ucp/webdav-neon/ContentProperties.cxx#454
>>> http://opengrok.libreoffice.org/xref/core/ucb/source/ucp/webdav/ContentProperties.cxx#471
>>>
>>> which seems inconsistent with
>>> http://opengrok.libreoffice.org/xref/core/sc/source/filter/html/htmlpars.cxx#264
>>>
>>> 2016-01-04 16:17 GMT+08:00 Chris Sherlock <chris.sherloc...@gmail.com>:
>>> Hi Mark,
>>>
>>> BOM detection is irrelevant here.
>>> The HTTP header states that it should be
>>> UTF-8, but this is not being honoured.
>>>
>>> There is something further down the stack that isn't recording the HTTP
>>> headers.
>>>
>>> Chris
>>>
>>>> On 4 Jan 2016, at 4:23 PM, Mark Hung <mark...@gmail.com> wrote:
>>>>
>>>> Hi Chris,
>>>>
>>>> As I'm recently working on SvParser and HTMLParser:
>>>>
>>>> there is BOM detection in SvParser::GetNextChar(), and
>>>> a quick look at eehtml suggests EditHTMLParser::EditHTMLParser is relevant.
>>>>
>>>> Best regards.
>>>>
>>>> 2016-01-04 12:02 GMT+08:00 Chris Sherlock <chris.sherloc...@gmail.com>:
>>>> Hey guys,
>>>>
>>>> Probably nobody saw this because of the time of year (Happy New Year,
>>>> incidentally!).
>>>>
>>>> Just a quick ping to the list to see if anyone can give me some pointers.
>>>>
>>>> Chris
>>>>
>>>>> On 30 Dec 2015, at 12:15 PM, Chris Sherlock <chris.sherloc...@gmail.com> wrote:
>>>>>
>>>>> Hi guys,
>>>>>
>>>>> In bug 95217 - https://bugs.documentfoundation.org/show_bug.cgi?id=95217 -
>>>>> Persian text in a webpage encoded as UTF-8 is being corrupted.
>>>>>
>>>>> If I take the webpage and save it to an HTML file encoded as UTF-8, then
>>>>> there are no problems and the Persian text comes through fine. However,
>>>>> when connecting to a webserver directly, the HTTP header correctly gives
>>>>> the content type as UTF-8.
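[As an aside: pulling the charset out of a Content-Type value like the one captured below is straightforward. A minimal standalone sketch - this is not the actual ucb/Neon code, and `extractCharset` is a hypothetical helper name:]

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: extract the charset parameter from an HTTP
// Content-Type header value, e.g.
//   text/html; name="text.html"; charset=UTF-8  ->  "UTF-8"
// Returns an empty string when no charset parameter is present.
std::string extractCharset(const std::string& rContentType)
{
    // Lower-case a copy so the search for "charset=" is case-insensitive,
    // as HTTP header parameters are.
    std::string aLower(rContentType);
    for (char& c : aLower)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    const std::string::size_type nPos = aLower.find("charset=");
    if (nPos == std::string::npos)
        return std::string();

    // Take everything up to the next ';' (or end of string) from the
    // original value, preserving the charset's own case.
    const std::string::size_type nStart = nPos + 8; // skip "charset="
    const std::string::size_type nEnd = rContentType.find(';', nStart);
    std::string aCharset = rContentType.substr(
        nStart, nEnd == std::string::npos ? std::string::npos : nEnd - nStart);

    // Strip optional surrounding quotes.
    if (aCharset.size() >= 2 && aCharset.front() == '"' && aCharset.back() == '"')
        aCharset = aCharset.substr(1, aCharset.size() - 2);
    return aCharset;
}
```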
>>>>>
>>>>> I did a test using Charles Proxy with its SSL interception feature turned
>>>>> on and pointed Safari to
>>>>> https://bugs.documentfoundation.org/attachment.cgi?id=119818
>>>>>
>>>>> The following headers are gathered:
>>>>>
>>>>> HTTP/1.1 200 OK
>>>>> Server: nginx/1.2.1
>>>>> Date: Sat, 26 Dec 2015 01:41:30 GMT
>>>>> Content-Type: text/html; name="text.html"; charset=UTF-8
>>>>> Content-Length: 982
>>>>> Connection: keep-alive
>>>>> X-xss-protection: 1; mode=block
>>>>> Content-disposition: inline; filename="text.html"
>>>>> X-content-type-options: nosniff
>>>>>
>>>>> Some warnings are spat out that editeng's eehtml can't detect the
>>>>> encoding. I initially thought it was looking for a BOM, which makes no
>>>>> sense for a webpage, but that's wrong. Instead, for some reason the
>>>>> headers don't seem to be processed and the HTML parser is falling back
>>>>> to ISO-8859-1 rather than UTF-8 as the character encoding.
>>>>>
>>>>> We seem to use Neon to make the GET request to the webserver. A few
>>>>> observations:
>>>>>
>>>>> 1. We detect a server OK response as an error.
>>>>> 2. (Probably more to the point) I believe PROPFIND is being used, but
>>>>> even though the function being called indicates a PROPFIND verb, a GET
>>>>> is issued as normal and the headers aren't being stored. This means that
>>>>> when the parser looks for the headers to find the encoding, it finds
>>>>> nothing, resulting in the fallback to ISO-8859-1.
>>>>>
>>>>> One easy thing (though it doesn't solve the root issue): wouldn't it be
>>>>> a better idea to fall back to UTF-8 rather than ISO-8859-1?
>>>>>
>>>>> Any pointers on how to get to the bottom of this would be appreciated -
>>>>> I'm honestly not up on WebDAV or Neon.
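[Mark's suggested change at the top of the thread boils down to comparing the header name case-insensitively. A rough stand-in for what OUString::equalsIgnoreAsciiCaseAscii does - a plain std::string sketch, not the actual OUString implementation:]

```cpp
#include <string>

// Stand-in for OUString::equalsIgnoreAsciiCaseAscii: compare two strings
// for equality, ignoring case differences in ASCII letters A-Z/a-z only
// (no locale-dependent case folding).
bool equalsIgnoreAsciiCase(const std::string& rLeft, const std::string& rRight)
{
    if (rLeft.size() != rRight.size())
        return false;
    for (std::string::size_type i = 0; i < rLeft.size(); ++i)
    {
        char a = rLeft[i];
        char b = rRight[i];
        // Fold ASCII upper-case to lower-case; leave other bytes alone.
        if (a >= 'A' && a <= 'Z') a += 'a' - 'A';
        if (b >= 'A' && b <= 'Z') b += 'a' - 'A';
        if (a != b)
            return false;
    }
    return true;
}
```

[HTTP header field names are case-insensitive, so a name arriving as `content-type` would fail the exact `rName == "Content-Type"` comparison but pass the case-insensitive one - which could explain the headers appearing to go missing.]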
>>>>>
>>>>> Chris Sherlock
>>>>
>>>> --
>>>> Mark Hung
>>>
>>> --
>>> Mark Hung
>
> --
> Mark Hung
_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice