Re: RFR: JDK-8153781 Issue in XMLScanner: EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when skipping large DOCTYPE section with CRLF at wrong place

huizhe wang Tue, 19 Apr 2016 12:25:41 -0700


On 4/19/2016 6:54 AM, Langer, Christoph wrote:

Hi again,


I've updated the testcase to increase the probability of it really hitting the 
issue. The old version was prone to miss the issue in case something changed in 
the parser.

http://cr.openjdk.java.net/~clanger/webrevs/8153781.2/


Looks good overall.

Error msgs: you made an extra effort :-) normally, we don't need to doany translation. The I18n team has a revolving task that willperiodically pick up these changes.

License header: it is correct to update to the latest Apache Licenseformat. However, we would need to preserve the "reserved comment block"if we've not changed the code, which is the case for class such asHTTPInputSource.java.


Best,
Joe


Best regards
Christoph

-----Original Message-----
From: Langer, Christoph
Sent: Montag, 18. April 2016 23:04
To: 'huizhe wang' <huizhe.w...@oracle.com>
Cc: core-libs-dev@openjdk.java.net
Subject: RE: RFR: JDK-8153781 Issue in XMLScanner:
EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when skipping
large DOCTYPE section with CRLF at wrong place

Hi Joe,

here is the updated webrev where I incorporated your suggestions:

http://cr.openjdk.java.net/~clanger/webrevs/8153781.1/

I also added a testcase. As for the message " DoctypedeclNotClosed ": I did it 
in
several languages but there are some languages where I don't have the
knowledge to create a localized string. Is there some process for this or would
we just be ok with reverting to the English text for that particular message?

Thanks in advance for your review :)

All the best
Christoph

-----Original Message-----
From: huizhe wang [mailto:huizhe.w...@oracle.com]
Sent: Dienstag, 12. April 2016 23:28
To: Langer, Christoph <christoph.lan...@sap.com>
Cc: core-libs-dev@openjdk.java.net
Subject: Re: RFR: JDK-8153781 Issue in XMLScanner:
EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when skipping
large DOCTYPE section with CRLF at wrong place


On 4/12/2016 11:50 AM, Langer, Christoph wrote:

Hi Joe,

thanks as always for your suggestions and I'll try to work it in. It will

probably

take me a little while as I'm currently busy with another thing. I'll update my
webrev eventually and add a testcase, too.

Ok.

But one question: I understand that the fix in skipDTD will be sufficient to fix

the issue. Nevetheless, can you explain me why the change in scanData is not
beneficial or could even cause issues? There are several places in scanData
when further loads are done. But only at this point where there's exactly one
character after CRLF at the end of the buffer the method just returns without
further load. If it wouldn't leave here it seems to me as if it would continue
correctly and load more data. That would probably also be better in the sense
of performance I guess??

It's there to handle the situation where a surrogate pair got split in
between buffers.

-Joe

Thanks
Christoph

-----Original Message-----
From: huizhe wang [mailto:huizhe.w...@oracle.com]
Sent: Dienstag, 12. April 2016 19:54
To: Langer, Christoph <christoph.lan...@sap.com>
Cc: core-libs-dev@openjdk.java.net
Subject: Re: RFR: JDK-8153781 Issue in XMLScanner:
EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when

skipping

large DOCTYPE section with CRLF at wrong place

Also, EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET was a
wrong msg
id. It would be good to change that to DoctypedeclNotClosed and add a
message to XMLMessages.properties right before

DoctypedeclUnterminated,

sth. like the following:

DoctypedeclNotClosed = The document type declaration for root element
type \"{0}\" must be closed with '']''.

Thanks,
Joe

On 4/11/2016 5:49 PM, huizhe wang wrote:

On 4/7/2016 1:45 PM, Langer, Christoph wrote:

Hi,



I've run into a peculiar issue with Xerces.



The problem is happening when a DTD shall be skipped, the DTD is
larger than the buffer of the current entity and a CRLF sequence
occurs just one char before the buffer end.



The reason is that method skipDTD of class XMLDTDScannerImpl (about
line 389) calls XMLEntityScanner.scanData() to scan for the next
occurrence of ']'. The scanData method might return true which
indicates that the delimiter ']' was not found yet and more data is
to scan. Other users of scanData would handle this and call this
method in a loop until it returns false or some other condition
happens. So I've fixed that place like at the other callers of scanData.

This part of the change looks good.


Nevertheless, the scanData method would usually load more data when
it is at the end of the buffer. But in the special case when CRLF is
found at the end of buffer - 1, scanData would just return true. So I
also removed that check at line 1374 in XMLEntityScanner. Do you see
any problem with that? Is there any reason behind it which I'm
overseeing?

No need to remove this after the above change. The parser needs to
retain what's in the xml, e.g., not removing new lines.

Furthermore I took the chance for further little cleanups. I've added
the new copyright header to the files... is that the correct one?

Yes, that's the right license header. However,

I also aligned the calls to invokeListeners(position) in
XMLEntityScanner to always pass the actual position from which the
load is started. Do you think this is correct?

Yes.


Here is the bug:

https://bugs.openjdk.java.net/browse/JDK-8153781



Here is the webrev:

http://cr.openjdk.java.net/~clanger/webrevs/8153781.0/



Please give me some comments before I finalize my change including a
jtreg testcase.

It would be better if you had included the testcase so that the review
can be done together with the code change.

Thanks,
Joe


Thanks & Best regards

Christoph

Re: RFR: JDK-8153781 Issue in XMLScanner: EXPECTED_SQUARE_BRACKET_TO_CLOSE_INTERNAL_SUBSET when skipping large DOCTYPE section with CRLF at wrong place

Reply via email to