gstein 99/05/25 03:03:39
Added: src/lib/expat MPL-1_0.html Makefile.tmpl asciitab.h expat.html gplelect.html hashtable.c hashtable.h iasciitab.h latin1tab.h nametab.h utf8tab.h xmlparse.c xmlparse.h xmlrole.c xmlrole.h xmltok.c xmltok.h xmltok_impl.c xmltok_impl.h Log: Adding Expat 1.0.2 to the Apache repository. Apache elects to use the MPL license, per the instructions in expat.html. This version of Expat 1.0.2 has been modified to contain only the tokenizer and parser, and the Makefile has been adjusted to use Apache conventions (Makefile.tmpl). See http://www.jclark.com/xml/expat.html for more information on Expat. Revision Changes Path 1.1 apache-1.3/src/lib/expat/MPL-1_0.html Index: MPL-1_0.html =================================================================== <TITLE>Mozilla Public License version 1.0</TITLE> <BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000"> <P ALIGN=CENTER> <FONT SIZE="+2"><B>MOZILLA PUBLIC LICENSE</B></FONT><BR> <B>Version 1.0</B> </P> <P><HR WIDTH="20%"><P> <P><B>1. Definitions.</B> <UL> <B>1.1. ``Contributor''</B> means each entity that creates or contributes to the creation of Modifications. <P><B>1.2. ``Contributor Version''</B> means the combination of the Original Code, prior Modifications used by a Contributor, and the Modifications made by that particular Contributor. <P><B>1.3. ``Covered Code''</B> means the Original Code or Modifications or the combination of the Original Code and Modifications, in each case including portions thereof<B>.</B> <P><B>1.4. ``Electronic Distribution Mechanism''</B> means a mechanism generally accepted in the software development community for the electronic transfer of data. <P><B>1.5. ``Executable''</B> means Covered Code in any form other than Source Code. <P><B>1.6. ``Initial Developer''</B> means the individual or entity identified as the Initial Developer in the Source Code notice required by <B>Exhibit A</B>. <P><B>1.7. ``Larger Work''</B> means a work which combines Covered Code or portions thereof with code not governed by the terms of this License. <P><B>1.8. ``License''</B> means this document. <P><B>1.9. ``Modifications''</B> means any addition to or deletion from the substance or structure of either the Original Code or any previous Modifications. When Covered Code is released as a series of files, a Modification is: <UL> <P><B>A.</B> Any addition to or deletion from the contents of a file containing Original Code or previous Modifications. <P><B>B.</B> Any new file that contains any part of the Original Code or previous Modifications. </UL> <P><B>1.10. ``Original Code''</B> means Source Code of computer software code which is described in the Source Code notice required by <B>Exhibit A</B> as Original Code, and which, at the time of its release under this License is not already Covered Code governed by this License. <P><B>1.11. ``Source Code''</B> means the preferred form of the Covered Code for making modifications to it, including all modules it contains, plus any associated interface definition files, scripts used to control compilation and installation of an Executable, or a list of source code differential comparisons against either the Original Code or another well known, available Covered Code of the Contributor's choice. The Source Code can be in a compressed or archival form, provided the appropriate decompression or de-archiving software is widely available for no charge. <P><B>1.12. ``You''</B> means an individual or a legal entity exercising rights under, and complying with all of the terms of, this License or a future version of this License issued under Section 6.1. For legal entities, ``You'' includes any entity which controls, is controlled by, or is under common control with You. For purposes of this definition, ``control'' means (a) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (b) ownership of fifty percent (50%) or more of the outstanding shares or beneficial ownership of such entity. </UL> <B>2. Source Code License.</B> <UL> <B>2.1. The Initial Developer Grant.</B> <BR>The Initial Developer hereby grants You a world-wide, royalty-free, non-exclusive license, subject to third party intellectual property claims: <UL> <P><B>(a)</B> to use, reproduce, modify, display, perform, sublicense and distribute the Original Code (or portions thereof) with or without Modifications, or as part of a Larger Work; and <P><B>(b)</B> under patents now or hereafter owned or controlled by Initial Developer, to make, have made, use and sell (``Utilize'') the Original Code (or portions thereof), but solely to the extent that any such patent is reasonably necessary to enable You to Utilize the Original Code (or portions thereof) and not to any greater extent that may be necessary to Utilize further Modifications or combinations. </UL> <P><B>2.2. Contributor Grant.</B> <BR>Each Contributor hereby grants You a world-wide, royalty-free, non-exclusive license, subject to third party intellectual property claims: <UL> <P><B>(a)</B> to use, reproduce, modify, display, perform, sublicense and distribute the Modifications created by such Contributor (or portions thereof) either on an unmodified basis, with other Modifications, as Covered Code or as part of a Larger Work; and <P><B>(b)</B> under patents now or hereafter owned or controlled by Contributor, to Utilize the Contributor Version (or portions thereof), but solely to the extent that any such patent is reasonably necessary to enable You to Utilize the Contributor Version (or portions thereof), and not to any greater extent that may be necessary to Utilize further Modifications or combinations. </UL> </UL> <B>3. Distribution Obligations.</B> <UL> <B>3.1. Application of License.</B> <BR>The Modifications which You create or to which You contribute are governed by the terms of this License, including without limitation Section <B>2.2</B>. The Source Code version of Covered Code may be distributed only under the terms of this License or a future version of this License released under Section <B>6.1</B>, and You must include a copy of this License with every copy of the Source Code You distribute. You may not offer or impose any terms on any Source Code version that alters or restricts the applicable version of this License or the recipients' rights hereunder. However, You may include an additional document offering the additional rights described in Section <B>3.5</B>. <P><B>3.2. Availability of Source Code.</B> <BR>Any Modification which You create or to which You contribute must be made available in Source Code form under the terms of this License either on the same media as an Executable version or via an accepted Electronic Distribution Mechanism to anyone to whom you made an Executable version available; and if made available via Electronic Distribution Mechanism, must remain available for at least twelve (12) months after the date it initially became available, or at least six (6) months after a subsequent version of that particular Modification has been made available to such recipients. You are responsible for ensuring that the Source Code version remains available even if the Electronic Distribution Mechanism is maintained by a third party. <P><B>3.3. Description of Modifications.</B> <BR>You must cause all Covered Code to which you contribute to contain a file documenting the changes You made to create that Covered Code and the date of any change. You must include a prominent statement that the Modification is derived, directly or indirectly, from Original Code provided by the Initial Developer and including the name of the Initial Developer in (a) the Source Code, and (b) in any notice in an Executable version or related documentation in which You describe the origin or ownership of the Covered Code. <P><B>3.4. Intellectual Property Matters</B> <UL> <P><B>(a) Third Party Claims</B>. <BR>If You have knowledge that a party claims an intellectual property right in particular functionality or code (or its utilization under this License), you must include a text file with the source code distribution titled ``LEGAL'' which describes the claim and the party making the claim in sufficient detail that a recipient will know whom to contact. If you obtain such knowledge after You make Your Modification available as described in Section <B>3.2</B>, You shall promptly modify the LEGAL file in all copies You make available thereafter and shall take other steps (such as notifying appropriate mailing lists or newsgroups) reasonably calculated to inform those who received the Covered Code that new knowledge has been obtained. <P><B>(b) Contributor APIs</B>. <BR>If Your Modification is an application programming interface and You own or control patents which are reasonably necessary to implement that API, you must also include this information in the LEGAL file. </UL> <P><B>3.5. Required Notices.</B> <BR>You must duplicate the notice in <B>Exhibit A</B> in each file of the Source Code, and this License in any documentation for the Source Code, where You describe recipients' rights relating to Covered Code. If You created one or more Modification(s), You may add your name as a Contributor to the notice described in <B>Exhibit A</B>. If it is not possible to put such notice in a particular Source Code file due to its structure, then you must include such notice in a location (such as a relevant directory file) where a user would be likely to look for such a notice. You may choose to offer, and to charge a fee for, warranty, support, indemnity or liability obligations to one or more recipients of Covered Code. However, You may do so only on Your own behalf, and not on behalf of the Initial Developer or any Contributor. You must make it absolutely clear than any such warranty, support, indemnity or liability obligation is offered by You alone, and You hereby agree to indemnify the Initial Developer and every Contributor for any liability incurred by the Initial Developer or such Contributor as a result of warranty, support, indemnity or liability terms You offer. <P><B>3.6. Distribution of Executable Versions.</B> <BR>You may distribute Covered Code in Executable form only if the requirements of Section <B>3.1-3.5</B> have been met for that Covered Code, and if You include a notice stating that the Source Code version of the Covered Code is available under the terms of this License, including a description of how and where You have fulfilled the obligations of Section <B>3.2</B>. The notice must be conspicuously included in any notice in an Executable version, related documentation or collateral in which You describe recipients' rights relating to the Covered Code. You may distribute the Executable version of Covered Code under a license of Your choice, which may contain terms different from this License, provided that You are in compliance with the terms of this License and that the license for the Executable version does not attempt to limit or alter the recipient's rights in the Source Code version from the rights set forth in this License. If You distribute the Executable version under a different license You must make it absolutely clear that any terms which differ from this License are offered by You alone, not by the Initial Developer or any Contributor. You hereby agree to indemnify the Initial Developer and every Contributor for any liability incurred by the Initial Developer or such Contributor as a result of any such terms You offer. <P><B>3.7. Larger Works.</B> <BR>You may create a Larger Work by combining Covered Code with other code not governed by the terms of this License and distribute the Larger Work as a single product. In such a case, You must make sure the requirements of this License are fulfilled for the Covered Code. </UL> <B>4. Inability to Comply Due to Statute or Regulation.</B> <UL> <P>If it is impossible for You to comply with any of the terms of this License with respect to some or all of the Covered Code due to statute or regulation then You must: (a) comply with the terms of this License to the maximum extent possible; and (b) describe the limitations and the code they affect. Such description must be included in the LEGAL file described in Section <B>3.4</B> and must be included with all distributions of the Source Code. Except to the extent prohibited by statute or regulation, such description must be sufficiently detailed for a recipient of ordinary skill to be able to understand it. </UL> <B>5. Application of this License.</B> <UL> This License applies to code to which the Initial Developer has attached the notice in <B>Exhibit A</B>, and to related Covered Code. </UL> <B>6. Versions of the License.</B> <UL> <B>6.1. New Versions</B>. <BR>Netscape Communications Corporation (``Netscape'') may publish revised and/or new versions of the License from time to time. Each version will be given a distinguishing version number. <P><B>6.2. Effect of New Versions</B>. <BR>Once Covered Code has been published under a particular version of the License, You may always continue to use it under the terms of that version. You may also choose to use such Covered Code under the terms of any subsequent version of the License published by Netscape. No one other than Netscape has the right to modify the terms applicable to Covered Code created under this License. <P><B>6.3. Derivative Works</B>. <BR>If you create or use a modified version of this License (which you may only do in order to apply it to code which is not already Covered Code governed by this License), you must (a) rename Your license so that the phrases ``Mozilla'', ``MOZILLAPL'', ``MOZPL'', ``Netscape'', ``NPL'' or any confusingly similar phrase do not appear anywhere in your license and (b) otherwise make it clear that your version of the license contains terms which differ from the Mozilla Public License and Netscape Public License. (Filling in the name of the Initial Developer, Original Code or Contributor in the notice described in <B>Exhibit A</B> shall not of themselves be deemed to be modifications of this License.) </UL> <B>7. DISCLAIMER OF WARRANTY.</B> <UL> COVERED CODE IS PROVIDED UNDER THIS LICENSE ON AN ``AS IS'' BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, WARRANTIES THAT THE COVERED CODE IS FREE OF DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE OR NON-INFRINGING. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE COVERED CODE IS WITH YOU. SHOULD ANY COVERED CODE PROVE DEFECTIVE IN ANY RESPECT, YOU (NOT THE INITIAL DEVELOPER OR ANY OTHER CONTRIBUTOR) ASSUME THE COST OF ANY NECESSARY SERVICING, REPAIR OR CORRECTION. THIS DISCLAIMER OF WARRANTY CONSTITUTES AN ESSENTIAL PART OF THIS LICENSE. NO USE OF ANY COVERED CODE IS AUTHORIZED HEREUNDER EXCEPT UNDER THIS DISCLAIMER. </UL> <B>8. TERMINATION.</B> <UL> This License and the rights granted hereunder will terminate automatically if You fail to comply with terms herein and fail to cure such breach within 30 days of becoming aware of the breach. All sublicenses to the Covered Code which are properly granted shall survive any termination of this License. Provisions which, by their nature, must remain in effect beyond the termination of this License shall survive. </UL> <B>9. LIMITATION OF LIABILITY.</B> <UL> UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, WHETHER TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE, SHALL THE INITIAL DEVELOPER, ANY OTHER CONTRIBUTOR, OR ANY DISTRIBUTOR OF COVERED CODE, OR ANY SUPPLIER OF ANY OF SUCH PARTIES, BE LIABLE TO YOU OR ANY OTHER PERSON FOR ANY INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER COMMERCIAL DAMAGES OR LOSSES, EVEN IF SUCH PARTY SHALL HAVE BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION OF LIABILITY SHALL NOT APPLY TO LIABILITY FOR DEATH OR PERSONAL INJURY RESULTING FROM SUCH PARTY'S NEGLIGENCE TO THE EXTENT APPLICABLE LAW PROHIBITS SUCH LIMITATION. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THAT EXCLUSION AND LIMITATION MAY NOT APPLY TO YOU. </UL> <B>10. U.S. GOVERNMENT END USERS.</B> <UL> The Covered Code is a ``commercial item,'' as that term is defined in 48 C.F.R. 2.101 (Oct. 1995), consisting of ``commercial computer software'' and ``commercial computer software documentation,'' as such terms are used in 48 C.F.R. 12.212 (Sept. 1995). Consistent with 48 C.F.R. 12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (June 1995), all U.S. Government End Users acquire Covered Code with only those rights set forth herein. </UL> <B>11. MISCELLANEOUS.</B> <UL> This License represents the complete agreement concerning subject matter hereof. If any provision of this License is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable. This License shall be governed by California law provisions (except to the extent applicable law, if any, provides otherwise), excluding its conflict-of-law provisions. With respect to disputes in which at least one party is a citizen of, or an entity chartered or registered to do business in, the United States of America: (a) unless otherwise agreed in writing, all disputes relating to this License (excepting any dispute relating to intellectual property rights) shall be subject to final and binding arbitration, with the losing party paying all costs of arbitration; (b) any arbitration relating to this Agreement shall be held in Santa Clara County, California, under the auspices of JAMS/EndDispute; and (c) any litigation relating to this Agreement shall be subject to the jurisdiction of the Federal Courts of the Northern District of California, with venue lying in Santa Clara County, California, with the losing party responsible for costs, including without limitation, court costs and reasonable attorneys fees and expenses. The application of the United Nations Convention on Contracts for the International Sale of Goods is expressly excluded. Any law or regulation which provides that the language of a contract shall be construed against the drafter shall not apply to this License. </UL> <B>12. RESPONSIBILITY FOR CLAIMS.</B> <UL> Except in cases where another Contributor has failed to comply with Section <B>3.4</B>, You are responsible for damages arising, directly or indirectly, out of Your utilization of rights under this License, based on the number of copies of Covered Code you made available, the revenues you received from utilizing such rights, and other relevant factors. You agree to work with affected parties to distribute responsibility on an equitable basis. </UL> <B>EXHIBIT A.</B> <UL> ``The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ <P>Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. <P>The Original Code is ______________________________________. <P>The Initial Developer of the Original Code is ________________________. Portions created by ______________________ are Copyright (C) ______ _______________________. All Rights Reserved. <P>Contributor(s): ______________________________________.'' </UL> 1.1 apache-1.3/src/lib/expat/Makefile.tmpl Index: Makefile.tmpl =================================================================== # # default definition of these two. dunno how to get it prepended when the # Makefile is built, so we do it manually # CFLAGS=$(OPTIM) $(CFLAGS1) $(EXTRA_CFLAGS) INCLUDES=$(INCLUDES1) $(INCLUDES0) $(EXTRA_INCLUDES) # If you know what your system's byte order is, define BYTE_ORDER: # use -DBYTE_ORDER=12 for little-endian byte order; # use -DBYTE_ORDER=21 for big-endian (network) byte order. #CFLAGS=-O2 OBJS=xmltok.o xmlrole.o xmlparse.o hashtable.o all lib: libexpat.a libexpat.a: $(OBJS) rm -f libexpat.a ar cr libexpat.a $(OBJS) $(RANLIB) libexpat.a clean: rm -f $(OBJS) libexpat.a distclean: clean -rm -f Makefile .SUFFIXES: .o .c.o: $(CC) -c $(INCLUDES) $(CFLAGS) $< 1.1 apache-1.3/src/lib/expat/asciitab.h Index: asciitab.h =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ /* 0x00 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x04 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x08 */ BT_NONXML, BT_S, BT_LF, BT_NONXML, /* 0x0C */ BT_NONXML, BT_CR, BT_NONXML, BT_NONXML, /* 0x10 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x14 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x18 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x1C */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x20 */ BT_S, BT_EXCL, BT_QUOT, BT_NUM, /* 0x24 */ BT_OTHER, BT_PERCNT, BT_AMP, BT_APOS, /* 0x28 */ BT_LPAR, BT_RPAR, BT_AST, BT_PLUS, /* 0x2C */ BT_COMMA, BT_MINUS, BT_NAME, BT_SOL, /* 0x30 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT, /* 0x34 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT, /* 0x38 */ BT_DIGIT, BT_DIGIT, BT_NMSTRT, BT_SEMI, /* 0x3C */ BT_LT, BT_EQUALS, BT_GT, BT_QUEST, /* 0x40 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX, /* 0x44 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT, /* 0x48 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x4C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x50 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x54 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x58 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_LSQB, /* 0x5C */ BT_OTHER, BT_RSQB, BT_OTHER, BT_NMSTRT, /* 0x60 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX, /* 0x64 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT, /* 0x68 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x6C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x70 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x74 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x78 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER, /* 0x7C */ BT_VERBAR, BT_OTHER, BT_OTHER, BT_OTHER, 1.1 apache-1.3/src/lib/expat/expat.html Index: expat.html =================================================================== <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <HTML> <TITLE>expat</TITLE> <BODY> <H1>expat - XML Parser Toolkit</H1> <H3>Version 1.0.2</H3> <P>Copyright (c) 1998, 1999 James Clark. Expat is subject to the <A HREF="MPL-1_0.html">Mozilla Public License Version 1.0</A>. You may as a special exception <A href="gplelect.html">elect to use the GNU General Public License</A>. Please contact me if you wish to negotiate an alternative license.</P> <P>Expat is an <A HREF="http://www.w3.org/TR/1998/REC-xml-19980210">XML 1.0</A> parser written in C. It aims to be fully conforming. It is currently not a validating XML processor. Expat can be downloaded from <A href = "ftp://ftp.jclark.com/pub/xml/expat.zip" >ftp://ftp.jclark.com/pub/xml/expat.zip</A>. This is a production version.</P> <P>The directory <SAMP>xmltok</SAMP> contains a low-level library for tokenizing XML. The interface is documented in <SAMP>xmltok/xmltok.h</SAMP>.</P> <P>The directory <SAMP>xmlparse</SAMP> contains an XML parser library which is built on top of the <SAMP>xmltok</SAMP> library. The interface is documented in <SAMP>xmlparse/xmlparse.h</SAMP>. The directory <SAMP>sample</SAMP> contains a simple example program using this interface; <SAMP>sample/build.bat</SAMP> is a batch file to build the example using Visual C++.</P> <P>The directory <SAMP>xmlwf</SAMP> contains the <SAMP>xmlwf</SAMP> application, which uses the <SAMP>xmlparse</SAMP> library. The arguments to <SAMP>xmlwf</SAMP> are one or more files which are each to be checked for well-formedness. An option <SAMP>-d <VAR>dir</VAR></SAMP> can be specified; for each well-formed input file the corresponding <A href="http://www.jclark.com/xml/canonxml.html">canonical XML</A> will be written to <SAMP>dir/<VAR>f</VAR></SAMP>, where <SAMP><VAR>f</VAR></SAMP> is the filename (without any path) of the input file. A <CODE>-x</CODE> option will cause references to external general entities to be processed.</P> <P>The <SAMP>bin</SAMP> directory contains Win32 executables. The <SAMP>lib</SAMP> directory contains Win32 import libraries.</P> <P></P> <ADDRESS> <A HREF="mailto:[EMAIL PROTECTED]">James Clark</A> </ADDRESS> </BODY> </HTML> 1.1 apache-1.3/src/lib/expat/gplelect.html Index: gplelect.html =================================================================== <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <HTML> <TITLE>Using the GPL for expat</TITLE> <BODY> <H1>Using the GPL for expat</H1> <P>Instead of using the Mozilla Public License, you may instead elect to use the <A href="http://www.gnu.org/copyleft/gpl.html">GNU General Public License</A> (GPL) for a particular copy of expat. The election to use the GPL is irrevocable for that particular copy. If you do elect to use the GPL, then you must replace the copyright notice on each file with the following:</P> <PRE> expat XML parser Copyright (C) 1998 James Clark This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. </PRE> <P>and you must change readme.html to say that an election has been made to use this copy under the GPL. If you redistribute expat, you should also include a copy of the GPL.</P> <P>Note that I will probably not accept patches under terms that allow me to distribute them only under the GPL.</P> <P></P> <ADDRESS> <A HREF="mailto:[EMAIL PROTECTED]">James Clark</A> </ADDRESS> </BODY> </HTML> 1.1 apache-1.3/src/lib/expat/hashtable.c Index: hashtable.c =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in csompliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ #include <stdlib.h> #include <string.h> #include "hashtable.h" #ifdef XML_UNICODE #define keycmp wcscmp #else #define keycmp strcmp #endif #define INIT_SIZE 64 static unsigned long hash(KEY s) { unsigned long h = 0; while (*s) h = (h << 5) + h + (unsigned char)*s++; return h; } NAMED *lookup(HASH_TABLE *table, KEY name, size_t createSize) { size_t i; if (table->size == 0) { if (!createSize) return 0; table->v = calloc(INIT_SIZE, sizeof(NAMED *)); if (!table->v) return 0; table->size = INIT_SIZE; table->usedLim = INIT_SIZE / 2; i = hash(name) & (table->size - 1); } else { unsigned long h = hash(name); for (i = h & (table->size - 1); table->v[i]; i == 0 ? i = table->size - 1 : --i) { if (keycmp(name, table->v[i]->name) == 0) return table->v[i]; } if (!createSize) return 0; if (table->used == table->usedLim) { /* check for overflow */ size_t newSize = table->size * 2; NAMED **newV = calloc(newSize, sizeof(NAMED *)); if (!newV) return 0; for (i = 0; i < table->size; i++) if (table->v[i]) { size_t j; for (j = hash(table->v[i]->name) & (newSize - 1); newV[j]; j == 0 ? j = newSize - 1 : --j) ; newV[j] = table->v[i]; } free(table->v); table->v = newV; table->size = newSize; table->usedLim = newSize/2; for (i = h & (table->size - 1); table->v[i]; i == 0 ? i = table->size - 1 : --i) ; } } table->v[i] = calloc(1, createSize); if (!table->v[i]) return 0; table->v[i]->name = name; (table->used)++; return table->v[i]; } void hashTableDestroy(HASH_TABLE *table) { size_t i; for (i = 0; i < table->size; i++) { NAMED *p = table->v[i]; if (p) free(p); } free(table->v); } void hashTableInit(HASH_TABLE *p) { p->size = 0; p->usedLim = 0; p->used = 0; p->v = 0; } void hashTableIterInit(HASH_TABLE_ITER *iter, const HASH_TABLE *table) { iter->p = table->v; iter->end = iter->p + table->size; } NAMED *hashTableIterNext(HASH_TABLE_ITER *iter) { while (iter->p != iter->end) { NAMED *tem = *(iter->p)++; if (tem) return tem; } return 0; } 1.1 apache-1.3/src/lib/expat/hashtable.h Index: hashtable.h =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ #include <stddef.h> #ifdef XML_UNICODE typedef const wchar_t *KEY; #else typedef const char *KEY; #endif typedef struct { KEY name; } NAMED; typedef struct { NAMED **v; size_t size; size_t used; size_t usedLim; } HASH_TABLE; NAMED *lookup(HASH_TABLE *table, KEY name, size_t createSize); void hashTableInit(HASH_TABLE *); void hashTableDestroy(HASH_TABLE *); typedef struct { NAMED **p; NAMED **end; } HASH_TABLE_ITER; void hashTableIterInit(HASH_TABLE_ITER *, const HASH_TABLE *); NAMED *hashTableIterNext(HASH_TABLE_ITER *); 1.1 apache-1.3/src/lib/expat/iasciitab.h Index: iasciitab.h =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ /* Like asciitab.h, except that 0xD has code BT_S rather than BT_CR */ /* 0x00 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x04 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x08 */ BT_NONXML, BT_S, BT_LF, BT_NONXML, /* 0x0C */ BT_NONXML, BT_S, BT_NONXML, BT_NONXML, /* 0x10 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x14 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x18 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x1C */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0x20 */ BT_S, BT_EXCL, BT_QUOT, BT_NUM, /* 0x24 */ BT_OTHER, BT_PERCNT, BT_AMP, BT_APOS, /* 0x28 */ BT_LPAR, BT_RPAR, BT_AST, BT_PLUS, /* 0x2C */ BT_COMMA, BT_MINUS, BT_NAME, BT_SOL, /* 0x30 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT, /* 0x34 */ BT_DIGIT, BT_DIGIT, BT_DIGIT, BT_DIGIT, /* 0x38 */ BT_DIGIT, BT_DIGIT, BT_NMSTRT, BT_SEMI, /* 0x3C */ BT_LT, BT_EQUALS, BT_GT, BT_QUEST, /* 0x40 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX, /* 0x44 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT, /* 0x48 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x4C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x50 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x54 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x58 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_LSQB, /* 0x5C */ BT_OTHER, BT_RSQB, BT_OTHER, BT_NMSTRT, /* 0x60 */ BT_OTHER, BT_HEX, BT_HEX, BT_HEX, /* 0x64 */ BT_HEX, BT_HEX, BT_HEX, BT_NMSTRT, /* 0x68 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x6C */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x70 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x74 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0x78 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER, /* 0x7C */ BT_VERBAR, BT_OTHER, BT_OTHER, BT_OTHER, 1.1 apache-1.3/src/lib/expat/latin1tab.h Index: latin1tab.h =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ /* 0x80 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x84 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x88 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x8C */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x90 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x94 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x98 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0x9C */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xA0 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xA4 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xA8 */ BT_OTHER, BT_OTHER, BT_NMSTRT, BT_OTHER, /* 0xAC */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xB0 */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xB4 */ BT_OTHER, BT_NMSTRT, BT_OTHER, BT_NAME, /* 0xB8 */ BT_OTHER, BT_OTHER, BT_NMSTRT, BT_OTHER, /* 0xBC */ BT_OTHER, BT_OTHER, BT_OTHER, BT_OTHER, /* 0xC0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xC4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xC8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xCC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xD0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xD4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER, /* 0xD8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xDC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xE0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xE4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xE8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xEC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xF0 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xF4 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_OTHER, /* 0xF8 */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, /* 0xFC */ BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, BT_NMSTRT, 1.1 apache-1.3/src/lib/expat/nametab.h Index: nametab.h =================================================================== static const unsigned namingBitmap[] = { 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x00000000, 0x04000000, 0x87FFFFFE, 0x07FFFFFE, 0x00000000, 0x00000000, 0xFF7FFFFF, 0xFF7FFFFF, 0xFFFFFFFF, 0x7FF3FFFF, 0xFFFFFDFE, 0x7FFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFE00F, 0xFC31FFFF, 0x00FFFFFF, 0x00000000, 0xFFFF0000, 0xFFFFFFFF, 0xFFFFFFFF, 0xF80001FF, 0x00000003, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFD740, 0xFFFFFFFB, 0x547F7FFF, 0x000FFFFD, 0xFFFFDFFE, 0xFFFFFFFF, 0xDFFEFFFF, 0xFFFFFFFF, 0xFFFF0003, 0xFFFFFFFF, 0xFFFF199F, 0x033FCFFF, 0x00000000, 0xFFFE0000, 0x027FFFFF, 0xFFFFFFFE, 0x0000007F, 0x00000000, 0xFFFF0000, 0x000707FF, 0x00000000, 0x07FFFFFE, 0x000007FE, 0xFFFE0000, 0xFFFFFFFF, 0x7CFFFFFF, 0x002F7FFF, 0x00000060, 0xFFFFFFE0, 0x23FFFFFF, 0xFF000000, 0x00000003, 0xFFF99FE0, 0x03C5FDFF, 0xB0000000, 0x00030003, 0xFFF987E0, 0x036DFDFF, 0x5E000000, 0x001C0000, 0xFFFBAFE0, 0x23EDFDFF, 0x00000000, 0x00000001, 0xFFF99FE0, 0x23CDFDFF, 0xB0000000, 0x00000003, 0xD63DC7E0, 0x03BFC718, 0x00000000, 0x00000000, 0xFFFDDFE0, 0x03EFFDFF, 0x00000000, 0x00000003, 0xFFFDDFE0, 0x03EFFDFF, 0x40000000, 0x00000003, 0xFFFDDFE0, 0x03FFFDFF, 0x00000000, 0x00000003, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFFFE, 0x000D7FFF, 0x0000003F, 0x00000000, 0xFEF02596, 0x200D6CAE, 0x0000001F, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFEFF, 0x000003FF, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFFFF, 0xFFFF003F, 0x007FFFFF, 0x0007DAED, 0x50000000, 0x82315001, 0x002C62AB, 0x40000000, 0xF580C900, 0x00000007, 0x02010800, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0FFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x03FFFFFF, 0x3F3FFFFF, 0xFFFFFFFF, 0xAAFF3F3F, 0x3FFFFFFF, 0xFFFFFFFF, 0x5FDFFFFF, 0x0FCF1FDC, 0x1FDC1FFF, 0x00000000, 0x00004C40, 0x00000000, 0x00000000, 0x00000007, 0x00000000, 0x00000000, 0x00000000, 0x00000080, 0x000003FE, 0xFFFFFFFE, 0xFFFFFFFF, 0x001FFFFF, 0xFFFFFFFE, 0xFFFFFFFF, 0x07FFFFFF, 0xFFFFFFE0, 0x00001FFF, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000003F, 0x00000000, 0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F, 0x00000000, 0x00000000, 0x00000000, 0x07FF6000, 0x87FFFFFE, 0x07FFFFFE, 0x00000000, 0x00800000, 0xFF7FFFFF, 0xFF7FFFFF, 0x00FFFFFF, 0x00000000, 0xFFFF0000, 0xFFFFFFFF, 0xFFFFFFFF, 0xF80001FF, 0x00030003, 0x00000000, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000003F, 0x00000003, 0xFFFFD7C0, 0xFFFFFFFB, 0x547F7FFF, 0x000FFFFD, 0xFFFFDFFE, 0xFFFFFFFF, 0xDFFEFFFF, 0xFFFFFFFF, 0xFFFF007B, 0xFFFFFFFF, 0xFFFF199F, 0x033FCFFF, 0x00000000, 0xFFFE0000, 0x027FFFFF, 0xFFFFFFFE, 0xFFFE007F, 0xBBFFFFFB, 0xFFFF0016, 0x000707FF, 0x00000000, 0x07FFFFFE, 0x0007FFFF, 0xFFFF03FF, 0xFFFFFFFF, 0x7CFFFFFF, 0xFFEF7FFF, 0x03FF3DFF, 0xFFFFFFEE, 0xF3FFFFFF, 0xFF1E3FFF, 0x0000FFCF, 0xFFF99FEE, 0xD3C5FDFF, 0xB080399F, 0x0003FFCF, 0xFFF987E4, 0xD36DFDFF, 0x5E003987, 0x001FFFC0, 0xFFFBAFEE, 0xF3EDFDFF, 0x00003BBF, 0x0000FFC1, 0xFFF99FEE, 0xF3CDFDFF, 0xB0C0398F, 0x0000FFC3, 0xD63DC7EC, 0xC3BFC718, 0x00803DC7, 0x0000FF80, 0xFFFDDFEE, 0xC3EFFDFF, 0x00603DDF, 0x0000FFC3, 0xFFFDDFEC, 0xC3EFFDFF, 0x40603DDF, 0x0000FFC3, 0xFFFDDFEC, 0xC3FFFDFF, 0x00803DCF, 0x0000FFC3, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0xFFFFFFFE, 0x07FF7FFF, 0x03FF7FFF, 0x00000000, 0xFEF02596, 0x3BFF6CAE, 0x03FF3F5F, 0x00000000, 0x03000000, 0xC2A003FF, 0xFFFFFEFF, 0xFFFE03FF, 0xFEBF0FDF, 0x02FE3FFF, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x1FFF0000, 0x00000002, 0x000000A0, 0x003EFFFE, 0xFFFFFFFE, 0xFFFFFFFF, 0x661FFFFF, 0xFFFFFFFE, 0xFFFFFFFF, 0x77FFFFFF, }; static const unsigned char nmstrtPages[] = { 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x00, 0x00, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0x10, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x12, 0x13, 0x00, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x15, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x17, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x18, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, }; static const unsigned char namePages[] = { 0x19, 0x03, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x00, 0x00, 0x1F, 0x20, 0x21, 0x22, 0x23, 0x24, 0x25, 0x10, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x12, 0x13, 0x26, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x27, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x17, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x18, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, }; 1.1 apache-1.3/src/lib/expat/utf8tab.h Index: utf8tab.h =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ /* 0x80 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x84 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x88 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x8C */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x90 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x94 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x98 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0x9C */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xA0 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xA4 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xA8 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xAC */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xB0 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xB4 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xB8 */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xBC */ BT_TRAIL, BT_TRAIL, BT_TRAIL, BT_TRAIL, /* 0xC0 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xC4 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xC8 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xCC */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xD0 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xD4 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xD8 */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xDC */ BT_LEAD2, BT_LEAD2, BT_LEAD2, BT_LEAD2, /* 0xE0 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3, /* 0xE4 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3, /* 0xE8 */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3, /* 0xEC */ BT_LEAD3, BT_LEAD3, BT_LEAD3, BT_LEAD3, /* 0xF0 */ BT_LEAD4, BT_LEAD4, BT_LEAD4, BT_LEAD4, /* 0xF4 */ BT_LEAD4, BT_NONXML, BT_NONXML, BT_NONXML, /* 0xF8 */ BT_NONXML, BT_NONXML, BT_NONXML, BT_NONXML, /* 0xFC */ BT_NONXML, BT_NONXML, BT_MALFORM, BT_MALFORM, 1.1 apache-1.3/src/lib/expat/xmlparse.c Index: xmlparse.c =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ #include <stdlib.h> #include <string.h> #include <stddef.h> #ifdef XML_UNICODE #define XML_ENCODE_MAX XML_UTF16_ENCODE_MAX #define XmlConvert XmlUtf16Convert #define XmlGetInternalEncoding XmlGetUtf16InternalEncoding #define XmlEncode XmlUtf16Encode #define MUST_CONVERT(enc, s) (!(enc)->isUtf16 || (((unsigned long)s) & 1)) typedef unsigned short ICHAR; #else #define XML_ENCODE_MAX XML_UTF8_ENCODE_MAX #define XmlConvert XmlUtf8Convert #define XmlGetInternalEncoding XmlGetUtf8InternalEncoding #define XmlEncode XmlUtf8Encode #define MUST_CONVERT(enc, s) (!(enc)->isUtf8) typedef char ICHAR; #endif #ifdef XML_UNICODE_WCHAR_T #define XML_T(x) L ## x #else #define XML_T(x) x #endif /* Round up n to be a multiple of sz, where sz is a power of 2. */ #define ROUND_UP(n, sz) (((n) + ((sz) - 1)) & ~((sz) - 1)) #include "xmlparse.h" #include "xmltok.h" #include "xmlrole.h" #include "hashtable.h" #define INIT_TAG_BUF_SIZE 32 /* must be a multiple of sizeof(XML_Char) */ #define INIT_DATA_BUF_SIZE 1024 #define INIT_ATTS_SIZE 16 #define INIT_BLOCK_SIZE 1024 #define INIT_BUFFER_SIZE 1024 typedef struct tag { struct tag *parent; const char *rawName; int rawNameLength; const XML_Char *name; char *buf; char *bufEnd; } TAG; typedef struct { const XML_Char *name; const XML_Char *textPtr; int textLen; const XML_Char *systemId; const XML_Char *base; const XML_Char *publicId; const XML_Char *notation; char open; } ENTITY; typedef struct block { struct block *next; int size; XML_Char s[1]; } BLOCK; typedef struct { BLOCK *blocks; BLOCK *freeBlocks; const XML_Char *end; XML_Char *ptr; XML_Char *start; } STRING_POOL; /* The XML_Char before the name is used to determine whether an attribute has been specified. */ typedef struct { XML_Char *name; char maybeTokenized; } ATTRIBUTE_ID; typedef struct { const ATTRIBUTE_ID *id; char isCdata; const XML_Char *value; } DEFAULT_ATTRIBUTE; typedef struct { const XML_Char *name; int nDefaultAtts; int allocDefaultAtts; DEFAULT_ATTRIBUTE *defaultAtts; } ELEMENT_TYPE; typedef struct { HASH_TABLE generalEntities; HASH_TABLE elementTypes; HASH_TABLE attributeIds; STRING_POOL pool; int complete; int standalone; const XML_Char *base; } DTD; typedef enum XML_Error Processor(XML_Parser parser, const char *start, const char *end, const char **endPtr); static Processor prologProcessor; static Processor prologInitProcessor; static Processor contentProcessor; static Processor cdataSectionProcessor; static Processor epilogProcessor; static Processor errorProcessor; static Processor externalEntityInitProcessor; static Processor externalEntityInitProcessor2; static Processor externalEntityInitProcessor3; static Processor externalEntityContentProcessor; static enum XML_Error handleUnknownEncoding(XML_Parser parser, const XML_Char *encodingName); static enum XML_Error processXmlDecl(XML_Parser parser, int isGeneralTextEntity, const char *, const char *); static enum XML_Error initializeEncoding(XML_Parser parser); static enum XML_Error doContent(XML_Parser parser, int startTagLevel, const ENCODING *enc, const char *start, const char *end, const char **endPtr); static enum XML_Error doCdataSection(XML_Parser parser, const ENCODING *, const char **startPtr, const char *end, const char **nextPtr); static enum XML_Error storeAtts(XML_Parser parser, const ENCODING *, const XML_Char *tagName, const char *s); static int defineAttribute(ELEMENT_TYPE *type, ATTRIBUTE_ID *, int isCdata, const XML_Char *dfltValue); static enum XML_Error storeAttributeValue(XML_Parser parser, const ENCODING *, int isCdata, const char *, const char *, STRING_POOL *); static enum XML_Error appendAttributeValue(XML_Parser parser, const ENCODING *, int isCdata, const char *, const char *, STRING_POOL *); static ATTRIBUTE_ID * getAttributeId(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static enum XML_Error storeEntityValue(XML_Parser parser, const char *start, const char *end); static int reportProcessingInstruction(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static void reportDefault(XML_Parser parser, const ENCODING *enc, const char *start, const char *end); static const XML_Char *getOpenEntityNames(XML_Parser parser); static int setOpenEntityNames(XML_Parser parser, const XML_Char *openEntityNames); static void normalizePublicId(XML_Char *s); static int dtdInit(DTD *); static void dtdDestroy(DTD *); static int dtdCopy(DTD *newDtd, const DTD *oldDtd); static void poolInit(STRING_POOL *); static void poolClear(STRING_POOL *); static void poolDestroy(STRING_POOL *); static XML_Char *poolAppend(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end); static XML_Char *poolStoreString(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end); static int poolGrow(STRING_POOL *pool); static const XML_Char *poolCopyString(STRING_POOL *pool, const XML_Char *s); static const XML_Char *poolCopyStringN(STRING_POOL *pool, const XML_Char *s, int n); #define poolStart(pool) ((pool)->start) #define poolEnd(pool) ((pool)->ptr) #define poolLength(pool) ((pool)->ptr - (pool)->start) #define poolChop(pool) ((void)--(pool->ptr)) #define poolLastChar(pool) (((pool)->ptr)[-1]) #define poolDiscard(pool) ((pool)->ptr = (pool)->start) #define poolFinish(pool) ((pool)->start = (pool)->ptr) #define poolAppendChar(pool, c) \ (((pool)->ptr == (pool)->end && !poolGrow(pool)) \ ? 0 \ : ((*((pool)->ptr)++ = c), 1)) typedef struct { /* The first member must be userData so that the XML_GetUserData macro works. */ void *userData; void *handlerArg; char *buffer; /* first character to be parsed */ const char *bufferPtr; /* past last character to be parsed */ char *bufferEnd; /* allocated end of buffer */ const char *bufferLim; long parseEndByteIndex; const char *parseEndPtr; XML_Char *dataBuf; XML_Char *dataBufEnd; XML_StartElementHandler startElementHandler; XML_EndElementHandler endElementHandler; XML_CharacterDataHandler characterDataHandler; XML_ProcessingInstructionHandler processingInstructionHandler; XML_DefaultHandler defaultHandler; XML_UnparsedEntityDeclHandler unparsedEntityDeclHandler; XML_NotationDeclHandler notationDeclHandler; XML_ExternalEntityRefHandler externalEntityRefHandler; XML_UnknownEncodingHandler unknownEncodingHandler; const ENCODING *encoding; INIT_ENCODING initEncoding; const XML_Char *protocolEncodingName; void *unknownEncodingMem; void *unknownEncodingData; void *unknownEncodingHandlerData; void (*unknownEncodingRelease)(void *); PROLOG_STATE prologState; Processor *processor; enum XML_Error errorCode; const char *eventPtr; const char *eventEndPtr; const char *positionPtr; int tagLevel; ENTITY *declEntity; const XML_Char *declNotationName; const XML_Char *declNotationPublicId; ELEMENT_TYPE *declElementType; ATTRIBUTE_ID *declAttributeId; char declAttributeIsCdata; DTD dtd; TAG *tagStack; TAG *freeTagList; int attsSize; ATTRIBUTE *atts; POSITION position; STRING_POOL tempPool; STRING_POOL temp2Pool; char *groupConnector; unsigned groupSize; int hadExternalDoctype; } Parser; #define userData (((Parser *)parser)->userData) #define handlerArg (((Parser *)parser)->handlerArg) #define startElementHandler (((Parser *)parser)->startElementHandler) #define endElementHandler (((Parser *)parser)->endElementHandler) #define characterDataHandler (((Parser *)parser)->characterDataHandler) #define processingInstructionHandler (((Parser *)parser)->processingInstructionHandler) #define defaultHandler (((Parser *)parser)->defaultHandler) #define unparsedEntityDeclHandler (((Parser *)parser)->unparsedEntityDeclHandler) #define notationDeclHandler (((Parser *)parser)->notationDeclHandler) #define externalEntityRefHandler (((Parser *)parser)->externalEntityRefHandler) #define unknownEncodingHandler (((Parser *)parser)->unknownEncodingHandler) #define encoding (((Parser *)parser)->encoding) #define initEncoding (((Parser *)parser)->initEncoding) #define unknownEncodingMem (((Parser *)parser)->unknownEncodingMem) #define unknownEncodingData (((Parser *)parser)->unknownEncodingData) #define unknownEncodingHandlerData \ (((Parser *)parser)->unknownEncodingHandlerData) #define unknownEncodingRelease (((Parser *)parser)->unknownEncodingRelease) #define protocolEncodingName (((Parser *)parser)->protocolEncodingName) #define prologState (((Parser *)parser)->prologState) #define processor (((Parser *)parser)->processor) #define errorCode (((Parser *)parser)->errorCode) #define eventPtr (((Parser *)parser)->eventPtr) #define eventEndPtr (((Parser *)parser)->eventEndPtr) #define positionPtr (((Parser *)parser)->positionPtr) #define position (((Parser *)parser)->position) #define tagLevel (((Parser *)parser)->tagLevel) #define buffer (((Parser *)parser)->buffer) #define bufferPtr (((Parser *)parser)->bufferPtr) #define bufferEnd (((Parser *)parser)->bufferEnd) #define parseEndByteIndex (((Parser *)parser)->parseEndByteIndex) #define parseEndPtr (((Parser *)parser)->parseEndPtr) #define bufferLim (((Parser *)parser)->bufferLim) #define dataBuf (((Parser *)parser)->dataBuf) #define dataBufEnd (((Parser *)parser)->dataBufEnd) #define dtd (((Parser *)parser)->dtd) #define declEntity (((Parser *)parser)->declEntity) #define declNotationName (((Parser *)parser)->declNotationName) #define declNotationPublicId (((Parser *)parser)->declNotationPublicId) #define declElementType (((Parser *)parser)->declElementType) #define declAttributeId (((Parser *)parser)->declAttributeId) #define declAttributeIsCdata (((Parser *)parser)->declAttributeIsCdata) #define freeTagList (((Parser *)parser)->freeTagList) #define tagStack (((Parser *)parser)->tagStack) #define atts (((Parser *)parser)->atts) #define attsSize (((Parser *)parser)->attsSize) #define tempPool (((Parser *)parser)->tempPool) #define temp2Pool (((Parser *)parser)->temp2Pool) #define groupConnector (((Parser *)parser)->groupConnector) #define groupSize (((Parser *)parser)->groupSize) #define hadExternalDoctype (((Parser *)parser)->hadExternalDoctype) XML_Parser XML_ParserCreate(const XML_Char *encodingName) { XML_Parser parser = malloc(sizeof(Parser)); if (!parser) return parser; processor = prologInitProcessor; XmlPrologStateInit(&prologState); userData = 0; handlerArg = 0; startElementHandler = 0; endElementHandler = 0; characterDataHandler = 0; processingInstructionHandler = 0; defaultHandler = 0; unparsedEntityDeclHandler = 0; notationDeclHandler = 0; externalEntityRefHandler = 0; unknownEncodingHandler = 0; buffer = 0; bufferPtr = 0; bufferEnd = 0; parseEndByteIndex = 0; parseEndPtr = 0; bufferLim = 0; declElementType = 0; declAttributeId = 0; declEntity = 0; declNotationName = 0; declNotationPublicId = 0; memset(&position, 0, sizeof(POSITION)); errorCode = XML_ERROR_NONE; eventPtr = 0; eventEndPtr = 0; positionPtr = 0; tagLevel = 0; tagStack = 0; freeTagList = 0; attsSize = INIT_ATTS_SIZE; atts = malloc(attsSize * sizeof(ATTRIBUTE)); dataBuf = malloc(INIT_DATA_BUF_SIZE * sizeof(XML_Char)); groupSize = 0; groupConnector = 0; hadExternalDoctype = 0; unknownEncodingMem = 0; unknownEncodingRelease = 0; unknownEncodingData = 0; unknownEncodingHandlerData = 0; poolInit(&tempPool); poolInit(&temp2Pool); protocolEncodingName = encodingName ? poolCopyString(&tempPool, encodingName) : 0; if (!dtdInit(&dtd) || !atts || !dataBuf || (encodingName && !protocolEncodingName)) { XML_ParserFree(parser); return 0; } dataBufEnd = dataBuf + INIT_DATA_BUF_SIZE; XmlInitEncoding(&initEncoding, &encoding, 0); return parser; } XML_Parser XML_ExternalEntityParserCreate(XML_Parser oldParser, const XML_Char *openEntityNames, const XML_Char *encodingName) { XML_Parser parser = oldParser; DTD *oldDtd = &dtd; XML_StartElementHandler oldStartElementHandler = startElementHandler; XML_EndElementHandler oldEndElementHandler = endElementHandler; XML_CharacterDataHandler oldCharacterDataHandler = characterDataHandler; XML_ProcessingInstructionHandler oldProcessingInstructionHandler = processingInstructionHandler; XML_DefaultHandler oldDefaultHandler = defaultHandler; XML_ExternalEntityRefHandler oldExternalEntityRefHandler = externalEntityRefHandler; XML_UnknownEncodingHandler oldUnknownEncodingHandler = unknownEncodingHandler; void *oldUserData = userData; void *oldHandlerArg = handlerArg; parser = XML_ParserCreate(encodingName); if (!parser) return 0; startElementHandler = oldStartElementHandler; endElementHandler = oldEndElementHandler; characterDataHandler = oldCharacterDataHandler; processingInstructionHandler = oldProcessingInstructionHandler; defaultHandler = oldDefaultHandler; externalEntityRefHandler = oldExternalEntityRefHandler; unknownEncodingHandler = oldUnknownEncodingHandler; userData = oldUserData; if (oldUserData == oldHandlerArg) handlerArg = userData; else handlerArg = parser; if (!dtdCopy(&dtd, oldDtd) || !setOpenEntityNames(parser, openEntityNames)) { XML_ParserFree(parser); return 0; } processor = externalEntityInitProcessor; return parser; } void XML_ParserFree(XML_Parser parser) { for (;;) { TAG *p; if (tagStack == 0) { if (freeTagList == 0) break; tagStack = freeTagList; freeTagList = 0; } p = tagStack; tagStack = tagStack->parent; free(p->buf); free(p); } poolDestroy(&tempPool); poolDestroy(&temp2Pool); dtdDestroy(&dtd); free((void *)atts); free(groupConnector); free(buffer); free(dataBuf); free(unknownEncodingMem); if (unknownEncodingRelease) unknownEncodingRelease(unknownEncodingData); free(parser); } void XML_UseParserAsHandlerArg(XML_Parser parser) { handlerArg = parser; } void XML_SetUserData(XML_Parser parser, void *p) { if (handlerArg == userData) handlerArg = userData = p; else userData = p; } int XML_SetBase(XML_Parser parser, const XML_Char *p) { if (p) { p = poolCopyString(&dtd.pool, p); if (!p) return 0; dtd.base = p; } else dtd.base = 0; return 1; } const XML_Char *XML_GetBase(XML_Parser parser) { return dtd.base; } void XML_SetElementHandler(XML_Parser parser, XML_StartElementHandler start, XML_EndElementHandler end) { startElementHandler = start; endElementHandler = end; } void XML_SetCharacterDataHandler(XML_Parser parser, XML_CharacterDataHandler handler) { characterDataHandler = handler; } void XML_SetProcessingInstructionHandler(XML_Parser parser, XML_ProcessingInstructionHandler handler) { processingInstructionHandler = handler; } void XML_SetDefaultHandler(XML_Parser parser, XML_DefaultHandler handler) { defaultHandler = handler; } void XML_SetUnparsedEntityDeclHandler(XML_Parser parser, XML_UnparsedEntityDeclHandler handler) { unparsedEntityDeclHandler = handler; } void XML_SetNotationDeclHandler(XML_Parser parser, XML_NotationDeclHandler handler) { notationDeclHandler = handler; } void XML_SetExternalEntityRefHandler(XML_Parser parser, XML_ExternalEntityRefHandler handler) { externalEntityRefHandler = handler; } void XML_SetUnknownEncodingHandler(XML_Parser parser, XML_UnknownEncodingHandler handler, void *data) { unknownEncodingHandler = handler; unknownEncodingHandlerData = data; } int XML_Parse(XML_Parser parser, const char *s, int len, int isFinal) { if (len == 0) { if (!isFinal) return 1; errorCode = processor(parser, bufferPtr, parseEndPtr = bufferEnd, 0); if (errorCode == XML_ERROR_NONE) return 1; eventEndPtr = eventPtr; return 0; } else if (bufferPtr == bufferEnd) { const char *end; int nLeftOver; parseEndByteIndex += len; positionPtr = s; if (isFinal) { errorCode = processor(parser, s, parseEndPtr = s + len, 0); if (errorCode == XML_ERROR_NONE) return 1; eventEndPtr = eventPtr; return 0; } errorCode = processor(parser, s, parseEndPtr = s + len, &end); if (errorCode != XML_ERROR_NONE) { eventEndPtr = eventPtr; return 0; } XmlUpdatePosition(encoding, positionPtr, end, &position); nLeftOver = s + len - end; if (nLeftOver) { if (buffer == 0 || nLeftOver > bufferLim - buffer) { /* FIXME avoid integer overflow */ buffer = buffer == 0 ? malloc(len * 2) : realloc(buffer, len * 2); if (!buffer) { errorCode = XML_ERROR_NO_MEMORY; eventPtr = eventEndPtr = 0; return 0; } bufferLim = buffer + len * 2; } memcpy(buffer, end, nLeftOver); bufferPtr = buffer; bufferEnd = buffer + nLeftOver; } return 1; } else { memcpy(XML_GetBuffer(parser, len), s, len); return XML_ParseBuffer(parser, len, isFinal); } } int XML_ParseBuffer(XML_Parser parser, int len, int isFinal) { const char *start = bufferPtr; positionPtr = start; bufferEnd += len; parseEndByteIndex += len; errorCode = processor(parser, start, parseEndPtr = bufferEnd, isFinal ? (const char **)0 : &bufferPtr); if (errorCode == XML_ERROR_NONE) { if (!isFinal) XmlUpdatePosition(encoding, positionPtr, bufferPtr, &position); return 1; } else { eventEndPtr = eventPtr; return 0; } } void *XML_GetBuffer(XML_Parser parser, int len) { if (len > bufferLim - bufferEnd) { /* FIXME avoid integer overflow */ int neededSize = len + (bufferEnd - bufferPtr); if (neededSize <= bufferLim - buffer) { memmove(buffer, bufferPtr, bufferEnd - bufferPtr); bufferEnd = buffer + (bufferEnd - bufferPtr); bufferPtr = buffer; } else { char *newBuf; int bufferSize = bufferLim - bufferPtr; if (bufferSize == 0) bufferSize = INIT_BUFFER_SIZE; do { bufferSize *= 2; } while (bufferSize < neededSize); newBuf = malloc(bufferSize); if (newBuf == 0) { errorCode = XML_ERROR_NO_MEMORY; return 0; } bufferLim = newBuf + bufferSize; if (bufferPtr) { memcpy(newBuf, bufferPtr, bufferEnd - bufferPtr); free(buffer); } bufferEnd = newBuf + (bufferEnd - bufferPtr); bufferPtr = buffer = newBuf; } } return bufferEnd; } enum XML_Error XML_GetErrorCode(XML_Parser parser) { return errorCode; } long XML_GetCurrentByteIndex(XML_Parser parser) { if (eventPtr) return parseEndByteIndex - (parseEndPtr - eventPtr); return -1; } int XML_GetCurrentLineNumber(XML_Parser parser) { if (eventPtr) { XmlUpdatePosition(encoding, positionPtr, eventPtr, &position); positionPtr = eventPtr; } return position.lineNumber + 1; } int XML_GetCurrentColumnNumber(XML_Parser parser) { if (eventPtr) { XmlUpdatePosition(encoding, positionPtr, eventPtr, &position); positionPtr = eventPtr; } return position.columnNumber; } void XML_DefaultCurrent(XML_Parser parser) { if (defaultHandler) reportDefault(parser, encoding, eventPtr, eventEndPtr); } const XML_LChar *XML_ErrorString(int code) { static const XML_LChar *message[] = { 0, XML_T("out of memory"), XML_T("syntax error"), XML_T("no element found"), XML_T("not well-formed"), XML_T("unclosed token"), XML_T("unclosed token"), XML_T("mismatched tag"), XML_T("duplicate attribute"), XML_T("junk after document element"), XML_T("illegal parameter entity reference"), XML_T("undefined entity"), XML_T("recursive entity reference"), XML_T("asynchronous entity"), XML_T("reference to invalid character number"), XML_T("reference to binary entity"), XML_T("reference to external entity in attribute"), XML_T("xml processing instruction not at start of external entity"), XML_T("unknown encoding"), XML_T("encoding specified in XML declaration is incorrect"), XML_T("unclosed CDATA section"), XML_T("error in processing external entity reference") }; if (code > 0 && code < sizeof(message)/sizeof(message[0])) return message[code]; return 0; } static enum XML_Error contentProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { return doContent(parser, 0, encoding, start, end, endPtr); } static enum XML_Error externalEntityInitProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { enum XML_Error result = initializeEncoding(parser); if (result != XML_ERROR_NONE) return result; processor = externalEntityInitProcessor2; return externalEntityInitProcessor2(parser, start, end, endPtr); } static enum XML_Error externalEntityInitProcessor2(XML_Parser parser, const char *start, const char *end, const char **endPtr) { const char *next; int tok = XmlContentTok(encoding, start, end, &next); switch (tok) { case XML_TOK_BOM: start = next; break; case XML_TOK_PARTIAL: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr = start; return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr = start; return XML_ERROR_PARTIAL_CHAR; } processor = externalEntityInitProcessor3; return externalEntityInitProcessor3(parser, start, end, endPtr); } static enum XML_Error externalEntityInitProcessor3(XML_Parser parser, const char *start, const char *end, const char **endPtr) { const char *next; int tok = XmlContentTok(encoding, start, end, &next); switch (tok) { case XML_TOK_XML_DECL: { enum XML_Error result = processXmlDecl(parser, 1, start, next); if (result != XML_ERROR_NONE) return result; start = next; } break; case XML_TOK_PARTIAL: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr = start; return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (endPtr) { *endPtr = start; return XML_ERROR_NONE; } eventPtr = start; return XML_ERROR_PARTIAL_CHAR; } processor = externalEntityContentProcessor; tagLevel = 1; return doContent(parser, 1, encoding, start, end, endPtr); } static enum XML_Error externalEntityContentProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { return doContent(parser, 1, encoding, start, end, endPtr); } static enum XML_Error doContent(XML_Parser parser, int startTagLevel, const ENCODING *enc, const char *s, const char *end, const char **nextPtr) { const ENCODING *internalEnc = XmlGetInternalEncoding(); const char *dummy; const char **eventPP; const char **eventEndPP; if (enc == encoding) { eventPP = &eventPtr; *eventPP = s; eventEndPP = &eventEndPtr; } else eventPP = eventEndPP = &dummy; for (;;) { const char *next; int tok = XmlContentTok(enc, s, end, &next); *eventEndPP = next; switch (tok) { case XML_TOK_TRAILING_CR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } *eventEndPP = end; if (characterDataHandler) { XML_Char c = XML_T('\n'); characterDataHandler(handlerArg, &c, 1); } else if (defaultHandler) reportDefault(parser, enc, s, end); if (startTagLevel == 0) return XML_ERROR_NO_ELEMENTS; if (tagLevel != startTagLevel) return XML_ERROR_ASYNC_ENTITY; return XML_ERROR_NONE; case XML_TOK_NONE: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } if (startTagLevel > 0) { if (tagLevel != startTagLevel) return XML_ERROR_ASYNC_ENTITY; return XML_ERROR_NONE; } return XML_ERROR_NO_ELEMENTS; case XML_TOK_INVALID: *eventPP = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_PARTIAL_CHAR; case XML_TOK_ENTITY_REF: { const XML_Char *name; ENTITY *entity; XML_Char ch = XmlPredefinedEntityName(enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (ch) { if (characterDataHandler) characterDataHandler(handlerArg, &ch, 1); else if (defaultHandler) reportDefault(parser, enc, s, next); break; } name = poolStoreString(&dtd.pool, enc, s + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!name) return XML_ERROR_NO_MEMORY; entity = (ENTITY *)lookup(&dtd.generalEntities, name, 0); poolDiscard(&dtd.pool); if (!entity) { if (dtd.complete || dtd.standalone) return XML_ERROR_UNDEFINED_ENTITY; if (defaultHandler) reportDefault(parser, enc, s, next); break; } if (entity->open) return XML_ERROR_RECURSIVE_ENTITY_REF; if (entity->notation) return XML_ERROR_BINARY_ENTITY_REF; if (entity) { if (entity->textPtr) { enum XML_Error result; if (defaultHandler) { reportDefault(parser, enc, s, next); break; } /* Protect against the possibility that somebody sets the defaultHandler from inside another handler. */ *eventEndPP = *eventPP; entity->open = 1; result = doContent(parser, tagLevel, internalEnc, (char *)entity->textPtr, (char *)(entity->textPtr + entity->textLen), 0); entity->open = 0; if (result) return result; } else if (externalEntityRefHandler) { const XML_Char *openEntityNames; entity->open = 1; openEntityNames = getOpenEntityNames(parser); entity->open = 0; if (!openEntityNames) return XML_ERROR_NO_MEMORY; if (!externalEntityRefHandler(parser, openEntityNames, dtd.base, entity->systemId, entity->publicId)) return XML_ERROR_EXTERNAL_ENTITY_HANDLING; poolDiscard(&tempPool); } else if (defaultHandler) reportDefault(parser, enc, s, next); } break; } case XML_TOK_START_TAG_WITH_ATTS: if (!startElementHandler) { enum XML_Error result = storeAtts(parser, enc, 0, s); if (result) return result; } /* fall through */ case XML_TOK_START_TAG_NO_ATTS: { TAG *tag; if (freeTagList) { tag = freeTagList; freeTagList = freeTagList->parent; } else { tag = malloc(sizeof(TAG)); if (!tag) return XML_ERROR_NO_MEMORY; tag->buf = malloc(INIT_TAG_BUF_SIZE); if (!tag->buf) return XML_ERROR_NO_MEMORY; tag->bufEnd = tag->buf + INIT_TAG_BUF_SIZE; } tag->parent = tagStack; tagStack = tag; tag->rawName = s + enc->minBytesPerChar; tag->rawNameLength = XmlNameLength(enc, tag->rawName); if (nextPtr) { if (tag->rawNameLength > tag->bufEnd - tag->buf) { int bufSize = tag->rawNameLength * 4; bufSize = ROUND_UP(bufSize, sizeof(XML_Char)); tag->buf = realloc(tag->buf, bufSize); if (!tag->buf) return XML_ERROR_NO_MEMORY; tag->bufEnd = tag->buf + bufSize; } memcpy(tag->buf, tag->rawName, tag->rawNameLength); tag->rawName = tag->buf; } ++tagLevel; if (startElementHandler) { enum XML_Error result; XML_Char *toPtr; for (;;) { const char *rawNameEnd = tag->rawName + tag->rawNameLength; const char *fromPtr = tag->rawName; int bufSize; if (nextPtr) toPtr = (XML_Char *)(tag->buf + ROUND_UP(tag->rawNameLength, sizeof(XML_Char))); else toPtr = (XML_Char *)tag->buf; tag->name = toPtr; XmlConvert(enc, &fromPtr, rawNameEnd, (ICHAR **)&toPtr, (ICHAR *)tag->bufEnd - 1); if (fromPtr == rawNameEnd) break; bufSize = (tag->bufEnd - tag->buf) << 1; tag->buf = realloc(tag->buf, bufSize); if (!tag->buf) return XML_ERROR_NO_MEMORY; tag->bufEnd = tag->buf + bufSize; if (nextPtr) tag->rawName = tag->buf; } *toPtr = XML_T('\0'); result = storeAtts(parser, enc, tag->name, s); if (result) return result; startElementHandler(handlerArg, tag->name, (const XML_Char **)atts); poolClear(&tempPool); } else { tag->name = 0; if (defaultHandler) reportDefault(parser, enc, s, next); } break; } case XML_TOK_EMPTY_ELEMENT_WITH_ATTS: if (!startElementHandler) { enum XML_Error result = storeAtts(parser, enc, 0, s); if (result) return result; } /* fall through */ case XML_TOK_EMPTY_ELEMENT_NO_ATTS: if (startElementHandler || endElementHandler) { const char *rawName = s + enc->minBytesPerChar; const XML_Char *name = poolStoreString(&tempPool, enc, rawName, rawName + XmlNameLength(enc, rawName)); if (!name) return XML_ERROR_NO_MEMORY; poolFinish(&tempPool); if (startElementHandler) { enum XML_Error result = storeAtts(parser, enc, name, s); if (result) return result; startElementHandler(handlerArg, name, (const XML_Char **)atts); } if (endElementHandler) { if (startElementHandler) *eventPP = *eventEndPP; endElementHandler(handlerArg, name); } poolClear(&tempPool); } else if (defaultHandler) reportDefault(parser, enc, s, next); if (tagLevel == 0) return epilogProcessor(parser, next, end, nextPtr); break; case XML_TOK_END_TAG: if (tagLevel == startTagLevel) return XML_ERROR_ASYNC_ENTITY; else { int len; const char *rawName; TAG *tag = tagStack; tagStack = tag->parent; tag->parent = freeTagList; freeTagList = tag; rawName = s + enc->minBytesPerChar*2; len = XmlNameLength(enc, rawName); if (len != tag->rawNameLength || memcmp(tag->rawName, rawName, len) != 0) { *eventPP = rawName; return XML_ERROR_TAG_MISMATCH; } --tagLevel; if (endElementHandler) { if (tag->name) endElementHandler(handlerArg, tag->name); else { const XML_Char *name = poolStoreString(&tempPool, enc, rawName, rawName + len); if (!name) return XML_ERROR_NO_MEMORY; endElementHandler(handlerArg, name); poolClear(&tempPool); } } else if (defaultHandler) reportDefault(parser, enc, s, next); if (tagLevel == 0) return epilogProcessor(parser, next, end, nextPtr); } break; case XML_TOK_CHAR_REF: { int n = XmlCharRefNumber(enc, s); if (n < 0) return XML_ERROR_BAD_CHAR_REF; if (characterDataHandler) { XML_Char buf[XML_ENCODE_MAX]; characterDataHandler(handlerArg, buf, XmlEncode(n, (ICHAR *)buf)); } else if (defaultHandler) reportDefault(parser, enc, s, next); } break; case XML_TOK_XML_DECL: return XML_ERROR_MISPLACED_XML_PI; case XML_TOK_DATA_NEWLINE: if (characterDataHandler) { XML_Char c = XML_T('\n'); characterDataHandler(handlerArg, &c, 1); } else if (defaultHandler) reportDefault(parser, enc, s, next); break; case XML_TOK_CDATA_SECT_OPEN: { enum XML_Error result; if (characterDataHandler) characterDataHandler(handlerArg, dataBuf, 0); else if (defaultHandler) reportDefault(parser, enc, s, next); result = doCdataSection(parser, enc, &next, end, nextPtr); if (!next) { processor = cdataSectionProcessor; return result; } } break; case XML_TOK_TRAILING_RSQB: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } if (characterDataHandler) { if (MUST_CONVERT(enc, s)) { ICHAR *dataPtr = (ICHAR *)dataBuf; XmlConvert(enc, &s, end, &dataPtr, (ICHAR *)dataBufEnd); characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf); } else characterDataHandler(handlerArg, (XML_Char *)s, (XML_Char *)end - (XML_Char *)s); } else if (defaultHandler) reportDefault(parser, enc, s, end); if (startTagLevel == 0) { *eventPP = end; return XML_ERROR_NO_ELEMENTS; } if (tagLevel != startTagLevel) { *eventPP = end; return XML_ERROR_ASYNC_ENTITY; } return XML_ERROR_NONE; case XML_TOK_DATA_CHARS: if (characterDataHandler) { if (MUST_CONVERT(enc, s)) { for (;;) { ICHAR *dataPtr = (ICHAR *)dataBuf; XmlConvert(enc, &s, next, &dataPtr, (ICHAR *)dataBufEnd); *eventEndPP = s; characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf); if (s == next) break; *eventPP = s; } } else characterDataHandler(handlerArg, (XML_Char *)s, (XML_Char *)next - (XML_Char *)s); } else if (defaultHandler) reportDefault(parser, enc, s, next); break; case XML_TOK_PI: if (!reportProcessingInstruction(parser, enc, s, next)) return XML_ERROR_NO_MEMORY; break; default: if (defaultHandler) reportDefault(parser, enc, s, next); break; } *eventPP = s = next; } /* not reached */ } /* If tagName is non-null, build a real list of attributes, otherwise just check the attributes for well-formedness. */ static enum XML_Error storeAtts(XML_Parser parser, const ENCODING *enc, const XML_Char *tagName, const char *s) { ELEMENT_TYPE *elementType = 0; int nDefaultAtts = 0; const XML_Char **appAtts; int i; int n; if (tagName) { elementType = (ELEMENT_TYPE *)lookup(&dtd.elementTypes, tagName, 0); if (elementType) nDefaultAtts = elementType->nDefaultAtts; } n = XmlGetAttributes(enc, s, attsSize, atts); if (n + nDefaultAtts > attsSize) { int oldAttsSize = attsSize; attsSize = n + nDefaultAtts + INIT_ATTS_SIZE; atts = realloc((void *)atts, attsSize * sizeof(ATTRIBUTE)); if (!atts) return XML_ERROR_NO_MEMORY; if (n > oldAttsSize) XmlGetAttributes(enc, s, n, atts); } appAtts = (const XML_Char **)atts; for (i = 0; i < n; i++) { ATTRIBUTE_ID *attId = getAttributeId(parser, enc, atts[i].name, atts[i].name + XmlNameLength(enc, atts[i].name)); if (!attId) return XML_ERROR_NO_MEMORY; if ((attId->name)[-1]) { if (enc == encoding) eventPtr = atts[i].name; return XML_ERROR_DUPLICATE_ATTRIBUTE; } (attId->name)[-1] = 1; appAtts[i << 1] = attId->name; if (!atts[i].normalized) { enum XML_Error result; int isCdata = 1; if (attId->maybeTokenized) { int j; for (j = 0; j < nDefaultAtts; j++) { if (attId == elementType->defaultAtts[j].id) { isCdata = elementType->defaultAtts[j].isCdata; break; } } } result = storeAttributeValue(parser, enc, isCdata, atts[i].valuePtr, atts[i].valueEnd, &tempPool); if (result) return result; if (tagName) { appAtts[(i << 1) + 1] = poolStart(&tempPool); poolFinish(&tempPool); } else poolDiscard(&tempPool); } else if (tagName) { appAtts[(i << 1) + 1] = poolStoreString(&tempPool, enc, atts[i].valuePtr, atts[i].valueEnd); if (appAtts[(i << 1) + 1] == 0) return XML_ERROR_NO_MEMORY; poolFinish(&tempPool); } } if (tagName) { int j; for (j = 0; j < nDefaultAtts; j++) { const DEFAULT_ATTRIBUTE *da = elementType->defaultAtts + j; if (!(da->id->name)[-1] && da->value) { (da->id->name)[-1] = 1; appAtts[i << 1] = da->id->name; appAtts[(i << 1) + 1] = da->value; i++; } } appAtts[i << 1] = 0; } while (i-- > 0) ((XML_Char *)appAtts[i << 1])[-1] = 0; return XML_ERROR_NONE; } /* The idea here is to avoid using stack for each CDATA section when the whole file is parsed with one call. */ static enum XML_Error cdataSectionProcessor(XML_Parser parser, const char *start, const char *end, const char **endPtr) { enum XML_Error result = doCdataSection(parser, encoding, &start, end, endPtr); if (start) { processor = contentProcessor; return contentProcessor(parser, start, end, endPtr); } return result; } /* startPtr gets set to non-null is the section is closed, and to null if the section is not yet closed. */ static enum XML_Error doCdataSection(XML_Parser parser, const ENCODING *enc, const char **startPtr, const char *end, const char **nextPtr) { const char *s = *startPtr; const char *dummy; const char **eventPP; const char **eventEndPP; if (enc == encoding) { eventPP = &eventPtr; *eventPP = s; eventEndPP = &eventEndPtr; } else eventPP = eventEndPP = &dummy; *startPtr = 0; for (;;) { const char *next; int tok = XmlCdataSectionTok(enc, s, end, &next); *eventEndPP = next; switch (tok) { case XML_TOK_CDATA_SECT_CLOSE: if (characterDataHandler) characterDataHandler(handlerArg, dataBuf, 0); else if (defaultHandler) reportDefault(parser, enc, s, next); *startPtr = next; return XML_ERROR_NONE; case XML_TOK_DATA_NEWLINE: if (characterDataHandler) { XML_Char c = XML_T('\n'); characterDataHandler(handlerArg, &c, 1); } else if (defaultHandler) reportDefault(parser, enc, s, next); break; case XML_TOK_DATA_CHARS: if (characterDataHandler) { if (MUST_CONVERT(enc, s)) { for (;;) { ICHAR *dataPtr = (ICHAR *)dataBuf; XmlConvert(enc, &s, next, &dataPtr, (ICHAR *)dataBufEnd); *eventEndPP = next; characterDataHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf); if (s == next) break; *eventPP = s; } } else characterDataHandler(handlerArg, (XML_Char *)s, (XML_Char *)next - (XML_Char *)s); } else if (defaultHandler) reportDefault(parser, enc, s, next); break; case XML_TOK_INVALID: *eventPP = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL_CHAR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_PARTIAL_CHAR; case XML_TOK_PARTIAL: case XML_TOK_NONE: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_UNCLOSED_CDATA_SECTION; default: abort(); } *eventPP = s = next; } /* not reached */ } static enum XML_Error initializeEncoding(XML_Parser parser) { const char *s; #ifdef XML_UNICODE char encodingBuf[128]; if (!protocolEncodingName) s = 0; else { int i; for (i = 0; protocolEncodingName[i]; i++) { if (i == sizeof(encodingBuf) - 1 || protocolEncodingName[i] >= 0x80 || protocolEncodingName[i] < 0) { encodingBuf[0] = '\0'; break; } encodingBuf[i] = (char)protocolEncodingName[i]; } encodingBuf[i] = '\0'; s = encodingBuf; } #else s = protocolEncodingName; #endif if (XmlInitEncoding(&initEncoding, &encoding, s)) return XML_ERROR_NONE; return handleUnknownEncoding(parser, protocolEncodingName); } static enum XML_Error processXmlDecl(XML_Parser parser, int isGeneralTextEntity, const char *s, const char *next) { const char *encodingName = 0; const ENCODING *newEncoding = 0; const char *version; int standalone = -1; if (!XmlParseXmlDecl(isGeneralTextEntity, encoding, s, next, &eventPtr, &version, &encodingName, &newEncoding, &standalone)) return XML_ERROR_SYNTAX; if (!isGeneralTextEntity && standalone == 1) dtd.standalone = 1; if (defaultHandler) reportDefault(parser, encoding, s, next); if (!protocolEncodingName) { if (newEncoding) { if (newEncoding->minBytesPerChar != encoding->minBytesPerChar) { eventPtr = encodingName; return XML_ERROR_INCORRECT_ENCODING; } encoding = newEncoding; } else if (encodingName) { enum XML_Error result; const XML_Char *s = poolStoreString(&tempPool, encoding, encodingName, encodingName + XmlNameLength(encoding, encodingName)); if (!s) return XML_ERROR_NO_MEMORY; result = handleUnknownEncoding(parser, s); poolDiscard(&tempPool); if (result == XML_ERROR_UNKNOWN_ENCODING) eventPtr = encodingName; return result; } } return XML_ERROR_NONE; } static enum XML_Error handleUnknownEncoding(XML_Parser parser, const XML_Char *encodingName) { if (unknownEncodingHandler) { XML_Encoding info; int i; for (i = 0; i < 256; i++) info.map[i] = -1; info.convert = 0; info.data = 0; info.release = 0; if (unknownEncodingHandler(unknownEncodingHandlerData, encodingName, &info)) { ENCODING *enc; unknownEncodingMem = malloc(XmlSizeOfUnknownEncoding()); if (!unknownEncodingMem) { if (info.release) info.release(info.data); return XML_ERROR_NO_MEMORY; } enc = XmlInitUnknownEncoding(unknownEncodingMem, info.map, info.convert, info.data); if (enc) { unknownEncodingData = info.data; unknownEncodingRelease = info.release; encoding = enc; return XML_ERROR_NONE; } } if (info.release) info.release(info.data); } return XML_ERROR_UNKNOWN_ENCODING; } static enum XML_Error prologInitProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { enum XML_Error result = initializeEncoding(parser); if (result != XML_ERROR_NONE) return result; processor = prologProcessor; return prologProcessor(parser, s, end, nextPtr); } static enum XML_Error prologProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { for (;;) { const char *next; int tok = XmlPrologTok(encoding, s, end, &next); if (tok <= 0) { if (nextPtr != 0 && tok != XML_TOK_INVALID) { *nextPtr = s; return XML_ERROR_NONE; } switch (tok) { case XML_TOK_INVALID: eventPtr = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_NONE: return XML_ERROR_NO_ELEMENTS; case XML_TOK_PARTIAL: return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: return XML_ERROR_PARTIAL_CHAR; case XML_TOK_TRAILING_CR: eventPtr = s + encoding->minBytesPerChar; return XML_ERROR_NO_ELEMENTS; default: abort(); } } switch (XmlTokenRole(&prologState, tok, s, next, encoding)) { case XML_ROLE_XML_DECL: { enum XML_Error result = processXmlDecl(parser, 0, s, next); if (result != XML_ERROR_NONE) return result; } break; case XML_ROLE_DOCTYPE_SYSTEM_ID: hadExternalDoctype = 1; break; case XML_ROLE_DOCTYPE_PUBLIC_ID: case XML_ROLE_ENTITY_PUBLIC_ID: if (!XmlIsPublicId(encoding, s, next, &eventPtr)) return XML_ERROR_SYNTAX; if (declEntity) { XML_Char *tem = poolStoreString(&dtd.pool, encoding, s + encoding->minBytesPerChar, next - encoding->minBytesPerChar); if (!tem) return XML_ERROR_NO_MEMORY; normalizePublicId(tem); declEntity->publicId = tem; poolFinish(&dtd.pool); } break; case XML_ROLE_INSTANCE_START: processor = contentProcessor; if (hadExternalDoctype) dtd.complete = 0; return contentProcessor(parser, s, end, nextPtr); case XML_ROLE_ATTLIST_ELEMENT_NAME: { const XML_Char *name = poolStoreString(&dtd.pool, encoding, s, next); if (!name) return XML_ERROR_NO_MEMORY; declElementType = (ELEMENT_TYPE *)lookup(&dtd.elementTypes, name, sizeof(ELEMENT_TYPE)); if (!declElementType) return XML_ERROR_NO_MEMORY; if (declElementType->name != name) poolDiscard(&dtd.pool); else poolFinish(&dtd.pool); break; } case XML_ROLE_ATTRIBUTE_NAME: declAttributeId = getAttributeId(parser, encoding, s, next); if (!declAttributeId) return XML_ERROR_NO_MEMORY; declAttributeIsCdata = 0; break; case XML_ROLE_ATTRIBUTE_TYPE_CDATA: declAttributeIsCdata = 1; break; case XML_ROLE_IMPLIED_ATTRIBUTE_VALUE: case XML_ROLE_REQUIRED_ATTRIBUTE_VALUE: if (dtd.complete && !defineAttribute(declElementType, declAttributeId, declAttributeIsCdata, 0)) return XML_ERROR_NO_MEMORY; break; case XML_ROLE_DEFAULT_ATTRIBUTE_VALUE: case XML_ROLE_FIXED_ATTRIBUTE_VALUE: { const XML_Char *attVal; enum XML_Error result = storeAttributeValue(parser, encoding, declAttributeIsCdata, s + encoding->minBytesPerChar, next - encoding->minBytesPerChar, &dtd.pool); if (result) return result; attVal = poolStart(&dtd.pool); poolFinish(&dtd.pool); if (dtd.complete && !defineAttribute(declElementType, declAttributeId, declAttributeIsCdata, attVal)) return XML_ERROR_NO_MEMORY; break; } case XML_ROLE_ENTITY_VALUE: { enum XML_Error result = storeEntityValue(parser, s, next); if (result != XML_ERROR_NONE) return result; } break; case XML_ROLE_ENTITY_SYSTEM_ID: if (declEntity) { declEntity->systemId = poolStoreString(&dtd.pool, encoding, s + encoding->minBytesPerChar, next - encoding->minBytesPerChar); if (!declEntity->systemId) return XML_ERROR_NO_MEMORY; declEntity->base = dtd.base; poolFinish(&dtd.pool); } break; case XML_ROLE_ENTITY_NOTATION_NAME: if (declEntity) { declEntity->notation = poolStoreString(&dtd.pool, encoding, s, next); if (!declEntity->notation) return XML_ERROR_NO_MEMORY; poolFinish(&dtd.pool); if (unparsedEntityDeclHandler) { eventPtr = eventEndPtr = s; unparsedEntityDeclHandler(handlerArg, declEntity->name, declEntity->base, declEntity->systemId, declEntity->publicId, declEntity->notation); } } break; case XML_ROLE_GENERAL_ENTITY_NAME: { const XML_Char *name; if (XmlPredefinedEntityName(encoding, s, next)) { declEntity = 0; break; } name = poolStoreString(&dtd.pool, encoding, s, next); if (!name) return XML_ERROR_NO_MEMORY; if (dtd.complete) { declEntity = (ENTITY *)lookup(&dtd.generalEntities, name, sizeof(ENTITY)); if (!declEntity) return XML_ERROR_NO_MEMORY; if (declEntity->name != name) { poolDiscard(&dtd.pool); declEntity = 0; } else poolFinish(&dtd.pool); } else { poolDiscard(&dtd.pool); declEntity = 0; } } break; case XML_ROLE_PARAM_ENTITY_NAME: declEntity = 0; break; case XML_ROLE_NOTATION_NAME: declNotationPublicId = 0; declNotationName = 0; if (notationDeclHandler) { declNotationName = poolStoreString(&tempPool, encoding, s, next); if (!declNotationName) return XML_ERROR_NO_MEMORY; poolFinish(&tempPool); } break; case XML_ROLE_NOTATION_PUBLIC_ID: if (!XmlIsPublicId(encoding, s, next, &eventPtr)) return XML_ERROR_SYNTAX; if (declNotationName) { XML_Char *tem = poolStoreString(&tempPool, encoding, s + encoding->minBytesPerChar, next - encoding->minBytesPerChar); if (!tem) return XML_ERROR_NO_MEMORY; normalizePublicId(tem); declNotationPublicId = tem; poolFinish(&tempPool); } break; case XML_ROLE_NOTATION_SYSTEM_ID: if (declNotationName && notationDeclHandler) { const XML_Char *systemId = poolStoreString(&tempPool, encoding, s + encoding->minBytesPerChar, next - encoding->minBytesPerChar); if (!systemId) return XML_ERROR_NO_MEMORY; eventPtr = eventEndPtr = s; notationDeclHandler(handlerArg, declNotationName, dtd.base, systemId, declNotationPublicId); } poolClear(&tempPool); break; case XML_ROLE_NOTATION_NO_SYSTEM_ID: if (declNotationPublicId && notationDeclHandler) { eventPtr = eventEndPtr = s; notationDeclHandler(handlerArg, declNotationName, dtd.base, 0, declNotationPublicId); } poolClear(&tempPool); break; case XML_ROLE_ERROR: eventPtr = s; switch (tok) { case XML_TOK_PARAM_ENTITY_REF: return XML_ERROR_PARAM_ENTITY_REF; case XML_TOK_XML_DECL: return XML_ERROR_MISPLACED_XML_PI; default: return XML_ERROR_SYNTAX; } case XML_ROLE_GROUP_OPEN: if (prologState.level >= groupSize) { if (groupSize) groupConnector = realloc(groupConnector, groupSize *= 2); else groupConnector = malloc(groupSize = 32); if (!groupConnector) return XML_ERROR_NO_MEMORY; } groupConnector[prologState.level] = 0; break; case XML_ROLE_GROUP_SEQUENCE: if (groupConnector[prologState.level] == '|') { eventPtr = s; return XML_ERROR_SYNTAX; } groupConnector[prologState.level] = ','; break; case XML_ROLE_GROUP_CHOICE: if (groupConnector[prologState.level] == ',') { eventPtr = s; return XML_ERROR_SYNTAX; } groupConnector[prologState.level] = '|'; break; case XML_ROLE_PARAM_ENTITY_REF: dtd.complete = 0; break; case XML_ROLE_NONE: switch (tok) { case XML_TOK_PI: eventPtr = s; eventEndPtr = next; if (!reportProcessingInstruction(parser, encoding, s, next)) return XML_ERROR_NO_MEMORY; break; } break; } if (defaultHandler) { switch (tok) { case XML_TOK_PI: case XML_TOK_BOM: case XML_TOK_XML_DECL: break; default: eventPtr = s; eventEndPtr = next; reportDefault(parser, encoding, s, next); } } s = next; } /* not reached */ } static enum XML_Error epilogProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { processor = epilogProcessor; eventPtr = s; for (;;) { const char *next; int tok = XmlPrologTok(encoding, s, end, &next); eventEndPtr = next; switch (tok) { case XML_TOK_TRAILING_CR: if (defaultHandler) { eventEndPtr = end; reportDefault(parser, encoding, s, end); } /* fall through */ case XML_TOK_NONE: if (nextPtr) *nextPtr = end; return XML_ERROR_NONE; case XML_TOK_PROLOG_S: case XML_TOK_COMMENT: if (defaultHandler) reportDefault(parser, encoding, s, next); break; case XML_TOK_PI: if (!reportProcessingInstruction(parser, encoding, s, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_INVALID: eventPtr = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_UNCLOSED_TOKEN; case XML_TOK_PARTIAL_CHAR: if (nextPtr) { *nextPtr = s; return XML_ERROR_NONE; } return XML_ERROR_PARTIAL_CHAR; default: return XML_ERROR_JUNK_AFTER_DOC_ELEMENT; } eventPtr = s = next; } } static enum XML_Error errorProcessor(XML_Parser parser, const char *s, const char *end, const char **nextPtr) { return errorCode; } static enum XML_Error storeAttributeValue(XML_Parser parser, const ENCODING *enc, int isCdata, const char *ptr, const char *end, STRING_POOL *pool) { enum XML_Error result = appendAttributeValue(parser, enc, isCdata, ptr, end, pool); if (result) return result; if (!isCdata && poolLength(pool) && poolLastChar(pool) == XML_T(' ')) poolChop(pool); if (!poolAppendChar(pool, XML_T('\0'))) return XML_ERROR_NO_MEMORY; return XML_ERROR_NONE; } static enum XML_Error appendAttributeValue(XML_Parser parser, const ENCODING *enc, int isCdata, const char *ptr, const char *end, STRING_POOL *pool) { const ENCODING *internalEnc = XmlGetInternalEncoding(); for (;;) { const char *next; int tok = XmlAttributeValueTok(enc, ptr, end, &next); switch (tok) { case XML_TOK_NONE: return XML_ERROR_NONE; case XML_TOK_INVALID: if (enc == encoding) eventPtr = next; return XML_ERROR_INVALID_TOKEN; case XML_TOK_PARTIAL: if (enc == encoding) eventPtr = ptr; return XML_ERROR_INVALID_TOKEN; case XML_TOK_CHAR_REF: { XML_Char buf[XML_ENCODE_MAX]; int i; int n = XmlCharRefNumber(enc, ptr); if (n < 0) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_BAD_CHAR_REF; } if (!isCdata && n == 0x20 /* space */ && (poolLength(pool) == 0 || poolLastChar(pool) == XML_T(' '))) break; n = XmlEncode(n, (ICHAR *)buf); if (!n) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_BAD_CHAR_REF; } for (i = 0; i < n; i++) { if (!poolAppendChar(pool, buf[i])) return XML_ERROR_NO_MEMORY; } } break; case XML_TOK_DATA_CHARS: if (!poolAppend(pool, enc, ptr, next)) return XML_ERROR_NO_MEMORY; break; break; case XML_TOK_TRAILING_CR: next = ptr + enc->minBytesPerChar; /* fall through */ case XML_TOK_ATTRIBUTE_VALUE_S: case XML_TOK_DATA_NEWLINE: if (!isCdata && (poolLength(pool) == 0 || poolLastChar(pool) == XML_T(' '))) break; if (!poolAppendChar(pool, XML_T(' '))) return XML_ERROR_NO_MEMORY; break; case XML_TOK_ENTITY_REF: { const XML_Char *name; ENTITY *entity; XML_Char ch = XmlPredefinedEntityName(enc, ptr + enc->minBytesPerChar, next - enc->minBytesPerChar); if (ch) { if (!poolAppendChar(pool, ch)) return XML_ERROR_NO_MEMORY; break; } name = poolStoreString(&temp2Pool, enc, ptr + enc->minBytesPerChar, next - enc->minBytesPerChar); if (!name) return XML_ERROR_NO_MEMORY; entity = (ENTITY *)lookup(&dtd.generalEntities, name, 0); poolDiscard(&temp2Pool); if (!entity) { if (dtd.complete) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_UNDEFINED_ENTITY; } } else if (entity->open) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_RECURSIVE_ENTITY_REF; } else if (entity->notation) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_BINARY_ENTITY_REF; } else if (!entity->textPtr) { if (enc == encoding) eventPtr = ptr; return XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF; } else { enum XML_Error result; const XML_Char *textEnd = entity->textPtr + entity->textLen; entity->open = 1; result = appendAttributeValue(parser, internalEnc, isCdata, (char *)entity->textPtr, (char *)textEnd, pool); entity->open = 0; if (result) return result; } } break; default: abort(); } ptr = next; } /* not reached */ } static enum XML_Error storeEntityValue(XML_Parser parser, const char *entityTextPtr, const char *entityTextEnd) { const ENCODING *internalEnc = XmlGetInternalEncoding(); STRING_POOL *pool = &(dtd.pool); entityTextPtr += encoding->minBytesPerChar; entityTextEnd -= encoding->minBytesPerChar; for (;;) { const char *next; int tok = XmlEntityValueTok(encoding, entityTextPtr, entityTextEnd, &next); switch (tok) { case XML_TOK_PARAM_ENTITY_REF: eventPtr = entityTextPtr; return XML_ERROR_SYNTAX; case XML_TOK_NONE: if (declEntity) { declEntity->textPtr = pool->start; declEntity->textLen = pool->ptr - pool->start; poolFinish(pool); } else poolDiscard(pool); return XML_ERROR_NONE; case XML_TOK_ENTITY_REF: case XML_TOK_DATA_CHARS: if (!poolAppend(pool, encoding, entityTextPtr, next)) return XML_ERROR_NO_MEMORY; break; case XML_TOK_TRAILING_CR: next = entityTextPtr + encoding->minBytesPerChar; /* fall through */ case XML_TOK_DATA_NEWLINE: if (pool->end == pool->ptr && !poolGrow(pool)) return XML_ERROR_NO_MEMORY; *(pool->ptr)++ = XML_T('\n'); break; case XML_TOK_CHAR_REF: { XML_Char buf[XML_ENCODE_MAX]; int i; int n = XmlCharRefNumber(encoding, entityTextPtr); if (n < 0) { eventPtr = entityTextPtr; return XML_ERROR_BAD_CHAR_REF; } n = XmlEncode(n, (ICHAR *)buf); if (!n) { eventPtr = entityTextPtr; return XML_ERROR_BAD_CHAR_REF; } for (i = 0; i < n; i++) { if (pool->end == pool->ptr && !poolGrow(pool)) return XML_ERROR_NO_MEMORY; *(pool->ptr)++ = buf[i]; } } break; case XML_TOK_PARTIAL: eventPtr = entityTextPtr; return XML_ERROR_INVALID_TOKEN; case XML_TOK_INVALID: eventPtr = next; return XML_ERROR_INVALID_TOKEN; default: abort(); } entityTextPtr = next; } /* not reached */ } static void normalizeLines(XML_Char *s) { XML_Char *p; for (;; s++) { if (*s == XML_T('\0')) return; if (*s == XML_T('\r')) break; } p = s; do { if (*s == XML_T('\r')) { *p++ = XML_T('\n'); if (*++s == XML_T('\n')) s++; } else *p++ = *s++; } while (*s); *p = XML_T('\0'); } static int reportProcessingInstruction(XML_Parser parser, const ENCODING *enc, const char *start, const char *end) { const XML_Char *target; XML_Char *data; const char *tem; if (!processingInstructionHandler) { if (defaultHandler) reportDefault(parser, enc, start, end); return 1; } start += enc->minBytesPerChar * 2; tem = start + XmlNameLength(enc, start); target = poolStoreString(&tempPool, enc, start, tem); if (!target) return 0; poolFinish(&tempPool); data = poolStoreString(&tempPool, enc, XmlSkipS(enc, tem), end - enc->minBytesPerChar*2); if (!data) return 0; normalizeLines(data); processingInstructionHandler(handlerArg, target, data); poolClear(&tempPool); return 1; } static void reportDefault(XML_Parser parser, const ENCODING *enc, const char *s, const char *end) { if (MUST_CONVERT(enc, s)) { for (;;) { ICHAR *dataPtr = (ICHAR *)dataBuf; XmlConvert(enc, &s, end, &dataPtr, (ICHAR *)dataBufEnd); if (s == end) { defaultHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf); break; } if (enc == encoding) { eventEndPtr = s; defaultHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf); eventPtr = s; } else defaultHandler(handlerArg, dataBuf, dataPtr - (ICHAR *)dataBuf); } } else defaultHandler(handlerArg, (XML_Char *)s, (XML_Char *)end - (XML_Char *)s); } static int defineAttribute(ELEMENT_TYPE *type, ATTRIBUTE_ID *attId, int isCdata, const XML_Char *value) { DEFAULT_ATTRIBUTE *att; if (type->nDefaultAtts == type->allocDefaultAtts) { if (type->allocDefaultAtts == 0) { type->allocDefaultAtts = 8; type->defaultAtts = malloc(type->allocDefaultAtts*sizeof(DEFAULT_ATTRIBUTE)); } else { type->allocDefaultAtts *= 2; type->defaultAtts = realloc(type->defaultAtts, type->allocDefaultAtts*sizeof(DEFAULT_ATTRIBUTE)); } if (!type->defaultAtts) return 0; } att = type->defaultAtts + type->nDefaultAtts; att->id = attId; att->value = value; att->isCdata = isCdata; if (!isCdata) attId->maybeTokenized = 1; type->nDefaultAtts += 1; return 1; } static ATTRIBUTE_ID * getAttributeId(XML_Parser parser, const ENCODING *enc, const char *start, const char *end) { ATTRIBUTE_ID *id; const XML_Char *name; if (!poolAppendChar(&dtd.pool, XML_T('\0'))) return 0; name = poolStoreString(&dtd.pool, enc, start, end); if (!name) return 0; ++name; id = (ATTRIBUTE_ID *)lookup(&dtd.attributeIds, name, sizeof(ATTRIBUTE_ID)); if (!id) return 0; if (id->name != name) poolDiscard(&dtd.pool); else poolFinish(&dtd.pool); return id; } static const XML_Char *getOpenEntityNames(XML_Parser parser) { HASH_TABLE_ITER iter; hashTableIterInit(&iter, &(dtd.generalEntities)); for (;;) { const XML_Char *s; ENTITY *e = (ENTITY *)hashTableIterNext(&iter); if (!e) break; if (!e->open) continue; if (poolLength(&tempPool) > 0 && !poolAppendChar(&tempPool, XML_T(' '))) return 0; for (s = e->name; *s; s++) if (!poolAppendChar(&tempPool, *s)) return 0; } if (!poolAppendChar(&tempPool, XML_T('\0'))) return 0; return tempPool.start; } static int setOpenEntityNames(XML_Parser parser, const XML_Char *openEntityNames) { const XML_Char *s = openEntityNames; while (*openEntityNames != XML_T('\0')) { if (*s == XML_T(' ') || *s == XML_T('\0')) { ENTITY *e; if (!poolAppendChar(&tempPool, XML_T('\0'))) return 0; e = (ENTITY *)lookup(&dtd.generalEntities, poolStart(&tempPool), 0); if (e) e->open = 1; if (*s == XML_T(' ')) s++; openEntityNames = s; poolDiscard(&tempPool); } else { if (!poolAppendChar(&tempPool, *s)) return 0; s++; } } return 1; } static void normalizePublicId(XML_Char *publicId) { XML_Char *p = publicId; XML_Char *s; for (s = publicId; *s; s++) { switch (*s) { case XML_T(' '): case XML_T('\r'): case XML_T('\n'): if (p != publicId && p[-1] != XML_T(' ')) *p++ = XML_T(' '); break; default: *p++ = *s; } } if (p != publicId && p[-1] == XML_T(' ')) --p; *p = XML_T('\0'); } static int dtdInit(DTD *p) { poolInit(&(p->pool)); hashTableInit(&(p->generalEntities)); hashTableInit(&(p->elementTypes)); hashTableInit(&(p->attributeIds)); p->complete = 1; p->standalone = 0; p->base = 0; return 1; } static void dtdDestroy(DTD *p) { HASH_TABLE_ITER iter; hashTableIterInit(&iter, &(p->elementTypes)); for (;;) { ELEMENT_TYPE *e = (ELEMENT_TYPE *)hashTableIterNext(&iter); if (!e) break; if (e->allocDefaultAtts != 0) free(e->defaultAtts); } hashTableDestroy(&(p->generalEntities)); hashTableDestroy(&(p->elementTypes)); hashTableDestroy(&(p->attributeIds)); poolDestroy(&(p->pool)); } /* Do a deep copy of the DTD. Return 0 for out of memory; non-zero otherwise. The new DTD has already been initialized. */ static int dtdCopy(DTD *newDtd, const DTD *oldDtd) { HASH_TABLE_ITER iter; if (oldDtd->base) { const XML_Char *tem = poolCopyString(&(newDtd->pool), oldDtd->base); if (!tem) return 0; newDtd->base = tem; } hashTableIterInit(&iter, &(oldDtd->attributeIds)); /* Copy the attribute id table. */ for (;;) { ATTRIBUTE_ID *newA; const XML_Char *name; const ATTRIBUTE_ID *oldA = (ATTRIBUTE_ID *)hashTableIterNext(&iter); if (!oldA) break; /* Remember to allocate the scratch byte before the name. */ if (!poolAppendChar(&(newDtd->pool), XML_T('\0'))) return 0; name = poolCopyString(&(newDtd->pool), oldA->name); if (!name) return 0; ++name; newA = (ATTRIBUTE_ID *)lookup(&(newDtd->attributeIds), name, sizeof(ATTRIBUTE_ID)); if (!newA) return 0; newA->maybeTokenized = oldA->maybeTokenized; } /* Copy the element type table. */ hashTableIterInit(&iter, &(oldDtd->elementTypes)); for (;;) { int i; ELEMENT_TYPE *newE; const XML_Char *name; const ELEMENT_TYPE *oldE = (ELEMENT_TYPE *)hashTableIterNext(&iter); if (!oldE) break; name = poolCopyString(&(newDtd->pool), oldE->name); if (!name) return 0; newE = (ELEMENT_TYPE *)lookup(&(newDtd->elementTypes), name, sizeof(ELEMENT_TYPE)); if (!newE) return 0; newE->defaultAtts = (DEFAULT_ATTRIBUTE *)malloc(oldE->nDefaultAtts * sizeof(DEFAULT_ATTRIBUTE)); if (!newE->defaultAtts) return 0; newE->allocDefaultAtts = newE->nDefaultAtts = oldE->nDefaultAtts; for (i = 0; i < newE->nDefaultAtts; i++) { newE->defaultAtts[i].id = (ATTRIBUTE_ID *)lookup(&(newDtd->attributeIds), oldE->defaultAtts[i].id->name, 0); newE->defaultAtts[i].isCdata = oldE->defaultAtts[i].isCdata; if (oldE->defaultAtts[i].value) { newE->defaultAtts[i].value = poolCopyString(&(newDtd->pool), oldE->defaultAtts[i].value); if (!newE->defaultAtts[i].value) return 0; } else newE->defaultAtts[i].value = 0; } } /* Copy the entity table. */ hashTableIterInit(&iter, &(oldDtd->generalEntities)); for (;;) { ENTITY *newE; const XML_Char *name; const ENTITY *oldE = (ENTITY *)hashTableIterNext(&iter); if (!oldE) break; name = poolCopyString(&(newDtd->pool), oldE->name); if (!name) return 0; newE = (ENTITY *)lookup(&(newDtd->generalEntities), name, sizeof(ENTITY)); if (!newE) return 0; if (oldE->systemId) { const XML_Char *tem = poolCopyString(&(newDtd->pool), oldE->systemId); if (!tem) return 0; newE->systemId = tem; if (oldE->base) { if (oldE->base == oldDtd->base) newE->base = newDtd->base; tem = poolCopyString(&(newDtd->pool), oldE->base); if (!tem) return 0; newE->base = tem; } } else { const XML_Char *tem = poolCopyStringN(&(newDtd->pool), oldE->textPtr, oldE->textLen); if (!tem) return 0; newE->textPtr = tem; newE->textLen = oldE->textLen; } if (oldE->notation) { const XML_Char *tem = poolCopyString(&(newDtd->pool), oldE->notation); if (!tem) return 0; newE->notation = tem; } } newDtd->complete = oldDtd->complete; newDtd->standalone = oldDtd->standalone; return 1; } static void poolInit(STRING_POOL *pool) { pool->blocks = 0; pool->freeBlocks = 0; pool->start = 0; pool->ptr = 0; pool->end = 0; } static void poolClear(STRING_POOL *pool) { if (!pool->freeBlocks) pool->freeBlocks = pool->blocks; else { BLOCK *p = pool->blocks; while (p) { BLOCK *tem = p->next; p->next = pool->freeBlocks; pool->freeBlocks = p; p = tem; } } pool->blocks = 0; pool->start = 0; pool->ptr = 0; pool->end = 0; } static void poolDestroy(STRING_POOL *pool) { BLOCK *p = pool->blocks; while (p) { BLOCK *tem = p->next; free(p); p = tem; } pool->blocks = 0; p = pool->freeBlocks; while (p) { BLOCK *tem = p->next; free(p); p = tem; } pool->freeBlocks = 0; pool->ptr = 0; pool->start = 0; pool->end = 0; } static XML_Char *poolAppend(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end) { if (!pool->ptr && !poolGrow(pool)) return 0; for (;;) { XmlConvert(enc, &ptr, end, (ICHAR **)&(pool->ptr), (ICHAR *)pool->end); if (ptr == end) break; if (!poolGrow(pool)) return 0; } return pool->start; } static const XML_Char *poolCopyString(STRING_POOL *pool, const XML_Char *s) { do { if (!poolAppendChar(pool, *s)) return 0; } while (*s++); s = pool->start; poolFinish(pool); return s; } static const XML_Char *poolCopyStringN(STRING_POOL *pool, const XML_Char *s, int n) { if (!pool->ptr && !poolGrow(pool)) return 0; for (; n > 0; --n, s++) { if (!poolAppendChar(pool, *s)) return 0; } s = pool->start; poolFinish(pool); return s; } static XML_Char *poolStoreString(STRING_POOL *pool, const ENCODING *enc, const char *ptr, const char *end) { if (!poolAppend(pool, enc, ptr, end)) return 0; if (pool->ptr == pool->end && !poolGrow(pool)) return 0; *(pool->ptr)++ = 0; return pool->start; } static int poolGrow(STRING_POOL *pool) { if (pool->freeBlocks) { if (pool->start == 0) { pool->blocks = pool->freeBlocks; pool->freeBlocks = pool->freeBlocks->next; pool->blocks->next = 0; pool->start = pool->blocks->s; pool->end = pool->start + pool->blocks->size; pool->ptr = pool->start; return 1; } if (pool->end - pool->start < pool->freeBlocks->size) { BLOCK *tem = pool->freeBlocks->next; pool->freeBlocks->next = pool->blocks; pool->blocks = pool->freeBlocks; pool->freeBlocks = tem; memcpy(pool->blocks->s, pool->start, (pool->end - pool->start) * sizeof(XML_Char)); pool->ptr = pool->blocks->s + (pool->ptr - pool->start); pool->start = pool->blocks->s; pool->end = pool->start + pool->blocks->size; return 1; } } if (pool->blocks && pool->start == pool->blocks->s) { int blockSize = (pool->end - pool->start)*2; pool->blocks = realloc(pool->blocks, offsetof(BLOCK, s) + blockSize * sizeof(XML_Char)); if (!pool->blocks) return 0; pool->blocks->size = blockSize; pool->ptr = pool->blocks->s + (pool->ptr - pool->start); pool->start = pool->blocks->s; pool->end = pool->start + blockSize; } else { BLOCK *tem; int blockSize = pool->end - pool->start; if (blockSize < INIT_BLOCK_SIZE) blockSize = INIT_BLOCK_SIZE; else blockSize *= 2; tem = malloc(offsetof(BLOCK, s) + blockSize * sizeof(XML_Char)); if (!tem) return 0; tem->size = blockSize; tem->next = pool->blocks; pool->blocks = tem; memcpy(tem->s, pool->start, (pool->ptr - pool->start) * sizeof(XML_Char)); pool->ptr = tem->s + (pool->ptr - pool->start); pool->start = tem->s; pool->end = tem->s + blockSize; } return 1; } 1.1 apache-1.3/src/lib/expat/xmlparse.h Index: xmlparse.h =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ #ifndef XmlParse_INCLUDED #define XmlParse_INCLUDED 1 #ifdef __cplusplus extern "C" { #endif #ifndef XMLPARSEAPI #define XMLPARSEAPI /* as nothing */ #endif typedef void *XML_Parser; #ifdef XML_UNICODE_WCHAR_T /* XML_UNICODE_WCHAR_T will work only if sizeof(wchar_t) == 2 and wchar_t uses Unicode. */ /* Information is UTF-16 encoded as wchar_ts */ #ifndef XML_UNICODE #define XML_UNICODE #endif #include <stddef.h> typedef wchar_t XML_Char; typedef wchar_t XML_LChar; #else /* not XML_UNICODE_WCHAR_T */ #ifdef XML_UNICODE /* Information is UTF-16 encoded as unsigned shorts */ typedef unsigned short XML_Char; typedef char XML_LChar; #else /* not XML_UNICODE */ /* Information is UTF-8 encoded. */ typedef char XML_Char; typedef char XML_LChar; #endif /* not XML_UNICODE */ #endif /* not XML_UNICODE_WCHAR_T */ /* Constructs a new parser; encoding is the encoding specified by the external protocol or null if there is none specified. */ XML_Parser XMLPARSEAPI XML_ParserCreate(const XML_Char *encoding); /* atts is array of name/value pairs, terminated by 0; names and values are 0 terminated. */ typedef void (*XML_StartElementHandler)(void *userData, const XML_Char *name, const XML_Char **atts); typedef void (*XML_EndElementHandler)(void *userData, const XML_Char *name); /* s is not 0 terminated. */ typedef void (*XML_CharacterDataHandler)(void *userData, const XML_Char *s, int len); /* target and data are 0 terminated */ typedef void (*XML_ProcessingInstructionHandler)(void *userData, const XML_Char *target, const XML_Char *data); /* This is called for any characters in the XML document for which there is no applicable handler. This includes both characters that are part of markup which is of a kind that is not reported (comments, markup declarations), or characters that are part of a construct which could be reported but for which no handler has been supplied. The characters are passed exactly as they were in the XML document except that they will be encoded in UTF-8. Line boundaries are not normalized. Note that a byte order mark character is not passed to the default handler. If a default handler is set, internal entity references are not expanded. There are no guarantees about how characters are divided between calls to the default handler: for example, a comment might be split between multiple calls. */ typedef void (*XML_DefaultHandler)(void *userData, const XML_Char *s, int len); /* This is called for a declaration of an unparsed (NDATA) entity. The base argument is whatever was set by XML_SetBase. The entityName, systemId and notationName arguments will never be null. The other arguments may be. */ typedef void (*XML_UnparsedEntityDeclHandler)(void *userData, const XML_Char *entityName, const XML_Char *base, const XML_Char *systemId, const XML_Char *publicId, const XML_Char *notationName); /* This is called for a declaration of notation. The base argument is whatever was set by XML_SetBase. The notationName will never be null. The other arguments can be. */ typedef void (*XML_NotationDeclHandler)(void *userData, const XML_Char *notationName, const XML_Char *base, const XML_Char *systemId, const XML_Char *publicId); /* This is called for a reference to an external parsed general entity. The referenced entity is not automatically parsed. The application can parse it immediately or later using XML_ExternalEntityParserCreate. The parser argument is the parser parsing the entity containing the reference; it can be passed as the parser argument to XML_ExternalEntityParserCreate. The systemId argument is the system identifier as specified in the entity declaration; it will not be null. The base argument is the system identifier that should be used as the base for resolving systemId if systemId was relative; this is set by XML_SetBase; it may be null. The publicId argument is the public identifier as specified in the entity declaration, or null if none was specified; the whitespace in the public identifier will have been normalized as required by the XML spec. The openEntityNames argument is a space-separated list of the names of the entities that are open for the parse of this entity (including the name of the referenced entity); this can be passed as the openEntityNames argument to XML_ExternalEntityParserCreate; openEntityNames is valid only until the handler returns, so if the referenced entity is to be parsed later, it must be copied. The handler should return 0 if processing should not continue because of a fatal error in the handling of the external entity. In this case the calling parser will return an XML_ERROR_EXTERNAL_ENTITY_HANDLING error. Note that unlike other handlers the first argument is the parser, not userData. */ typedef int (*XML_ExternalEntityRefHandler)(XML_Parser parser, const XML_Char *openEntityNames, const XML_Char *base, const XML_Char *systemId, const XML_Char *publicId); /* This structure is filled in by the XML_UnknownEncodingHandler to provide information to the parser about encodings that are unknown to the parser. The map[b] member gives information about byte sequences whose first byte is b. If map[b] is c where c is >= 0, then b by itself encodes the Unicode scalar value c. If map[b] is -1, then the byte sequence is malformed. If map[b] is -n, where n >= 2, then b is the first byte of an n-byte sequence that encodes a single Unicode scalar value. The data member will be passed as the first argument to the convert function. The convert function is used to convert multibyte sequences; s will point to a n-byte sequence where map[(unsigned char)*s] == -n. The convert function must return the Unicode scalar value represented by this byte sequence or -1 if the byte sequence is malformed. The convert function may be null if the encoding is a single-byte encoding, that is if map[b] >= -1 for all bytes b. When the parser is finished with the encoding, then if release is not null, it will call release passing it the data member; once release has been called, the convert function will not be called again. Expat places certain restrictions on the encodings that are supported using this mechanism. 1. Every ASCII character that can appear in a well-formed XML document, other than the characters [EMAIL PROTECTED] must be represented by a single byte, and that byte must be the same byte that represents that character in ASCII. 2. No character may require more than 4 bytes to encode. 3. All characters encoded must have Unicode scalar values <= 0xFFFF, (ie characters that would be encoded by surrogates in UTF-16 are not allowed). Note that this restriction doesn't apply to the built-in support for UTF-8 and UTF-16. 4. No Unicode character may be encoded by more than one distinct sequence of bytes. */ typedef struct { int map[256]; void *data; int (*convert)(void *data, const char *s); void (*release)(void *data); } XML_Encoding; /* This is called for an encoding that is unknown to the parser. The encodingHandlerData argument is that which was passed as the second argument to XML_SetUnknownEncodingHandler. The name argument gives the name of the encoding as specified in the encoding declaration. If the callback can provide information about the encoding, it must fill in the XML_Encoding structure, and return 1. Otherwise it must return 0. If info does not describe a suitable encoding, then the parser will return an XML_UNKNOWN_ENCODING error. */ typedef int (*XML_UnknownEncodingHandler)(void *encodingHandlerData, const XML_Char *name, XML_Encoding *info); void XMLPARSEAPI XML_SetElementHandler(XML_Parser parser, XML_StartElementHandler start, XML_EndElementHandler end); void XMLPARSEAPI XML_SetCharacterDataHandler(XML_Parser parser, XML_CharacterDataHandler handler); void XMLPARSEAPI XML_SetProcessingInstructionHandler(XML_Parser parser, XML_ProcessingInstructionHandler handler); void XMLPARSEAPI XML_SetDefaultHandler(XML_Parser parser, XML_DefaultHandler handler); void XMLPARSEAPI XML_SetUnparsedEntityDeclHandler(XML_Parser parser, XML_UnparsedEntityDeclHandler handler); void XMLPARSEAPI XML_SetNotationDeclHandler(XML_Parser parser, XML_NotationDeclHandler handler); void XMLPARSEAPI XML_SetExternalEntityRefHandler(XML_Parser parser, XML_ExternalEntityRefHandler handler); void XMLPARSEAPI XML_SetUnknownEncodingHandler(XML_Parser parser, XML_UnknownEncodingHandler handler, void *encodingHandlerData); /* This can be called within a handler for a start element, end element, processing instruction or character data. It causes the corresponding markup to be passed to the default handler. Within the expansion of an internal entity, nothing will be passed to the default handler, although this usually will not happen since setting a default handler inhibits expansion of internal entities. */ void XMLPARSEAPI XML_DefaultCurrent(XML_Parser parser); /* This value is passed as the userData argument to callbacks. */ void XMLPARSEAPI XML_SetUserData(XML_Parser parser, void *userData); /* Returns the last value set by XML_SetUserData or null. */ #define XML_GetUserData(parser) (*(void **)(parser)) /* If this function is called, then the parser will be passed as the first argument to callbacks instead of userData. The userData will still be accessible using XML_GetUserData. */ void XMLPARSEAPI XML_UseParserAsHandlerArg(XML_Parser parser); /* Sets the base to be used for resolving relative URIs in system identifiers in declarations. Resolving relative identifiers is left to the application: this value will be passed through as the base argument to the XML_ExternalEntityRefHandler, XML_NotationDeclHandler and XML_UnparsedEntityDeclHandler. The base argument will be copied. Returns zero if out of memory, non-zero otherwise. */ int XMLPARSEAPI XML_SetBase(XML_Parser parser, const XML_Char *base); const XML_Char XMLPARSEAPI * XML_GetBase(XML_Parser parser); /* Parses some input. Returns 0 if a fatal error is detected. The last call to XML_Parse must have isFinal true; len may be zero for this call (or any other). */ int XMLPARSEAPI XML_Parse(XML_Parser parser, const char *s, int len, int isFinal); void XMLPARSEAPI * XML_GetBuffer(XML_Parser parser, int len); int XMLPARSEAPI XML_ParseBuffer(XML_Parser parser, int len, int isFinal); /* Creates an XML_Parser object that can parse an external general entity; openEntityNames is a space-separated list of the names of the entities that are open for the parse of this entity (including the name of this one); encoding is the externally specified encoding, or null if there is no externally specified encoding. This can be called at any point after the first call to an ExternalEntityRefHandler so longer as the parser has not yet been freed. The new parser is completely independent and may safely be used in a separate thread. The handlers and userData are initialized from the parser argument. Returns 0 if out of memory. Otherwise returns a new XML_Parser object. */ XML_Parser XMLPARSEAPI XML_ExternalEntityParserCreate(XML_Parser parser, const XML_Char *openEntityNames, const XML_Char *encoding); enum XML_Error { XML_ERROR_NONE, XML_ERROR_NO_MEMORY, XML_ERROR_SYNTAX, XML_ERROR_NO_ELEMENTS, XML_ERROR_INVALID_TOKEN, XML_ERROR_UNCLOSED_TOKEN, XML_ERROR_PARTIAL_CHAR, XML_ERROR_TAG_MISMATCH, XML_ERROR_DUPLICATE_ATTRIBUTE, XML_ERROR_JUNK_AFTER_DOC_ELEMENT, XML_ERROR_PARAM_ENTITY_REF, XML_ERROR_UNDEFINED_ENTITY, XML_ERROR_RECURSIVE_ENTITY_REF, XML_ERROR_ASYNC_ENTITY, XML_ERROR_BAD_CHAR_REF, XML_ERROR_BINARY_ENTITY_REF, XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF, XML_ERROR_MISPLACED_XML_PI, XML_ERROR_UNKNOWN_ENCODING, XML_ERROR_INCORRECT_ENCODING, XML_ERROR_UNCLOSED_CDATA_SECTION, XML_ERROR_EXTERNAL_ENTITY_HANDLING }; /* If XML_Parse or XML_ParseBuffer have returned 0, then XML_GetErrorCode returns information about the error. */ enum XML_Error XMLPARSEAPI XML_GetErrorCode(XML_Parser parser); /* These functions return information about the current parse location. They may be called when XML_Parse or XML_ParseBuffer return 0; in this case the location is the location of the character at which the error was detected. They may also be called from any other callback called to report some parse event; in this the location is the location of the first of the sequence of characters that generated the event. */ int XMLPARSEAPI XML_GetCurrentLineNumber(XML_Parser parser); int XMLPARSEAPI XML_GetCurrentColumnNumber(XML_Parser parser); long XMLPARSEAPI XML_GetCurrentByteIndex(XML_Parser parser); /* For backwards compatibility with previous versions. */ #define XML_GetErrorLineNumber XML_GetCurrentLineNumber #define XML_GetErrorColumnNumber XML_GetCurrentColumnNumber #define XML_GetErrorByteIndex XML_GetCurrentByteIndex /* Frees memory used by the parser. */ void XMLPARSEAPI XML_ParserFree(XML_Parser parser); /* Returns a string describing the error. */ const XML_LChar XMLPARSEAPI *XML_ErrorString(int code); #ifdef __cplusplus } #endif #endif /* not XmlParse_INCLUDED */ 1.1 apache-1.3/src/lib/expat/xmlrole.c Index: xmlrole.c =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ #include "xmlrole.h" /* Doesn't check: that ,| are not mixed in a model group content of literals */ #ifndef MIN_BYTES_PER_CHAR #define MIN_BYTES_PER_CHAR(enc) ((enc)->minBytesPerChar) #endif typedef int PROLOG_HANDLER(struct prolog_state *state, int tok, const char *ptr, const char *end, const ENCODING *enc); static PROLOG_HANDLER prolog0, prolog1, prolog2, doctype0, doctype1, doctype2, doctype3, doctype4, doctype5, internalSubset, entity0, entity1, entity2, entity3, entity4, entity5, entity6, entity7, entity8, entity9, notation0, notation1, notation2, notation3, notation4, attlist0, attlist1, attlist2, attlist3, attlist4, attlist5, attlist6, attlist7, attlist8, attlist9, element0, element1, element2, element3, element4, element5, element6, element7, declClose, error; static int syntaxError(PROLOG_STATE *); static int prolog0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: state->handler = prolog1; return XML_ROLE_NONE; case XML_TOK_XML_DECL: state->handler = prolog1; return XML_ROLE_XML_DECL; case XML_TOK_PI: state->handler = prolog1; return XML_ROLE_NONE; case XML_TOK_COMMENT: state->handler = prolog1; case XML_TOK_BOM: return XML_ROLE_NONE; case XML_TOK_DECL_OPEN: if (!XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), "DOCTYPE")) break; state->handler = doctype0; return XML_ROLE_NONE; case XML_TOK_INSTANCE_START: state->handler = error; return XML_ROLE_INSTANCE_START; } return syntaxError(state); } static int prolog1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_PI: case XML_TOK_COMMENT: case XML_TOK_BOM: return XML_ROLE_NONE; case XML_TOK_DECL_OPEN: if (!XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), "DOCTYPE")) break; state->handler = doctype0; return XML_ROLE_NONE; case XML_TOK_INSTANCE_START: state->handler = error; return XML_ROLE_INSTANCE_START; } return syntaxError(state); } static int prolog2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_PI: case XML_TOK_COMMENT: return XML_ROLE_NONE; case XML_TOK_INSTANCE_START: state->handler = error; return XML_ROLE_INSTANCE_START; } return syntaxError(state); } static int doctype0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = doctype1; return XML_ROLE_DOCTYPE_NAME; } return syntaxError(state); } static int doctype1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_BRACKET: state->handler = internalSubset; return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = prolog2; return XML_ROLE_DOCTYPE_CLOSE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, "SYSTEM")) { state->handler = doctype3; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, "PUBLIC")) { state->handler = doctype2; return XML_ROLE_NONE; } break; } return syntaxError(state); } static int doctype2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = doctype3; return XML_ROLE_DOCTYPE_PUBLIC_ID; } return syntaxError(state); } static int doctype3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = doctype4; return XML_ROLE_DOCTYPE_SYSTEM_ID; } return syntaxError(state); } static int doctype4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_BRACKET: state->handler = internalSubset; return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = prolog2; return XML_ROLE_DOCTYPE_CLOSE; } return syntaxError(state); } static int doctype5(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = prolog2; return XML_ROLE_DOCTYPE_CLOSE; } return syntaxError(state); } static int internalSubset(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_OPEN: if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), "ENTITY")) { state->handler = entity0; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), "ATTLIST")) { state->handler = attlist0; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), "ELEMENT")) { state->handler = element0; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr + 2 * MIN_BYTES_PER_CHAR(enc), "NOTATION")) { state->handler = notation0; return XML_ROLE_NONE; } break; case XML_TOK_PI: case XML_TOK_COMMENT: return XML_ROLE_NONE; case XML_TOK_PARAM_ENTITY_REF: return XML_ROLE_PARAM_ENTITY_REF; case XML_TOK_CLOSE_BRACKET: state->handler = doctype5; return XML_ROLE_NONE; } return syntaxError(state); } static int entity0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_PERCENT: state->handler = entity1; return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = entity2; return XML_ROLE_GENERAL_ENTITY_NAME; } return syntaxError(state); } static int entity1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = entity7; return XML_ROLE_PARAM_ENTITY_NAME; } return syntaxError(state); } static int entity2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, "SYSTEM")) { state->handler = entity4; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, "PUBLIC")) { state->handler = entity3; return XML_ROLE_NONE; } break; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_ENTITY_VALUE; } return syntaxError(state); } static int entity3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = entity4; return XML_ROLE_ENTITY_PUBLIC_ID; } return syntaxError(state); } static int entity4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = entity5; return XML_ROLE_ENTITY_SYSTEM_ID; } return syntaxError(state); } static int entity5(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = internalSubset; return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, "NDATA")) { state->handler = entity6; return XML_ROLE_NONE; } break; } return syntaxError(state); } static int entity6(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = declClose; return XML_ROLE_ENTITY_NOTATION_NAME; } return syntaxError(state); } static int entity7(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, "SYSTEM")) { state->handler = entity9; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, "PUBLIC")) { state->handler = entity8; return XML_ROLE_NONE; } break; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_ENTITY_VALUE; } return syntaxError(state); } static int entity8(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = entity9; return XML_ROLE_ENTITY_PUBLIC_ID; } return syntaxError(state); } static int entity9(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_ENTITY_SYSTEM_ID; } return syntaxError(state); } static int notation0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = notation1; return XML_ROLE_NOTATION_NAME; } return syntaxError(state); } static int notation1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, "SYSTEM")) { state->handler = notation3; return XML_ROLE_NONE; } if (XmlNameMatchesAscii(enc, ptr, "PUBLIC")) { state->handler = notation2; return XML_ROLE_NONE; } break; } return syntaxError(state); } static int notation2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = notation4; return XML_ROLE_NOTATION_PUBLIC_ID; } return syntaxError(state); } static int notation3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_NOTATION_SYSTEM_ID; } return syntaxError(state); } static int notation4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = declClose; return XML_ROLE_NOTATION_SYSTEM_ID; case XML_TOK_DECL_CLOSE: state->handler = internalSubset; return XML_ROLE_NOTATION_NO_SYSTEM_ID; } return syntaxError(state); } static int attlist0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = attlist1; return XML_ROLE_ATTLIST_ELEMENT_NAME; } return syntaxError(state); } static int attlist1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = internalSubset; return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = attlist2; return XML_ROLE_ATTRIBUTE_NAME; } return syntaxError(state); } static int attlist2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: { static const char *types[] = { "CDATA", "ID", "IDREF", "IDREFS", "ENTITY", "ENTITIES", "NMTOKEN", "NMTOKENS", }; int i; for (i = 0; i < (int)(sizeof(types)/sizeof(types[0])); i++) if (XmlNameMatchesAscii(enc, ptr, types[i])) { state->handler = attlist8; return XML_ROLE_ATTRIBUTE_TYPE_CDATA + i; } } if (XmlNameMatchesAscii(enc, ptr, "NOTATION")) { state->handler = attlist5; return XML_ROLE_NONE; } break; case XML_TOK_OPEN_PAREN: state->handler = attlist3; return XML_ROLE_NONE; } return syntaxError(state); } static int attlist3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NMTOKEN: case XML_TOK_NAME: state->handler = attlist4; return XML_ROLE_ATTRIBUTE_ENUM_VALUE; } return syntaxError(state); } static int attlist4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN: state->handler = attlist8; return XML_ROLE_NONE; case XML_TOK_OR: state->handler = attlist3; return XML_ROLE_NONE; } return syntaxError(state); } static int attlist5(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_PAREN: state->handler = attlist6; return XML_ROLE_NONE; } return syntaxError(state); } static int attlist6(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = attlist7; return XML_ROLE_ATTRIBUTE_NOTATION_VALUE; } return syntaxError(state); } static int attlist7(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN: state->handler = attlist8; return XML_ROLE_NONE; case XML_TOK_OR: state->handler = attlist6; return XML_ROLE_NONE; } return syntaxError(state); } /* default value */ static int attlist8(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_POUND_NAME: if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), "IMPLIED")) { state->handler = attlist1; return XML_ROLE_IMPLIED_ATTRIBUTE_VALUE; } if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), "REQUIRED")) { state->handler = attlist1; return XML_ROLE_REQUIRED_ATTRIBUTE_VALUE; } if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), "FIXED")) { state->handler = attlist9; return XML_ROLE_NONE; } break; case XML_TOK_LITERAL: state->handler = attlist1; return XML_ROLE_DEFAULT_ATTRIBUTE_VALUE; } return syntaxError(state); } static int attlist9(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_LITERAL: state->handler = attlist1; return XML_ROLE_FIXED_ATTRIBUTE_VALUE; } return syntaxError(state); } static int element0(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = element1; return XML_ROLE_ELEMENT_NAME; } return syntaxError(state); } static int element1(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: if (XmlNameMatchesAscii(enc, ptr, "EMPTY")) { state->handler = declClose; return XML_ROLE_CONTENT_EMPTY; } if (XmlNameMatchesAscii(enc, ptr, "ANY")) { state->handler = declClose; return XML_ROLE_CONTENT_ANY; } break; case XML_TOK_OPEN_PAREN: state->handler = element2; state->level = 1; return XML_ROLE_GROUP_OPEN; } return syntaxError(state); } static int element2(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_POUND_NAME: if (XmlNameMatchesAscii(enc, ptr + MIN_BYTES_PER_CHAR(enc), "PCDATA")) { state->handler = element3; return XML_ROLE_CONTENT_PCDATA; } break; case XML_TOK_OPEN_PAREN: state->level = 2; state->handler = element6; return XML_ROLE_GROUP_OPEN; case XML_TOK_NAME: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT; case XML_TOK_NAME_QUESTION: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_OPT; case XML_TOK_NAME_ASTERISK: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_REP; case XML_TOK_NAME_PLUS: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_PLUS; } return syntaxError(state); } static int element3(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN: case XML_TOK_CLOSE_PAREN_ASTERISK: state->handler = declClose; return XML_ROLE_GROUP_CLOSE_REP; case XML_TOK_OR: state->handler = element4; return XML_ROLE_NONE; } return syntaxError(state); } static int element4(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_NAME: state->handler = element5; return XML_ROLE_CONTENT_ELEMENT; } return syntaxError(state); } static int element5(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN_ASTERISK: state->handler = declClose; return XML_ROLE_GROUP_CLOSE_REP; case XML_TOK_OR: state->handler = element4; return XML_ROLE_NONE; } return syntaxError(state); } static int element6(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_OPEN_PAREN: state->level += 1; return XML_ROLE_GROUP_OPEN; case XML_TOK_NAME: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT; case XML_TOK_NAME_QUESTION: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_OPT; case XML_TOK_NAME_ASTERISK: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_REP; case XML_TOK_NAME_PLUS: state->handler = element7; return XML_ROLE_CONTENT_ELEMENT_PLUS; } return syntaxError(state); } static int element7(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_CLOSE_PAREN: state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE; case XML_TOK_CLOSE_PAREN_ASTERISK: state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE_REP; case XML_TOK_CLOSE_PAREN_QUESTION: state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE_OPT; case XML_TOK_CLOSE_PAREN_PLUS: state->level -= 1; if (state->level == 0) state->handler = declClose; return XML_ROLE_GROUP_CLOSE_PLUS; case XML_TOK_COMMA: state->handler = element6; return XML_ROLE_GROUP_SEQUENCE; case XML_TOK_OR: state->handler = element6; return XML_ROLE_GROUP_CHOICE; } return syntaxError(state); } static int declClose(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_PROLOG_S: return XML_ROLE_NONE; case XML_TOK_DECL_CLOSE: state->handler = internalSubset; return XML_ROLE_NONE; } return syntaxError(state); } #if 0 static int ignore(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { switch (tok) { case XML_TOK_DECL_CLOSE: state->handler = internalSubset; return 0; default: return XML_ROLE_NONE; } return syntaxError(state); } #endif static int error(PROLOG_STATE *state, int tok, const char *ptr, const char *end, const ENCODING *enc) { return XML_ROLE_NONE; } static int syntaxError(PROLOG_STATE *state) { state->handler = error; return XML_ROLE_ERROR; } void XmlPrologStateInit(PROLOG_STATE *state) { state->handler = prolog0; } 1.1 apache-1.3/src/lib/expat/xmlrole.h Index: xmlrole.h =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ #ifndef XmlRole_INCLUDED #define XmlRole_INCLUDED 1 #include "xmltok.h" #ifdef __cplusplus extern "C" { #endif enum { XML_ROLE_ERROR = -1, XML_ROLE_NONE = 0, XML_ROLE_XML_DECL, XML_ROLE_INSTANCE_START, XML_ROLE_DOCTYPE_NAME, XML_ROLE_DOCTYPE_SYSTEM_ID, XML_ROLE_DOCTYPE_PUBLIC_ID, XML_ROLE_DOCTYPE_CLOSE, XML_ROLE_GENERAL_ENTITY_NAME, XML_ROLE_PARAM_ENTITY_NAME, XML_ROLE_ENTITY_VALUE, XML_ROLE_ENTITY_SYSTEM_ID, XML_ROLE_ENTITY_PUBLIC_ID, XML_ROLE_ENTITY_NOTATION_NAME, XML_ROLE_NOTATION_NAME, XML_ROLE_NOTATION_SYSTEM_ID, XML_ROLE_NOTATION_NO_SYSTEM_ID, XML_ROLE_NOTATION_PUBLIC_ID, XML_ROLE_ATTRIBUTE_NAME, XML_ROLE_ATTRIBUTE_TYPE_CDATA, XML_ROLE_ATTRIBUTE_TYPE_ID, XML_ROLE_ATTRIBUTE_TYPE_IDREF, XML_ROLE_ATTRIBUTE_TYPE_IDREFS, XML_ROLE_ATTRIBUTE_TYPE_ENTITY, XML_ROLE_ATTRIBUTE_TYPE_ENTITIES, XML_ROLE_ATTRIBUTE_TYPE_NMTOKEN, XML_ROLE_ATTRIBUTE_TYPE_NMTOKENS, XML_ROLE_ATTRIBUTE_ENUM_VALUE, XML_ROLE_ATTRIBUTE_NOTATION_VALUE, XML_ROLE_ATTLIST_ELEMENT_NAME, XML_ROLE_IMPLIED_ATTRIBUTE_VALUE, XML_ROLE_REQUIRED_ATTRIBUTE_VALUE, XML_ROLE_DEFAULT_ATTRIBUTE_VALUE, XML_ROLE_FIXED_ATTRIBUTE_VALUE, XML_ROLE_ELEMENT_NAME, XML_ROLE_CONTENT_ANY, XML_ROLE_CONTENT_EMPTY, XML_ROLE_CONTENT_PCDATA, XML_ROLE_GROUP_OPEN, XML_ROLE_GROUP_CLOSE, XML_ROLE_GROUP_CLOSE_REP, XML_ROLE_GROUP_CLOSE_OPT, XML_ROLE_GROUP_CLOSE_PLUS, XML_ROLE_GROUP_CHOICE, XML_ROLE_GROUP_SEQUENCE, XML_ROLE_CONTENT_ELEMENT, XML_ROLE_CONTENT_ELEMENT_REP, XML_ROLE_CONTENT_ELEMENT_OPT, XML_ROLE_CONTENT_ELEMENT_PLUS, XML_ROLE_PARAM_ENTITY_REF }; typedef struct prolog_state { int (*handler)(struct prolog_state *state, int tok, const char *ptr, const char *end, const ENCODING *enc); unsigned level; } PROLOG_STATE; void XMLTOKAPI XmlPrologStateInit(PROLOG_STATE *); #define XmlTokenRole(state, tok, ptr, end, enc) \ (((state)->handler)(state, tok, ptr, end, enc)) #ifdef __cplusplus } #endif #endif /* not XmlRole_INCLUDED */ 1.1 apache-1.3/src/lib/expat/xmltok.c Index: xmltok.c =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ #include "xmltok.h" #include "nametab.h" #define VTABLE1 \ { PREFIX(prologTok), PREFIX(contentTok), PREFIX(cdataSectionTok) }, \ { PREFIX(attributeValueTok), PREFIX(entityValueTok) }, \ PREFIX(sameName), \ PREFIX(nameMatchesAscii), \ PREFIX(nameLength), \ PREFIX(skipS), \ PREFIX(getAtts), \ PREFIX(charRefNumber), \ PREFIX(predefinedEntityName), \ PREFIX(updatePosition), \ PREFIX(isPublicId) #define VTABLE VTABLE1, PREFIX(toUtf8), PREFIX(toUtf16) #define UCS2_GET_NAMING(pages, hi, lo) \ (namingBitmap[(pages[hi] << 3) + ((lo) >> 5)] & (1 << ((lo) & 0x1F))) /* A 2 byte UTF-8 representation splits the characters 11 bits between the bottom 5 and 6 bits of the bytes. We need 8 bits to index into pages, 3 bits to add to that index and 5 bits to generate the mask. */ #define UTF8_GET_NAMING2(pages, byte) \ (namingBitmap[((pages)[(((byte)[0]) >> 2) & 7] << 3) \ + ((((byte)[0]) & 3) << 1) \ + ((((byte)[1]) >> 5) & 1)] \ & (1 << (((byte)[1]) & 0x1F))) /* A 3 byte UTF-8 representation splits the characters 16 bits between the bottom 4, 6 and 6 bits of the bytes. We need 8 bits to index into pages, 3 bits to add to that index and 5 bits to generate the mask. */ #define UTF8_GET_NAMING3(pages, byte) \ (namingBitmap[((pages)[((((byte)[0]) & 0xF) << 4) \ + ((((byte)[1]) >> 2) & 0xF)] \ << 3) \ + ((((byte)[1]) & 3) << 1) \ + ((((byte)[2]) >> 5) & 1)] \ & (1 << (((byte)[2]) & 0x1F))) #define UTF8_GET_NAMING(pages, p, n) \ ((n) == 2 \ ? UTF8_GET_NAMING2(pages, (const unsigned char *)(p)) \ : ((n) == 3 \ ? UTF8_GET_NAMING3(pages, (const unsigned char *)(p)) \ : 0)) #define UTF8_INVALID3(p) \ ((*p) == 0xED \ ? (((p)[1] & 0x20) != 0) \ : ((*p) == 0xEF \ ? ((p)[1] == 0xBF && ((p)[2] == 0xBF || (p)[2] == 0xBE)) \ : 0)) #define UTF8_INVALID4(p) ((*p) == 0xF4 && ((p)[1] & 0x30) != 0) static int isNever(const ENCODING *enc, const char *p) { return 0; } static int utf8_isName2(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING2(namePages, (const unsigned char *)p); } static int utf8_isName3(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING3(namePages, (const unsigned char *)p); } #define utf8_isName4 isNever static int utf8_isNmstrt2(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING2(nmstrtPages, (const unsigned char *)p); } static int utf8_isNmstrt3(const ENCODING *enc, const char *p) { return UTF8_GET_NAMING3(nmstrtPages, (const unsigned char *)p); } #define utf8_isNmstrt4 isNever #define utf8_isInvalid2 isNever static int utf8_isInvalid3(const ENCODING *enc, const char *p) { return UTF8_INVALID3((const unsigned char *)p); } static int utf8_isInvalid4(const ENCODING *enc, const char *p) { return UTF8_INVALID4((const unsigned char *)p); } struct normal_encoding { ENCODING enc; unsigned char type[256]; int (*isName2)(const ENCODING *, const char *); int (*isName3)(const ENCODING *, const char *); int (*isName4)(const ENCODING *, const char *); int (*isNmstrt2)(const ENCODING *, const char *); int (*isNmstrt3)(const ENCODING *, const char *); int (*isNmstrt4)(const ENCODING *, const char *); int (*isInvalid2)(const ENCODING *, const char *); int (*isInvalid3)(const ENCODING *, const char *); int (*isInvalid4)(const ENCODING *, const char *); }; #define NORMAL_VTABLE(E) \ E ## isName2, \ E ## isName3, \ E ## isName4, \ E ## isNmstrt2, \ E ## isNmstrt3, \ E ## isNmstrt4, \ E ## isInvalid2, \ E ## isInvalid3, \ E ## isInvalid4 static int checkCharRefNumber(int); #include "xmltok_impl.h" /* minimum bytes per character */ #define MINBPC 1 #define BYTE_TYPE(enc, p) \ (((struct normal_encoding *)(enc))->type[(unsigned char)*(p)]) #define BYTE_TO_ASCII(enc, p) (*p) #define IS_NAME_CHAR(enc, p, n) \ (((const struct normal_encoding *)(enc))->isName ## n(enc, p)) #define IS_NMSTRT_CHAR(enc, p, n) \ (((const struct normal_encoding *)(enc))->isNmstrt ## n(enc, p)) #define IS_INVALID_CHAR(enc, p, n) \ (((const struct normal_encoding *)(enc))->isInvalid ## n(enc, p)) #define IS_NAME_CHAR_MINBPC(enc, p) (0) #define IS_NMSTRT_CHAR_MINBPC(enc, p) (0) /* c is an ASCII character */ #define CHAR_MATCHES(enc, p, c) (*(p) == c) #define PREFIX(ident) normal_ ## ident #include "xmltok_impl.c" #undef MINBPC #undef BYTE_TYPE #undef BYTE_TO_ASCII #undef CHAR_MATCHES #undef IS_NAME_CHAR #undef IS_NAME_CHAR_MINBPC #undef IS_NMSTRT_CHAR #undef IS_NMSTRT_CHAR_MINBPC #undef IS_INVALID_CHAR enum { /* UTF8_cvalN is value of masked first byte of N byte sequence */ UTF8_cval1 = 0x00, UTF8_cval2 = 0xc0, UTF8_cval3 = 0xe0, UTF8_cval4 = 0xf0 }; static void utf8_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { char *to; const char *from; if (fromLim - *fromP > toLim - *toP) { /* Avoid copying partial characters. */ for (fromLim = *fromP + (toLim - *toP); fromLim > *fromP; fromLim--) if (((unsigned char)fromLim[-1] & 0xc0) != 0x80) break; } for (to = *toP, from = *fromP; from != fromLim; from++, to++) *to = *from; *fromP = from; *toP = to; } static void utf8_toUtf16(const ENCODING *enc, const char **fromP, const char *fromLim, unsigned short **toP, const unsigned short *toLim) { unsigned short *to = *toP; const char *from = *fromP; while (from != fromLim && to != toLim) { switch (((struct normal_encoding *)enc)->type[(unsigned char)*from]) { case BT_LEAD2: *to++ = ((from[0] & 0x1f) << 6) | (from[1] & 0x3f); from += 2; break; case BT_LEAD3: *to++ = ((from[0] & 0xf) << 12) | ((from[1] & 0x3f) << 6) | (from[2] & 0x3f); from += 3; break; case BT_LEAD4: { unsigned long n; if (to + 1 == toLim) break; n = ((from[0] & 0x7) << 18) | ((from[1] & 0x3f) << 12) | ((from[2] & 0x3f) << 6) | (from[3] & 0x3f); n -= 0x10000; to[0] = (unsigned short)((n >> 10) | 0xD800); to[1] = (unsigned short)((n & 0x3FF) | 0xDC00); to += 2; from += 4; } break; default: *to++ = *from++; break; } } *fromP = from; *toP = to; } static const struct normal_encoding utf8_encoding = { { VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 }, { #include "asciitab.h" #include "utf8tab.h" }, NORMAL_VTABLE(utf8_) }; static const struct normal_encoding internal_utf8_encoding = { { VTABLE1, utf8_toUtf8, utf8_toUtf16, 1, 1, 0 }, { #include "iasciitab.h" #include "utf8tab.h" }, NORMAL_VTABLE(utf8_) }; static void latin1_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { for (;;) { unsigned char c; if (*fromP == fromLim) break; c = (unsigned char)**fromP; if (c & 0x80) { if (toLim - *toP < 2) break; *(*toP)++ = ((c >> 6) | UTF8_cval2); *(*toP)++ = ((c & 0x3f) | 0x80); (*fromP)++; } else { if (*toP == toLim) break; *(*toP)++ = *(*fromP)++; } } } static void latin1_toUtf16(const ENCODING *enc, const char **fromP, const char *fromLim, unsigned short **toP, const unsigned short *toLim) { while (*fromP != fromLim && *toP != toLim) *(*toP)++ = (unsigned char)*(*fromP)++; } static const struct normal_encoding latin1_encoding = { { VTABLE1, latin1_toUtf8, latin1_toUtf16, 1, 0, 0 }, { #include "asciitab.h" #include "latin1tab.h" } }; static void ascii_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { while (*fromP != fromLim && *toP != toLim) *(*toP)++ = *(*fromP)++; } static const struct normal_encoding ascii_encoding = { { VTABLE1, ascii_toUtf8, latin1_toUtf16, 1, 1, 0 }, { #include "asciitab.h" /* BT_NONXML == 0 */ } }; #undef PREFIX static int unicode_byte_type(char hi, char lo) { switch ((unsigned char)hi) { case 0xD8: case 0xD9: case 0xDA: case 0xDB: return BT_LEAD4; case 0xDC: case 0xDD: case 0xDE: case 0xDF: return BT_TRAIL; case 0xFF: switch ((unsigned char)lo) { case 0xFF: case 0xFE: return BT_NONXML; } break; } return BT_NONASCII; } #define DEFINE_UTF16_TO_UTF8 \ static \ void PREFIX(toUtf8)(const ENCODING *enc, \ const char **fromP, const char *fromLim, \ char **toP, const char *toLim) \ { \ const char *from; \ for (from = *fromP; from != fromLim; from += 2) { \ int plane; \ unsigned char lo2; \ unsigned char lo = GET_LO(from); \ unsigned char hi = GET_HI(from); \ switch (hi) { \ case 0: \ if (lo < 0x80) { \ if (*toP == toLim) { \ *fromP = from; \ return; \ } \ *(*toP)++ = lo; \ break; \ } \ /* fall through */ \ case 0x1: case 0x2: case 0x3: \ case 0x4: case 0x5: case 0x6: case 0x7: \ if (toLim - *toP < 2) { \ *fromP = from; \ return; \ } \ *(*toP)++ = ((lo >> 6) | (hi << 2) | UTF8_cval2); \ *(*toP)++ = ((lo & 0x3f) | 0x80); \ break; \ default: \ if (toLim - *toP < 3) { \ *fromP = from; \ return; \ } \ /* 16 bits divided 4, 6, 6 amongst 3 bytes */ \ *(*toP)++ = ((hi >> 4) | UTF8_cval3); \ *(*toP)++ = (((hi & 0xf) << 2) | (lo >> 6) | 0x80); \ *(*toP)++ = ((lo & 0x3f) | 0x80); \ break; \ case 0xD8: case 0xD9: case 0xDA: case 0xDB: \ if (toLim - *toP < 4) { \ *fromP = from; \ return; \ } \ plane = (((hi & 0x3) << 2) | ((lo >> 6) & 0x3)) + 1; \ *(*toP)++ = ((plane >> 2) | UTF8_cval4); \ *(*toP)++ = (((lo >> 2) & 0xF) | ((plane & 0x3) << 4) | 0x80); \ from += 2; \ lo2 = GET_LO(from); \ *(*toP)++ = (((lo & 0x3) << 4) \ | ((GET_HI(from) & 0x3) << 2) \ | (lo2 >> 6) \ | 0x80); \ *(*toP)++ = ((lo2 & 0x3f) | 0x80); \ break; \ } \ } \ *fromP = from; \ } #define DEFINE_UTF16_TO_UTF16 \ static \ void PREFIX(toUtf16)(const ENCODING *enc, \ const char **fromP, const char *fromLim, \ unsigned short **toP, const unsigned short *toLim) \ { \ /* Avoid copying first half only of surrogate */ \ if (fromLim - *fromP > ((toLim - *toP) << 1) \ && (GET_HI(fromLim - 2) & 0xF8) == 0xD8) \ fromLim -= 2; \ for (; *fromP != fromLim && *toP != toLim; *fromP += 2) \ *(*toP)++ = (GET_HI(*fromP) << 8) | GET_LO(*fromP); \ } #define PREFIX(ident) little2_ ## ident #define MINBPC 2 #define BYTE_TYPE(enc, p) \ ((p)[1] == 0 \ ? ((struct normal_encoding *)(enc))->type[(unsigned char)*(p)] \ : unicode_byte_type((p)[1], (p)[0])) #define BYTE_TO_ASCII(enc, p) ((p)[1] == 0 ? (p)[0] : -1) #define CHAR_MATCHES(enc, p, c) ((p)[1] == 0 && (p)[0] == c) #define IS_NAME_CHAR(enc, p, n) (0) #define IS_NAME_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(namePages, (unsigned char)p[1], (unsigned char)p[0]) #define IS_NMSTRT_CHAR(enc, p, n) (0) #define IS_NMSTRT_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(nmstrtPages, (unsigned char)p[1], (unsigned char)p[0]) #include "xmltok_impl.c" #define SET2(ptr, ch) \ (((ptr)[0] = ((ch) & 0xff)), ((ptr)[1] = ((ch) >> 8))) #define GET_LO(ptr) ((unsigned char)(ptr)[0]) #define GET_HI(ptr) ((unsigned char)(ptr)[1]) DEFINE_UTF16_TO_UTF8 DEFINE_UTF16_TO_UTF16 #undef SET2 #undef GET_LO #undef GET_HI #undef MINBPC #undef BYTE_TYPE #undef BYTE_TO_ASCII #undef CHAR_MATCHES #undef IS_NAME_CHAR #undef IS_NAME_CHAR_MINBPC #undef IS_NMSTRT_CHAR #undef IS_NMSTRT_CHAR_MINBPC #undef IS_INVALID_CHAR static const struct normal_encoding little2_encoding = { { VTABLE, 2, 0, #if BYTE_ORDER == 12 1 #else 0 #endif }, #include "asciitab.h" #include "latin1tab.h" }; #if BYTE_ORDER != 21 static const struct normal_encoding internal_little2_encoding = { { VTABLE, 2, 0, 1 }, #include "iasciitab.h" #include "latin1tab.h" }; #endif #undef PREFIX #define PREFIX(ident) big2_ ## ident #define MINBPC 2 /* CHAR_MATCHES is guaranteed to have MINBPC bytes available. */ #define BYTE_TYPE(enc, p) \ ((p)[0] == 0 \ ? ((struct normal_encoding *)(enc))->type[(unsigned char)(p)[1]] \ : unicode_byte_type((p)[0], (p)[1])) #define BYTE_TO_ASCII(enc, p) ((p)[0] == 0 ? (p)[1] : -1) #define CHAR_MATCHES(enc, p, c) ((p)[0] == 0 && (p)[1] == c) #define IS_NAME_CHAR(enc, p, n) 0 #define IS_NAME_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(namePages, (unsigned char)p[0], (unsigned char)p[1]) #define IS_NMSTRT_CHAR(enc, p, n) (0) #define IS_NMSTRT_CHAR_MINBPC(enc, p) \ UCS2_GET_NAMING(nmstrtPages, (unsigned char)p[0], (unsigned char)p[1]) #include "xmltok_impl.c" #define SET2(ptr, ch) \ (((ptr)[0] = ((ch) >> 8)), ((ptr)[1] = ((ch) & 0xFF))) #define GET_LO(ptr) ((unsigned char)(ptr)[1]) #define GET_HI(ptr) ((unsigned char)(ptr)[0]) DEFINE_UTF16_TO_UTF8 DEFINE_UTF16_TO_UTF16 #undef SET2 #undef GET_LO #undef GET_HI #undef MINBPC #undef BYTE_TYPE #undef BYTE_TO_ASCII #undef CHAR_MATCHES #undef IS_NAME_CHAR #undef IS_NAME_CHAR_MINBPC #undef IS_NMSTRT_CHAR #undef IS_NMSTRT_CHAR_MINBPC #undef IS_INVALID_CHAR static const struct normal_encoding big2_encoding = { { VTABLE, 2, 0, #if BYTE_ORDER == 21 1 #else 0 #endif }, #include "asciitab.h" #include "latin1tab.h" }; #if BYTE_ORDER != 12 static const struct normal_encoding internal_big2_encoding = { { VTABLE, 2, 0, 1 }, #include "iasciitab.h" #include "latin1tab.h" }; #endif #undef PREFIX static int streqci(const char *s1, const char *s2) { for (;;) { char c1 = *s1++; char c2 = *s2++; if ('a' <= c1 && c1 <= 'z') c1 += 'A' - 'a'; if ('a' <= c2 && c2 <= 'z') c2 += 'A' - 'a'; if (c1 != c2) return 0; if (!c1) break; } return 1; } static int initScan(const ENCODING *enc, int state, const char *ptr, const char *end, const char **nextTokPtr) { const ENCODING **encPtr; if (ptr == end) return XML_TOK_NONE; encPtr = ((const INIT_ENCODING *)enc)->encPtr; if (ptr + 1 == end) { switch ((unsigned char)*ptr) { case 0xFE: case 0xFF: case 0x00: case 0x3C: return XML_TOK_PARTIAL; } } else { switch (((unsigned char)ptr[0] << 8) | (unsigned char)ptr[1]) { case 0x003C: *encPtr = &big2_encoding.enc; return XmlTok(*encPtr, state, ptr, end, nextTokPtr); case 0xFEFF: *nextTokPtr = ptr + 2; *encPtr = &big2_encoding.enc; return XML_TOK_BOM; case 0x3C00: *encPtr = &little2_encoding.enc; return XmlTok(*encPtr, state, ptr, end, nextTokPtr); case 0xFFFE: *nextTokPtr = ptr + 2; *encPtr = &little2_encoding.enc; return XML_TOK_BOM; } } *encPtr = &utf8_encoding.enc; return XmlTok(*encPtr, state, ptr, end, nextTokPtr); } static int initScanProlog(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { return initScan(enc, XML_PROLOG_STATE, ptr, end, nextTokPtr); } static int initScanContent(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { return initScan(enc, XML_CONTENT_STATE, ptr, end, nextTokPtr); } static void initUpdatePosition(const ENCODING *enc, const char *ptr, const char *end, POSITION *pos) { normal_updatePosition(&utf8_encoding.enc, ptr, end, pos); } const ENCODING *XmlGetUtf8InternalEncoding() { return &internal_utf8_encoding.enc; } const ENCODING *XmlGetUtf16InternalEncoding() { #if BYTE_ORDER == 12 return &internal_little2_encoding.enc; #elif BYTE_ORDER == 21 return &internal_big2_encoding.enc; #else const short n = 1; return *(const char *)&n ? &internal_little2_encoding.enc : &internal_big2_encoding.enc; #endif } int XmlInitEncoding(INIT_ENCODING *p, const ENCODING **encPtr, const char *name) { if (name) { if (streqci(name, "ISO-8859-1")) { *encPtr = &latin1_encoding.enc; return 1; } if (streqci(name, "UTF-8")) { *encPtr = &utf8_encoding.enc; return 1; } if (streqci(name, "US-ASCII")) { *encPtr = &ascii_encoding.enc; return 1; } if (!streqci(name, "UTF-16")) return 0; } p->initEnc.scanners[XML_PROLOG_STATE] = initScanProlog; p->initEnc.scanners[XML_CONTENT_STATE] = initScanContent; p->initEnc.updatePosition = initUpdatePosition; p->initEnc.minBytesPerChar = 1; p->encPtr = encPtr; *encPtr = &(p->initEnc); return 1; } static int toAscii(const ENCODING *enc, const char *ptr, const char *end) { char buf[1]; char *p = buf; XmlUtf8Convert(enc, &ptr, end, &p, p + 1); if (p == buf) return -1; else return buf[0]; } static int isSpace(int c) { switch (c) { case ' ': case '\r': case '\n': case '\t': return 1; } return 0; } /* Return 1 if there's just optional white space or there's an S followed by name=val. */ static int parsePseudoAttribute(const ENCODING *enc, const char *ptr, const char *end, const char **namePtr, const char **valPtr, const char **nextTokPtr) { int c; char open; if (ptr == end) { *namePtr = 0; return 1; } if (!isSpace(toAscii(enc, ptr, end))) { *nextTokPtr = ptr; return 0; } do { ptr += enc->minBytesPerChar; } while (isSpace(toAscii(enc, ptr, end))); if (ptr == end) { *namePtr = 0; return 1; } *namePtr = ptr; for (;;) { c = toAscii(enc, ptr, end); if (c == -1) { *nextTokPtr = ptr; return 0; } if (c == '=') break; if (isSpace(c)) { do { ptr += enc->minBytesPerChar; } while (isSpace(c = toAscii(enc, ptr, end))); if (c != '=') { *nextTokPtr = ptr; return 0; } break; } ptr += enc->minBytesPerChar; } if (ptr == *namePtr) { *nextTokPtr = ptr; return 0; } ptr += enc->minBytesPerChar; c = toAscii(enc, ptr, end); while (isSpace(c)) { ptr += enc->minBytesPerChar; c = toAscii(enc, ptr, end); } if (c != '"' && c != '\'') { *nextTokPtr = ptr; return 0; } open = c; ptr += enc->minBytesPerChar; *valPtr = ptr; for (;; ptr += enc->minBytesPerChar) { c = toAscii(enc, ptr, end); if (c == open) break; if (!('a' <= c && c <= 'z') && !('A' <= c && c <= 'Z') && !('0' <= c && c <= '9') && c != '.' && c != '-' && c != '_') { *nextTokPtr = ptr; return 0; } } *nextTokPtr = ptr + enc->minBytesPerChar; return 1; } static const ENCODING *findEncoding(const ENCODING *enc, const char *ptr, const char *end) { #define ENCODING_MAX 128 char buf[ENCODING_MAX]; char *p = buf; int i; XmlUtf8Convert(enc, &ptr, end, &p, p + ENCODING_MAX - 1); if (ptr != end) return 0; *p = 0; for (i = 0; buf[i]; i++) { if ('a' <= buf[i] && buf[i] <= 'z') buf[i] += 'A' - 'a'; } if (streqci(buf, "UTF-8")) return &utf8_encoding.enc; if (streqci(buf, "ISO-8859-1")) return &latin1_encoding.enc; if (streqci(buf, "US-ASCII")) return &ascii_encoding.enc; if (streqci(buf, "UTF-16")) { static const unsigned short n = 1; if (enc->minBytesPerChar == 2) return enc; return &big2_encoding.enc; } return 0; } int XmlParseXmlDecl(int isGeneralTextEntity, const ENCODING *enc, const char *ptr, const char *end, const char **badPtr, const char **versionPtr, const char **encodingName, const ENCODING **encoding, int *standalone) { const char *val = 0; const char *name = 0; ptr += 5 * enc->minBytesPerChar; end -= 2 * enc->minBytesPerChar; if (!parsePseudoAttribute(enc, ptr, end, &name, &val, &ptr) || !name) { *badPtr = ptr; return 0; } if (!XmlNameMatchesAscii(enc, name, "version")) { if (!isGeneralTextEntity) { *badPtr = name; return 0; } } else { if (versionPtr) *versionPtr = val; if (!parsePseudoAttribute(enc, ptr, end, &name, &val, &ptr)) { *badPtr = ptr; return 0; } if (!name) { if (isGeneralTextEntity) { /* a TextDecl must have an EncodingDecl */ *badPtr = ptr; return 0; } return 1; } } if (XmlNameMatchesAscii(enc, name, "encoding")) { int c = toAscii(enc, val, end); if (!('a' <= c && c <= 'z') && !('A' <= c && c <= 'Z')) { *badPtr = val; return 0; } if (encodingName) *encodingName = val; if (encoding) *encoding = findEncoding(enc, val, ptr - enc->minBytesPerChar); if (!parsePseudoAttribute(enc, ptr, end, &name, &val, &ptr)) { *badPtr = ptr; return 0; } if (!name) return 1; } if (!XmlNameMatchesAscii(enc, name, "standalone") || isGeneralTextEntity) { *badPtr = name; return 0; } if (XmlNameMatchesAscii(enc, val, "yes")) { if (standalone) *standalone = 1; } else if (XmlNameMatchesAscii(enc, val, "no")) { if (standalone) *standalone = 0; } else { *badPtr = val; return 0; } while (isSpace(toAscii(enc, ptr, end))) ptr += enc->minBytesPerChar; if (ptr != end) { *badPtr = ptr; return 0; } return 1; } static int checkCharRefNumber(int result) { switch (result >> 8) { case 0xD8: case 0xD9: case 0xDA: case 0xDB: case 0xDC: case 0xDD: case 0xDE: case 0xDF: return -1; case 0: if (latin1_encoding.type[result] == BT_NONXML) return -1; break; case 0xFF: if (result == 0xFFFE || result == 0xFFFF) return -1; break; } return result; } int XmlUtf8Encode(int c, char *buf) { enum { /* minN is minimum legal resulting value for N byte sequence */ min2 = 0x80, min3 = 0x800, min4 = 0x10000 }; if (c < 0) return 0; if (c < min2) { buf[0] = (c | UTF8_cval1); return 1; } if (c < min3) { buf[0] = ((c >> 6) | UTF8_cval2); buf[1] = ((c & 0x3f) | 0x80); return 2; } if (c < min4) { buf[0] = ((c >> 12) | UTF8_cval3); buf[1] = (((c >> 6) & 0x3f) | 0x80); buf[2] = ((c & 0x3f) | 0x80); return 3; } if (c < 0x110000) { buf[0] = ((c >> 18) | UTF8_cval4); buf[1] = (((c >> 12) & 0x3f) | 0x80); buf[2] = (((c >> 6) & 0x3f) | 0x80); buf[3] = ((c & 0x3f) | 0x80); return 4; } return 0; } int XmlUtf16Encode(int charNum, unsigned short *buf) { if (charNum < 0) return 0; if (charNum < 0x10000) { buf[0] = charNum; return 1; } if (charNum < 0x110000) { charNum -= 0x10000; buf[0] = (charNum >> 10) + 0xD800; buf[1] = (charNum & 0x3FF) + 0xDC00; return 2; } return 0; } struct unknown_encoding { struct normal_encoding normal; int (*convert)(void *userData, const char *p); void *userData; unsigned short utf16[256]; char utf8[256][4]; }; int XmlSizeOfUnknownEncoding() { return sizeof(struct unknown_encoding); } static int unknown_isName(const ENCODING *enc, const char *p) { int c = ((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, p); if (c & ~0xFFFF) return 0; return UCS2_GET_NAMING(namePages, c >> 8, c & 0xFF); } static int unknown_isNmstrt(const ENCODING *enc, const char *p) { int c = ((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, p); if (c & ~0xFFFF) return 0; return UCS2_GET_NAMING(nmstrtPages, c >> 8, c & 0xFF); } static int unknown_isInvalid(const ENCODING *enc, const char *p) { int c = ((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, p); return (c & ~0xFFFF) || checkCharRefNumber(c) < 0; } static void unknown_toUtf8(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim) { char buf[XML_UTF8_ENCODE_MAX]; for (;;) { const char *utf8; int n; if (*fromP == fromLim) break; utf8 = ((const struct unknown_encoding *)enc)->utf8[(unsigned char)**fromP]; n = *utf8++; if (n == 0) { int c = ((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, *fromP); n = XmlUtf8Encode(c, buf); if (n > toLim - *toP) break; utf8 = buf; *fromP += ((const struct normal_encoding *)enc)->type[(unsigned char)**fromP] - (BT_LEAD2 - 2); } else { if (n > toLim - *toP) break; (*fromP)++; } do { *(*toP)++ = *utf8++; } while (--n != 0); } } static void unknown_toUtf16(const ENCODING *enc, const char **fromP, const char *fromLim, unsigned short **toP, const unsigned short *toLim) { while (*fromP != fromLim && *toP != toLim) { unsigned short c = ((const struct unknown_encoding *)enc)->utf16[(unsigned char)**fromP]; if (c == 0) { c = (unsigned short)((const struct unknown_encoding *)enc) ->convert(((const struct unknown_encoding *)enc)->userData, *fromP); *fromP += ((const struct normal_encoding *)enc)->type[(unsigned char)**fromP] - (BT_LEAD2 - 2); } else (*fromP)++; *(*toP)++ = c; } } ENCODING * XmlInitUnknownEncoding(void *mem, int *table, int (*convert)(void *userData, const char *p), void *userData) { int i; struct unknown_encoding *e = mem; for (i = 0; i < sizeof(struct normal_encoding); i++) ((char *)mem)[i] = ((char *)&latin1_encoding)[i]; for (i = 0; i < 128; i++) if (latin1_encoding.type[i] != BT_OTHER && latin1_encoding.type[i] != BT_NONXML && table[i] != i) return 0; for (i = 0; i < 256; i++) { int c = table[i]; if (c == -1) { e->normal.type[i] = BT_MALFORM; /* This shouldn't really get used. */ e->utf16[i] = 0xFFFF; e->utf8[i][0] = 1; e->utf8[i][1] = 0; } else if (c < 0) { if (c < -4) return 0; e->normal.type[i] = BT_LEAD2 - (c + 2); e->utf8[i][0] = 0; e->utf16[i] = 0; } else if (c < 0x80) { if (latin1_encoding.type[c] != BT_OTHER && latin1_encoding.type[c] != BT_NONXML && c != i) return 0; e->normal.type[i] = latin1_encoding.type[c]; e->utf8[i][0] = 1; e->utf8[i][1] = (char)c; e->utf16[i] = c == 0 ? 0xFFFF : c; } else if (checkCharRefNumber(c) < 0) { e->normal.type[i] = BT_NONXML; /* This shouldn't really get used. */ e->utf16[i] = 0xFFFF; e->utf8[i][0] = 1; e->utf8[i][1] = 0; } else { if (c > 0xFFFF) return 0; if (UCS2_GET_NAMING(nmstrtPages, c >> 8, c & 0xff)) e->normal.type[i] = BT_NMSTRT; else if (UCS2_GET_NAMING(namePages, c >> 8, c & 0xff)) e->normal.type[i] = BT_NAME; else e->normal.type[i] = BT_OTHER; e->utf8[i][0] = (char)XmlUtf8Encode(c, e->utf8[i] + 1); e->utf16[i] = c; } } e->userData = userData; e->convert = convert; if (convert) { e->normal.isName2 = unknown_isName; e->normal.isName3 = unknown_isName; e->normal.isName4 = unknown_isName; e->normal.isNmstrt2 = unknown_isNmstrt; e->normal.isNmstrt3 = unknown_isNmstrt; e->normal.isNmstrt4 = unknown_isNmstrt; e->normal.isInvalid2 = unknown_isInvalid; e->normal.isInvalid3 = unknown_isInvalid; e->normal.isInvalid4 = unknown_isInvalid; } e->normal.enc.utf8Convert = unknown_toUtf8; e->normal.enc.utf16Convert = unknown_toUtf16; return &(e->normal.enc); } 1.1 apache-1.3/src/lib/expat/xmltok.h Index: xmltok.h =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ #ifndef XmlTok_INCLUDED #define XmlTok_INCLUDED 1 #ifdef __cplusplus extern "C" { #endif #ifndef XMLTOKAPI #define XMLTOKAPI /* as nothing */ #endif /* The following token may be returned by XmlContentTok */ #define XML_TOK_TRAILING_RSQB -5 /* ] or ]] at the end of the scan; might be start of illegal ]]> sequence */ /* The following tokens may be returned by both XmlPrologTok and XmlContentTok */ #define XML_TOK_NONE -4 /* The string to be scanned is empty */ #define XML_TOK_TRAILING_CR -3 /* A CR at the end of the scan; might be part of CRLF sequence */ #define XML_TOK_PARTIAL_CHAR -2 /* only part of a multibyte sequence */ #define XML_TOK_PARTIAL -1 /* only part of a token */ #define XML_TOK_INVALID 0 /* The following tokens are returned by XmlContentTok; some are also returned by XmlAttributeValueTok, XmlEntityTok, XmlCdataSectionTok */ #define XML_TOK_START_TAG_WITH_ATTS 1 #define XML_TOK_START_TAG_NO_ATTS 2 #define XML_TOK_EMPTY_ELEMENT_WITH_ATTS 3 /* empty element tag <e/> */ #define XML_TOK_EMPTY_ELEMENT_NO_ATTS 4 #define XML_TOK_END_TAG 5 #define XML_TOK_DATA_CHARS 6 #define XML_TOK_DATA_NEWLINE 7 #define XML_TOK_CDATA_SECT_OPEN 8 #define XML_TOK_ENTITY_REF 9 #define XML_TOK_CHAR_REF 10 /* numeric character reference */ /* The following tokens may be returned by both XmlPrologTok and XmlContentTok */ #define XML_TOK_PI 11 /* processing instruction */ #define XML_TOK_XML_DECL 12 /* XML decl or text decl */ #define XML_TOK_COMMENT 13 #define XML_TOK_BOM 14 /* Byte order mark */ /* The following tokens are returned only by XmlPrologTok */ #define XML_TOK_PROLOG_S 15 #define XML_TOK_DECL_OPEN 16 /* <!foo */ #define XML_TOK_DECL_CLOSE 17 /* > */ #define XML_TOK_NAME 18 #define XML_TOK_NMTOKEN 19 #define XML_TOK_POUND_NAME 20 /* #name */ #define XML_TOK_OR 21 /* | */ #define XML_TOK_PERCENT 22 #define XML_TOK_OPEN_PAREN 23 #define XML_TOK_CLOSE_PAREN 24 #define XML_TOK_OPEN_BRACKET 25 #define XML_TOK_CLOSE_BRACKET 26 #define XML_TOK_LITERAL 27 #define XML_TOK_PARAM_ENTITY_REF 28 #define XML_TOK_INSTANCE_START 29 /* The following occur only in element type declarations */ #define XML_TOK_NAME_QUESTION 30 /* name? */ #define XML_TOK_NAME_ASTERISK 31 /* name* */ #define XML_TOK_NAME_PLUS 32 /* name+ */ #define XML_TOK_COND_SECT_OPEN 33 /* <![ */ #define XML_TOK_COND_SECT_CLOSE 34 /* ]]> */ #define XML_TOK_CLOSE_PAREN_QUESTION 35 /* )? */ #define XML_TOK_CLOSE_PAREN_ASTERISK 36 /* )* */ #define XML_TOK_CLOSE_PAREN_PLUS 37 /* )+ */ #define XML_TOK_COMMA 38 /* The following token is returned only by XmlAttributeValueTok */ #define XML_TOK_ATTRIBUTE_VALUE_S 39 /* The following token is returned only by XmlCdataSectionTok */ #define XML_TOK_CDATA_SECT_CLOSE 40 #define XML_N_STATES 3 #define XML_PROLOG_STATE 0 #define XML_CONTENT_STATE 1 #define XML_CDATA_SECTION_STATE 2 #define XML_N_LITERAL_TYPES 2 #define XML_ATTRIBUTE_VALUE_LITERAL 0 #define XML_ENTITY_VALUE_LITERAL 1 /* The size of the buffer passed to XmlUtf8Encode must be at least this. */ #define XML_UTF8_ENCODE_MAX 4 /* The size of the buffer passed to XmlUtf16Encode must be at least this. */ #define XML_UTF16_ENCODE_MAX 2 typedef struct position { /* first line and first column are 0 not 1 */ unsigned long lineNumber; unsigned long columnNumber; } POSITION; typedef struct { const char *name; const char *valuePtr; const char *valueEnd; char normalized; } ATTRIBUTE; struct encoding; typedef struct encoding ENCODING; struct encoding { int (*scanners[XML_N_STATES])(const ENCODING *, const char *, const char *, const char **); int (*literalScanners[XML_N_LITERAL_TYPES])(const ENCODING *, const char *, const char *, const char **); int (*sameName)(const ENCODING *, const char *, const char *); int (*nameMatchesAscii)(const ENCODING *, const char *, const char *); int (*nameLength)(const ENCODING *, const char *); const char *(*skipS)(const ENCODING *, const char *); int (*getAtts)(const ENCODING *enc, const char *ptr, int attsMax, ATTRIBUTE *atts); int (*charRefNumber)(const ENCODING *enc, const char *ptr); int (*predefinedEntityName)(const ENCODING *, const char *, const char *); void (*updatePosition)(const ENCODING *, const char *ptr, const char *end, POSITION *); int (*isPublicId)(const ENCODING *enc, const char *ptr, const char *end, const char **badPtr); void (*utf8Convert)(const ENCODING *enc, const char **fromP, const char *fromLim, char **toP, const char *toLim); void (*utf16Convert)(const ENCODING *enc, const char **fromP, const char *fromLim, unsigned short **toP, const unsigned short *toLim); int minBytesPerChar; char isUtf8; char isUtf16; }; /* Scan the string starting at ptr until the end of the next complete token, but do not scan past eptr. Return an integer giving the type of token. Return XML_TOK_NONE when ptr == eptr; nextTokPtr will not be set. Return XML_TOK_PARTIAL when the string does not contain a complete token; nextTokPtr will not be set. Return XML_TOK_INVALID when the string does not start a valid token; nextTokPtr will be set to point to the character which made the token invalid. Otherwise the string starts with a valid token; nextTokPtr will be set to point to the character following the end of that token. Each data character counts as a single token, but adjacent data characters may be returned together. Similarly for characters in the prolog outside literals, comments and processing instructions. */ #define XmlTok(enc, state, ptr, end, nextTokPtr) \ (((enc)->scanners[state])(enc, ptr, end, nextTokPtr)) #define XmlPrologTok(enc, ptr, end, nextTokPtr) \ XmlTok(enc, XML_PROLOG_STATE, ptr, end, nextTokPtr) #define XmlContentTok(enc, ptr, end, nextTokPtr) \ XmlTok(enc, XML_CONTENT_STATE, ptr, end, nextTokPtr) #define XmlCdataSectionTok(enc, ptr, end, nextTokPtr) \ XmlTok(enc, XML_CDATA_SECTION_STATE, ptr, end, nextTokPtr) /* This is used for performing a 2nd-level tokenization on the content of a literal that has already been returned by XmlTok. */ #define XmlLiteralTok(enc, literalType, ptr, end, nextTokPtr) \ (((enc)->literalScanners[literalType])(enc, ptr, end, nextTokPtr)) #define XmlAttributeValueTok(enc, ptr, end, nextTokPtr) \ XmlLiteralTok(enc, XML_ATTRIBUTE_VALUE_LITERAL, ptr, end, nextTokPtr) #define XmlEntityValueTok(enc, ptr, end, nextTokPtr) \ XmlLiteralTok(enc, XML_ENTITY_VALUE_LITERAL, ptr, end, nextTokPtr) #define XmlSameName(enc, ptr1, ptr2) (((enc)->sameName)(enc, ptr1, ptr2)) #define XmlNameMatchesAscii(enc, ptr1, ptr2) \ (((enc)->nameMatchesAscii)(enc, ptr1, ptr2)) #define XmlNameLength(enc, ptr) \ (((enc)->nameLength)(enc, ptr)) #define XmlSkipS(enc, ptr) \ (((enc)->skipS)(enc, ptr)) #define XmlGetAttributes(enc, ptr, attsMax, atts) \ (((enc)->getAtts)(enc, ptr, attsMax, atts)) #define XmlCharRefNumber(enc, ptr) \ (((enc)->charRefNumber)(enc, ptr)) #define XmlPredefinedEntityName(enc, ptr, end) \ (((enc)->predefinedEntityName)(enc, ptr, end)) #define XmlUpdatePosition(enc, ptr, end, pos) \ (((enc)->updatePosition)(enc, ptr, end, pos)) #define XmlIsPublicId(enc, ptr, end, badPtr) \ (((enc)->isPublicId)(enc, ptr, end, badPtr)) #define XmlUtf8Convert(enc, fromP, fromLim, toP, toLim) \ (((enc)->utf8Convert)(enc, fromP, fromLim, toP, toLim)) #define XmlUtf16Convert(enc, fromP, fromLim, toP, toLim) \ (((enc)->utf16Convert)(enc, fromP, fromLim, toP, toLim)) typedef struct { ENCODING initEnc; const ENCODING **encPtr; } INIT_ENCODING; int XMLTOKAPI XmlParseXmlDecl(int isGeneralTextEntity, const ENCODING *enc, const char *ptr, const char *end, const char **badPtr, const char **versionPtr, const char **encodingNamePtr, const ENCODING **namedEncodingPtr, int *standalonePtr); int XMLTOKAPI XmlInitEncoding(INIT_ENCODING *, const ENCODING **, const char *name); const ENCODING XMLTOKAPI *XmlGetUtf8InternalEncoding(); const ENCODING XMLTOKAPI *XmlGetUtf16InternalEncoding(); int XMLTOKAPI XmlUtf8Encode(int charNumber, char *buf); int XMLTOKAPI XmlUtf16Encode(int charNumber, unsigned short *buf); int XMLTOKAPI XmlSizeOfUnknownEncoding(); ENCODING XMLTOKAPI * XmlInitUnknownEncoding(void *mem, int *table, int (*convert)(void *userData, const char *p), void *userData); #ifdef __cplusplus } #endif #endif /* not XmlTok_INCLUDED */ 1.1 apache-1.3/src/lib/expat/xmltok_impl.c Index: xmltok_impl.c =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ #ifndef IS_INVALID_CHAR #define IS_INVALID_CHAR(enc, ptr, n) (0) #endif #define INVALID_LEAD_CASE(n, ptr, nextTokPtr) \ case BT_LEAD ## n: \ if (end - ptr < n) \ return XML_TOK_PARTIAL_CHAR; \ if (IS_INVALID_CHAR(enc, ptr, n)) { \ *(nextTokPtr) = (ptr); \ return XML_TOK_INVALID; \ } \ ptr += n; \ break; #define INVALID_CASES(ptr, nextTokPtr) \ INVALID_LEAD_CASE(2, ptr, nextTokPtr) \ INVALID_LEAD_CASE(3, ptr, nextTokPtr) \ INVALID_LEAD_CASE(4, ptr, nextTokPtr) \ case BT_NONXML: \ case BT_MALFORM: \ case BT_TRAIL: \ *(nextTokPtr) = (ptr); \ return XML_TOK_INVALID; #define CHECK_NAME_CASE(n, enc, ptr, end, nextTokPtr) \ case BT_LEAD ## n: \ if (end - ptr < n) \ return XML_TOK_PARTIAL_CHAR; \ if (!IS_NAME_CHAR(enc, ptr, n)) { \ *nextTokPtr = ptr; \ return XML_TOK_INVALID; \ } \ ptr += n; \ break; #define CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) \ case BT_NONASCII: \ if (!IS_NAME_CHAR_MINBPC(enc, ptr)) { \ *nextTokPtr = ptr; \ return XML_TOK_INVALID; \ } \ case BT_NMSTRT: \ case BT_HEX: \ case BT_DIGIT: \ case BT_NAME: \ case BT_MINUS: \ ptr += MINBPC; \ break; \ CHECK_NAME_CASE(2, enc, ptr, end, nextTokPtr) \ CHECK_NAME_CASE(3, enc, ptr, end, nextTokPtr) \ CHECK_NAME_CASE(4, enc, ptr, end, nextTokPtr) #define CHECK_NMSTRT_CASE(n, enc, ptr, end, nextTokPtr) \ case BT_LEAD ## n: \ if (end - ptr < n) \ return XML_TOK_PARTIAL_CHAR; \ if (!IS_NMSTRT_CHAR(enc, ptr, n)) { \ *nextTokPtr = ptr; \ return XML_TOK_INVALID; \ } \ ptr += n; \ break; #define CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) \ case BT_NONASCII: \ if (!IS_NMSTRT_CHAR_MINBPC(enc, ptr)) { \ *nextTokPtr = ptr; \ return XML_TOK_INVALID; \ } \ case BT_NMSTRT: \ case BT_HEX: \ ptr += MINBPC; \ break; \ CHECK_NMSTRT_CASE(2, enc, ptr, end, nextTokPtr) \ CHECK_NMSTRT_CASE(3, enc, ptr, end, nextTokPtr) \ CHECK_NMSTRT_CASE(4, enc, ptr, end, nextTokPtr) #ifndef PREFIX #define PREFIX(ident) ident #endif /* ptr points to character following "<!-" */ static int PREFIX(scanComment)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr != end) { if (!CHAR_MATCHES(enc, ptr, '-')) { *nextTokPtr = ptr; return XML_TOK_INVALID; } ptr += MINBPC; while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { INVALID_CASES(ptr, nextTokPtr) case BT_MINUS: if ((ptr += MINBPC) == end) return XML_TOK_PARTIAL; if (CHAR_MATCHES(enc, ptr, '-')) { if ((ptr += MINBPC) == end) return XML_TOK_PARTIAL; if (!CHAR_MATCHES(enc, ptr, '>')) { *nextTokPtr = ptr; return XML_TOK_INVALID; } *nextTokPtr = ptr + MINBPC; return XML_TOK_COMMENT; } break; default: ptr += MINBPC; break; } } } return XML_TOK_PARTIAL; } /* ptr points to character following "<!" */ static int PREFIX(scanDecl)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { case BT_MINUS: return PREFIX(scanComment)(enc, ptr + MINBPC, end, nextTokPtr); case BT_LSQB: *nextTokPtr = ptr + MINBPC; return XML_TOK_COND_SECT_OPEN; case BT_NMSTRT: case BT_HEX: ptr += MINBPC; break; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { case BT_PERCNT: if (ptr + MINBPC == end) return XML_TOK_PARTIAL; /* don't allow <!ENTITY% foo "whatever"> */ switch (BYTE_TYPE(enc, ptr + MINBPC)) { case BT_S: case BT_CR: case BT_LF: case BT_PERCNT: *nextTokPtr = ptr; return XML_TOK_INVALID; } /* fall through */ case BT_S: case BT_CR: case BT_LF: *nextTokPtr = ptr; return XML_TOK_DECL_OPEN; case BT_NMSTRT: case BT_HEX: ptr += MINBPC; break; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; } static int PREFIX(checkPiTarget)(const ENCODING *enc, const char *ptr, const char *end, int *tokPtr) { int upper = 0; *tokPtr = XML_TOK_PI; if (end - ptr != MINBPC*3) return 1; switch (BYTE_TO_ASCII(enc, ptr)) { case 'x': break; case 'X': upper = 1; break; default: return 1; } ptr += MINBPC; switch (BYTE_TO_ASCII(enc, ptr)) { case 'm': break; case 'M': upper = 1; break; default: return 1; } ptr += MINBPC; switch (BYTE_TO_ASCII(enc, ptr)) { case 'l': break; case 'L': upper = 1; break; default: return 1; } if (upper) return 0; *tokPtr = XML_TOK_XML_DECL; return 1; } /* ptr points to character following "<?" */ static int PREFIX(scanPi)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { int tok; const char *target = ptr; if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) default: *nextTokPtr = ptr; return XML_TOK_INVALID; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) case BT_S: case BT_CR: case BT_LF: if (!PREFIX(checkPiTarget)(enc, target, ptr, &tok)) { *nextTokPtr = ptr; return XML_TOK_INVALID; } ptr += MINBPC; while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { INVALID_CASES(ptr, nextTokPtr) case BT_QUEST: ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; if (CHAR_MATCHES(enc, ptr, '>')) { *nextTokPtr = ptr + MINBPC; return tok; } break; default: ptr += MINBPC; break; } } return XML_TOK_PARTIAL; case BT_QUEST: if (!PREFIX(checkPiTarget)(enc, target, ptr, &tok)) { *nextTokPtr = ptr; return XML_TOK_INVALID; } ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; if (CHAR_MATCHES(enc, ptr, '>')) { *nextTokPtr = ptr + MINBPC; return tok; } /* fall through */ default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; } static int PREFIX(scanCdataSection)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { int i; /* CDATA[ */ if (end - ptr < 6 * MINBPC) return XML_TOK_PARTIAL; for (i = 0; i < 6; i++, ptr += MINBPC) { if (!CHAR_MATCHES(enc, ptr, "CDATA["[i])) { *nextTokPtr = ptr; return XML_TOK_INVALID; } } *nextTokPtr = ptr; return XML_TOK_CDATA_SECT_OPEN; } static int PREFIX(cdataSectionTok)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr == end) return XML_TOK_NONE; #if MINBPC > 1 { size_t n = end - ptr; if (n & (MINBPC - 1)) { n &= ~(MINBPC - 1); if (n == 0) return XML_TOK_PARTIAL; end = ptr + n; } } #endif switch (BYTE_TYPE(enc, ptr)) { case BT_RSQB: ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; if (!CHAR_MATCHES(enc, ptr, ']')) break; ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; if (!CHAR_MATCHES(enc, ptr, '>')) { ptr -= MINBPC; break; } *nextTokPtr = ptr + MINBPC; return XML_TOK_CDATA_SECT_CLOSE; case BT_CR: ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; if (BYTE_TYPE(enc, ptr) == BT_LF) ptr += MINBPC; *nextTokPtr = ptr; return XML_TOK_DATA_NEWLINE; case BT_LF: *nextTokPtr = ptr + MINBPC; return XML_TOK_DATA_NEWLINE; INVALID_CASES(ptr, nextTokPtr) default: ptr += MINBPC; break; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: \ if (end - ptr < n || IS_INVALID_CHAR(enc, ptr, n)) { \ *nextTokPtr = ptr; \ return XML_TOK_DATA_CHARS; \ } \ ptr += n; \ break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_NONXML: case BT_MALFORM: case BT_TRAIL: case BT_CR: case BT_LF: case BT_RSQB: *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; default: ptr += MINBPC; break; } } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; } /* ptr points to character following "</" */ static int PREFIX(scanEndTag)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) default: *nextTokPtr = ptr; return XML_TOK_INVALID; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) case BT_S: case BT_CR: case BT_LF: for (ptr += MINBPC; ptr != end; ptr += MINBPC) { switch (BYTE_TYPE(enc, ptr)) { case BT_S: case BT_CR: case BT_LF: break; case BT_GT: *nextTokPtr = ptr + MINBPC; return XML_TOK_END_TAG; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; case BT_GT: *nextTokPtr = ptr + MINBPC; return XML_TOK_END_TAG; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; } /* ptr points to character following "&#X" */ static int PREFIX(scanHexCharRef)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { case BT_DIGIT: case BT_HEX: break; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } for (ptr += MINBPC; ptr != end; ptr += MINBPC) { switch (BYTE_TYPE(enc, ptr)) { case BT_DIGIT: case BT_HEX: break; case BT_SEMI: *nextTokPtr = ptr + MINBPC; return XML_TOK_CHAR_REF; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } } return XML_TOK_PARTIAL; } /* ptr points to character following "&#" */ static int PREFIX(scanCharRef)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr != end) { if (CHAR_MATCHES(enc, ptr, 'x')) return PREFIX(scanHexCharRef)(enc, ptr + MINBPC, end, nextTokPtr); switch (BYTE_TYPE(enc, ptr)) { case BT_DIGIT: break; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } for (ptr += MINBPC; ptr != end; ptr += MINBPC) { switch (BYTE_TYPE(enc, ptr)) { case BT_DIGIT: break; case BT_SEMI: *nextTokPtr = ptr + MINBPC; return XML_TOK_CHAR_REF; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } } return XML_TOK_PARTIAL; } /* ptr points to character following "&" */ static int PREFIX(scanRef)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) case BT_NUM: return PREFIX(scanCharRef)(enc, ptr + MINBPC, end, nextTokPtr); default: *nextTokPtr = ptr; return XML_TOK_INVALID; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) case BT_SEMI: *nextTokPtr = ptr + MINBPC; return XML_TOK_ENTITY_REF; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; } /* ptr points to character following first character of attribute name */ static int PREFIX(scanAtts)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) case BT_S: case BT_CR: case BT_LF: for (;;) { int t; ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; t = BYTE_TYPE(enc, ptr); if (t == BT_EQUALS) break; switch (t) { case BT_S: case BT_LF: case BT_CR: break; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } /* fall through */ case BT_EQUALS: { int open; for (;;) { ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; open = BYTE_TYPE(enc, ptr); if (open == BT_QUOT || open == BT_APOS) break; switch (open) { case BT_S: case BT_LF: case BT_CR: break; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } ptr += MINBPC; /* in attribute value */ for (;;) { int t; if (ptr == end) return XML_TOK_PARTIAL; t = BYTE_TYPE(enc, ptr); if (t == open) break; switch (t) { INVALID_CASES(ptr, nextTokPtr) case BT_AMP: { int tok = PREFIX(scanRef)(enc, ptr + MINBPC, end, &ptr); if (tok <= 0) { if (tok == XML_TOK_INVALID) *nextTokPtr = ptr; return tok; } break; } case BT_LT: *nextTokPtr = ptr; return XML_TOK_INVALID; default: ptr += MINBPC; break; } } ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { case BT_S: case BT_CR: case BT_LF: break; case BT_SOL: goto sol; case BT_GT: goto gt; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } /* ptr points to closing quote */ for (;;) { ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) case BT_S: case BT_CR: case BT_LF: continue; case BT_GT: gt: *nextTokPtr = ptr + MINBPC; return XML_TOK_START_TAG_WITH_ATTS; case BT_SOL: sol: ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; if (!CHAR_MATCHES(enc, ptr, '>')) { *nextTokPtr = ptr; return XML_TOK_INVALID; } *nextTokPtr = ptr + MINBPC; return XML_TOK_EMPTY_ELEMENT_WITH_ATTS; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } break; } break; } default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; } /* ptr points to character following "<" */ static int PREFIX(scanLt)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) case BT_EXCL: if ((ptr += MINBPC) == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { case BT_MINUS: return PREFIX(scanComment)(enc, ptr + MINBPC, end, nextTokPtr); case BT_LSQB: return PREFIX(scanCdataSection)(enc, ptr + MINBPC, end, nextTokPtr); } *nextTokPtr = ptr; return XML_TOK_INVALID; case BT_QUEST: return PREFIX(scanPi)(enc, ptr + MINBPC, end, nextTokPtr); case BT_SOL: return PREFIX(scanEndTag)(enc, ptr + MINBPC, end, nextTokPtr); default: *nextTokPtr = ptr; return XML_TOK_INVALID; } /* we have a start-tag */ while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) case BT_S: case BT_CR: case BT_LF: { ptr += MINBPC; while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) case BT_GT: goto gt; case BT_SOL: goto sol; case BT_S: case BT_CR: case BT_LF: ptr += MINBPC; continue; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } return PREFIX(scanAtts)(enc, ptr, end, nextTokPtr); } return XML_TOK_PARTIAL; } case BT_GT: gt: *nextTokPtr = ptr + MINBPC; return XML_TOK_START_TAG_NO_ATTS; case BT_SOL: sol: ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; if (!CHAR_MATCHES(enc, ptr, '>')) { *nextTokPtr = ptr; return XML_TOK_INVALID; } *nextTokPtr = ptr + MINBPC; return XML_TOK_EMPTY_ELEMENT_NO_ATTS; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; } static int PREFIX(contentTok)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr == end) return XML_TOK_NONE; #if MINBPC > 1 { size_t n = end - ptr; if (n & (MINBPC - 1)) { n &= ~(MINBPC - 1); if (n == 0) return XML_TOK_PARTIAL; end = ptr + n; } } #endif switch (BYTE_TYPE(enc, ptr)) { case BT_LT: return PREFIX(scanLt)(enc, ptr + MINBPC, end, nextTokPtr); case BT_AMP: return PREFIX(scanRef)(enc, ptr + MINBPC, end, nextTokPtr); case BT_CR: ptr += MINBPC; if (ptr == end) return XML_TOK_TRAILING_CR; if (BYTE_TYPE(enc, ptr) == BT_LF) ptr += MINBPC; *nextTokPtr = ptr; return XML_TOK_DATA_NEWLINE; case BT_LF: *nextTokPtr = ptr + MINBPC; return XML_TOK_DATA_NEWLINE; case BT_RSQB: ptr += MINBPC; if (ptr == end) return XML_TOK_TRAILING_RSQB; if (!CHAR_MATCHES(enc, ptr, ']')) break; ptr += MINBPC; if (ptr == end) return XML_TOK_TRAILING_RSQB; if (!CHAR_MATCHES(enc, ptr, '>')) { ptr -= MINBPC; break; } *nextTokPtr = ptr; return XML_TOK_INVALID; INVALID_CASES(ptr, nextTokPtr) default: ptr += MINBPC; break; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: \ if (end - ptr < n || IS_INVALID_CHAR(enc, ptr, n)) { \ *nextTokPtr = ptr; \ return XML_TOK_DATA_CHARS; \ } \ ptr += n; \ break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_RSQB: if (ptr + MINBPC != end) { if (!CHAR_MATCHES(enc, ptr + MINBPC, ']')) { ptr += MINBPC; break; } if (ptr + 2*MINBPC != end) { if (!CHAR_MATCHES(enc, ptr + 2*MINBPC, '>')) { ptr += MINBPC; break; } *nextTokPtr = ptr + 2*MINBPC; return XML_TOK_INVALID; } } /* fall through */ case BT_AMP: case BT_LT: case BT_NONXML: case BT_MALFORM: case BT_TRAIL: case BT_CR: case BT_LF: *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; default: ptr += MINBPC; break; } } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; } /* ptr points to character following "%" */ static int PREFIX(scanPercent)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) case BT_S: case BT_LF: case BT_CR: case BT_PERCNT: *nextTokPtr = ptr; return XML_TOK_PERCENT; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) case BT_SEMI: *nextTokPtr = ptr + MINBPC; return XML_TOK_PARAM_ENTITY_REF; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; } static int PREFIX(scanPoundName)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { CHECK_NMSTRT_CASES(enc, ptr, end, nextTokPtr) default: *nextTokPtr = ptr; return XML_TOK_INVALID; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) case BT_CR: case BT_LF: case BT_S: case BT_RPAR: case BT_GT: case BT_PERCNT: case BT_VERBAR: *nextTokPtr = ptr; return XML_TOK_POUND_NAME; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; } static int PREFIX(scanLit)(int open, const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { while (ptr != end) { int t = BYTE_TYPE(enc, ptr); switch (t) { INVALID_CASES(ptr, nextTokPtr) case BT_QUOT: case BT_APOS: ptr += MINBPC; if (t != open) break; if (ptr == end) return XML_TOK_PARTIAL; *nextTokPtr = ptr; switch (BYTE_TYPE(enc, ptr)) { case BT_S: case BT_CR: case BT_LF: case BT_GT: case BT_PERCNT: case BT_LSQB: return XML_TOK_LITERAL; default: return XML_TOK_INVALID; } default: ptr += MINBPC; break; } } return XML_TOK_PARTIAL; } static int PREFIX(prologTok)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { int tok; if (ptr == end) return XML_TOK_NONE; #if MINBPC > 1 { size_t n = end - ptr; if (n & (MINBPC - 1)) { n &= ~(MINBPC - 1); if (n == 0) return XML_TOK_PARTIAL; end = ptr + n; } } #endif switch (BYTE_TYPE(enc, ptr)) { case BT_QUOT: return PREFIX(scanLit)(BT_QUOT, enc, ptr + MINBPC, end, nextTokPtr); case BT_APOS: return PREFIX(scanLit)(BT_APOS, enc, ptr + MINBPC, end, nextTokPtr); case BT_LT: { ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { case BT_EXCL: return PREFIX(scanDecl)(enc, ptr + MINBPC, end, nextTokPtr); case BT_QUEST: return PREFIX(scanPi)(enc, ptr + MINBPC, end, nextTokPtr); case BT_NMSTRT: case BT_HEX: case BT_NONASCII: case BT_LEAD2: case BT_LEAD3: case BT_LEAD4: *nextTokPtr = ptr - MINBPC; return XML_TOK_INSTANCE_START; } *nextTokPtr = ptr; return XML_TOK_INVALID; } case BT_CR: if (ptr + MINBPC == end) return XML_TOK_TRAILING_CR; /* fall through */ case BT_S: case BT_LF: for (;;) { ptr += MINBPC; if (ptr == end) break; switch (BYTE_TYPE(enc, ptr)) { case BT_S: case BT_LF: break; case BT_CR: /* don't split CR/LF pair */ if (ptr + MINBPC != end) break; /* fall through */ default: *nextTokPtr = ptr; return XML_TOK_PROLOG_S; } } *nextTokPtr = ptr; return XML_TOK_PROLOG_S; case BT_PERCNT: return PREFIX(scanPercent)(enc, ptr + MINBPC, end, nextTokPtr); case BT_COMMA: *nextTokPtr = ptr + MINBPC; return XML_TOK_COMMA; case BT_LSQB: *nextTokPtr = ptr + MINBPC; return XML_TOK_OPEN_BRACKET; case BT_RSQB: ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; if (CHAR_MATCHES(enc, ptr, ']')) { if (ptr + MINBPC == end) return XML_TOK_PARTIAL; if (CHAR_MATCHES(enc, ptr + MINBPC, '>')) { *nextTokPtr = ptr + 2*MINBPC; return XML_TOK_COND_SECT_CLOSE; } } *nextTokPtr = ptr; return XML_TOK_CLOSE_BRACKET; case BT_LPAR: *nextTokPtr = ptr + MINBPC; return XML_TOK_OPEN_PAREN; case BT_RPAR: ptr += MINBPC; if (ptr == end) return XML_TOK_PARTIAL; switch (BYTE_TYPE(enc, ptr)) { case BT_AST: *nextTokPtr = ptr + MINBPC; return XML_TOK_CLOSE_PAREN_ASTERISK; case BT_QUEST: *nextTokPtr = ptr + MINBPC; return XML_TOK_CLOSE_PAREN_QUESTION; case BT_PLUS: *nextTokPtr = ptr + MINBPC; return XML_TOK_CLOSE_PAREN_PLUS; case BT_CR: case BT_LF: case BT_S: case BT_GT: case BT_COMMA: case BT_VERBAR: case BT_RPAR: *nextTokPtr = ptr; return XML_TOK_CLOSE_PAREN; } *nextTokPtr = ptr; return XML_TOK_INVALID; case BT_VERBAR: *nextTokPtr = ptr + MINBPC; return XML_TOK_OR; case BT_GT: *nextTokPtr = ptr + MINBPC; return XML_TOK_DECL_CLOSE; case BT_NUM: return PREFIX(scanPoundName)(enc, ptr + MINBPC, end, nextTokPtr); #define LEAD_CASE(n) \ case BT_LEAD ## n: \ if (end - ptr < n) \ return XML_TOK_PARTIAL_CHAR; \ if (IS_NMSTRT_CHAR(enc, ptr, n)) { \ ptr += n; \ tok = XML_TOK_NAME; \ break; \ } \ if (IS_NAME_CHAR(enc, ptr, n)) { \ ptr += n; \ tok = XML_TOK_NMTOKEN; \ break; \ } \ *nextTokPtr = ptr; \ return XML_TOK_INVALID; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_NMSTRT: case BT_HEX: tok = XML_TOK_NAME; ptr += MINBPC; break; case BT_DIGIT: case BT_NAME: case BT_MINUS: tok = XML_TOK_NMTOKEN; ptr += MINBPC; break; case BT_NONASCII: if (IS_NMSTRT_CHAR_MINBPC(enc, ptr)) { ptr += MINBPC; tok = XML_TOK_NAME; break; } if (IS_NAME_CHAR_MINBPC(enc, ptr)) { ptr += MINBPC; tok = XML_TOK_NMTOKEN; break; } /* fall through */ default: *nextTokPtr = ptr; return XML_TOK_INVALID; } while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { CHECK_NAME_CASES(enc, ptr, end, nextTokPtr) case BT_GT: case BT_RPAR: case BT_COMMA: case BT_VERBAR: case BT_LSQB: case BT_PERCNT: case BT_S: case BT_CR: case BT_LF: *nextTokPtr = ptr; return tok; case BT_PLUS: if (tok != XML_TOK_NAME) { *nextTokPtr = ptr; return XML_TOK_INVALID; } *nextTokPtr = ptr + MINBPC; return XML_TOK_NAME_PLUS; case BT_AST: if (tok != XML_TOK_NAME) { *nextTokPtr = ptr; return XML_TOK_INVALID; } *nextTokPtr = ptr + MINBPC; return XML_TOK_NAME_ASTERISK; case BT_QUEST: if (tok != XML_TOK_NAME) { *nextTokPtr = ptr; return XML_TOK_INVALID; } *nextTokPtr = ptr + MINBPC; return XML_TOK_NAME_QUESTION; default: *nextTokPtr = ptr; return XML_TOK_INVALID; } } return XML_TOK_PARTIAL; } static int PREFIX(attributeValueTok)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { const char *start; if (ptr == end) return XML_TOK_NONE; start = ptr; while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: ptr += n; break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_AMP: if (ptr == start) return PREFIX(scanRef)(enc, ptr + MINBPC, end, nextTokPtr); *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_LT: /* this is for inside entity references */ *nextTokPtr = ptr; return XML_TOK_INVALID; case BT_LF: if (ptr == start) { *nextTokPtr = ptr + MINBPC; return XML_TOK_DATA_NEWLINE; } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_CR: if (ptr == start) { ptr += MINBPC; if (ptr == end) return XML_TOK_TRAILING_CR; if (BYTE_TYPE(enc, ptr) == BT_LF) ptr += MINBPC; *nextTokPtr = ptr; return XML_TOK_DATA_NEWLINE; } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_S: if (ptr == start) { *nextTokPtr = ptr + MINBPC; return XML_TOK_ATTRIBUTE_VALUE_S; } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; default: ptr += MINBPC; break; } } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; } static int PREFIX(entityValueTok)(const ENCODING *enc, const char *ptr, const char *end, const char **nextTokPtr) { const char *start; if (ptr == end) return XML_TOK_NONE; start = ptr; while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: ptr += n; break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_AMP: if (ptr == start) return PREFIX(scanRef)(enc, ptr + MINBPC, end, nextTokPtr); *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_PERCNT: if (ptr == start) return PREFIX(scanPercent)(enc, ptr + MINBPC, end, nextTokPtr); *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_LF: if (ptr == start) { *nextTokPtr = ptr + MINBPC; return XML_TOK_DATA_NEWLINE; } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; case BT_CR: if (ptr == start) { ptr += MINBPC; if (ptr == end) return XML_TOK_TRAILING_CR; if (BYTE_TYPE(enc, ptr) == BT_LF) ptr += MINBPC; *nextTokPtr = ptr; return XML_TOK_DATA_NEWLINE; } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; default: ptr += MINBPC; break; } } *nextTokPtr = ptr; return XML_TOK_DATA_CHARS; } static int PREFIX(isPublicId)(const ENCODING *enc, const char *ptr, const char *end, const char **badPtr) { ptr += MINBPC; end -= MINBPC; for (; ptr != end; ptr += MINBPC) { switch (BYTE_TYPE(enc, ptr)) { case BT_DIGIT: case BT_HEX: case BT_MINUS: case BT_APOS: case BT_LPAR: case BT_RPAR: case BT_PLUS: case BT_COMMA: case BT_SOL: case BT_EQUALS: case BT_QUEST: case BT_CR: case BT_LF: case BT_SEMI: case BT_EXCL: case BT_AST: case BT_PERCNT: case BT_NUM: break; case BT_S: if (CHAR_MATCHES(enc, ptr, '\t')) { *badPtr = ptr; return 0; } break; case BT_NAME: case BT_NMSTRT: if (!(BYTE_TO_ASCII(enc, ptr) & ~0x7f)) break; default: switch (BYTE_TO_ASCII(enc, ptr)) { case 0x24: /* $ */ case 0x40: /* @ */ break; default: *badPtr = ptr; return 0; } break; } } return 1; } /* This must only be called for a well-formed start-tag or empty element tag. Returns the number of attributes. Pointers to the first attsMax attributes are stored in atts. */ static int PREFIX(getAtts)(const ENCODING *enc, const char *ptr, int attsMax, ATTRIBUTE *atts) { enum { other, inName, inValue } state = inName; int nAtts = 0; int open; for (ptr += MINBPC;; ptr += MINBPC) { switch (BYTE_TYPE(enc, ptr)) { #define START_NAME \ if (state == other) { \ if (nAtts < attsMax) { \ atts[nAtts].name = ptr; \ atts[nAtts].normalized = 1; \ } \ state = inName; \ } #define LEAD_CASE(n) \ case BT_LEAD ## n: START_NAME ptr += (n - MINBPC); break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_NONASCII: case BT_NMSTRT: case BT_HEX: START_NAME break; #undef START_NAME case BT_QUOT: if (state != inValue) { if (nAtts < attsMax) atts[nAtts].valuePtr = ptr + MINBPC; state = inValue; open = BT_QUOT; } else if (open == BT_QUOT) { state = other; if (nAtts < attsMax) atts[nAtts].valueEnd = ptr; nAtts++; } break; case BT_APOS: if (state != inValue) { if (nAtts < attsMax) atts[nAtts].valuePtr = ptr + MINBPC; state = inValue; open = BT_APOS; } else if (open == BT_APOS) { state = other; if (nAtts < attsMax) atts[nAtts].valueEnd = ptr; nAtts++; } break; case BT_AMP: if (nAtts < attsMax) atts[nAtts].normalized = 0; break; case BT_S: if (state == inName) state = other; else if (state == inValue && nAtts < attsMax && atts[nAtts].normalized && (ptr == atts[nAtts].valuePtr || BYTE_TO_ASCII(enc, ptr) != ' ' || BYTE_TO_ASCII(enc, ptr + MINBPC) == ' ' || BYTE_TYPE(enc, ptr + MINBPC) == open)) atts[nAtts].normalized = 0; break; case BT_CR: case BT_LF: /* This case ensures that the first attribute name is counted Apart from that we could just change state on the quote. */ if (state == inName) state = other; else if (state == inValue && nAtts < attsMax) atts[nAtts].normalized = 0; break; case BT_GT: case BT_SOL: if (state != inValue) return nAtts; break; default: break; } } /* not reached */ } static int PREFIX(charRefNumber)(const ENCODING *enc, const char *ptr) { int result = 0; /* skip &# */ ptr += 2*MINBPC; if (CHAR_MATCHES(enc, ptr, 'x')) { for (ptr += MINBPC; !CHAR_MATCHES(enc, ptr, ';'); ptr += MINBPC) { int c = BYTE_TO_ASCII(enc, ptr); switch (c) { case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': result <<= 4; result |= (c - '0'); break; case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': result <<= 4; result += 10 + (c - 'A'); break; case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': result <<= 4; result += 10 + (c - 'a'); break; } if (result >= 0x110000) return -1; } } else { for (; !CHAR_MATCHES(enc, ptr, ';'); ptr += MINBPC) { int c = BYTE_TO_ASCII(enc, ptr); result *= 10; result += (c - '0'); if (result >= 0x110000) return -1; } } return checkCharRefNumber(result); } static int PREFIX(predefinedEntityName)(const ENCODING *enc, const char *ptr, const char *end) { switch (end - ptr) { case 2 * MINBPC: if (CHAR_MATCHES(enc, ptr + MINBPC, 't')) { switch (BYTE_TO_ASCII(enc, ptr)) { case 'l': return '<'; case 'g': return '>'; } } break; case 3 * MINBPC: if (CHAR_MATCHES(enc, ptr, 'a')) { ptr += MINBPC; if (CHAR_MATCHES(enc, ptr, 'm')) { ptr += MINBPC; if (CHAR_MATCHES(enc, ptr, 'p')) return '&'; } } break; case 4 * MINBPC: switch (BYTE_TO_ASCII(enc, ptr)) { case 'q': ptr += MINBPC; if (CHAR_MATCHES(enc, ptr, 'u')) { ptr += MINBPC; if (CHAR_MATCHES(enc, ptr, 'o')) { ptr += MINBPC; if (CHAR_MATCHES(enc, ptr, 't')) return '"'; } } break; case 'a': ptr += MINBPC; if (CHAR_MATCHES(enc, ptr, 'p')) { ptr += MINBPC; if (CHAR_MATCHES(enc, ptr, 'o')) { ptr += MINBPC; if (CHAR_MATCHES(enc, ptr, 's')) return '\''; } } break; } } return 0; } static int PREFIX(sameName)(const ENCODING *enc, const char *ptr1, const char *ptr2) { for (;;) { switch (BYTE_TYPE(enc, ptr1)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: \ if (*ptr1++ != *ptr2++) \ return 0; LEAD_CASE(4) LEAD_CASE(3) LEAD_CASE(2) #undef LEAD_CASE /* fall through */ if (*ptr1++ != *ptr2++) return 0; break; case BT_NONASCII: case BT_NMSTRT: case BT_HEX: case BT_DIGIT: case BT_NAME: case BT_MINUS: if (*ptr2++ != *ptr1++) return 0; #if MINBPC > 1 if (*ptr2++ != *ptr1++) return 0; #if MINBPC > 2 if (*ptr2++ != *ptr1++) return 0; #if MINBPC > 3 if (*ptr2++ != *ptr1++) return 0; #endif #endif #endif break; default: #if MINBPC == 1 if (*ptr1 == *ptr2) return 1; #endif switch (BYTE_TYPE(enc, ptr2)) { case BT_LEAD2: case BT_LEAD3: case BT_LEAD4: case BT_NONASCII: case BT_NMSTRT: case BT_HEX: case BT_DIGIT: case BT_NAME: case BT_MINUS: return 0; default: return 1; } } } /* not reached */ } static int PREFIX(nameMatchesAscii)(const ENCODING *enc, const char *ptr1, const char *ptr2) { for (; *ptr2; ptr1 += MINBPC, ptr2++) { if (!CHAR_MATCHES(end, ptr1, *ptr2)) return 0; } switch (BYTE_TYPE(enc, ptr1)) { case BT_LEAD2: case BT_LEAD3: case BT_LEAD4: case BT_NONASCII: case BT_NMSTRT: case BT_HEX: case BT_DIGIT: case BT_NAME: case BT_MINUS: return 0; default: return 1; } } static int PREFIX(nameLength)(const ENCODING *enc, const char *ptr) { const char *start = ptr; for (;;) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: ptr += n; break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_NONASCII: case BT_NMSTRT: case BT_HEX: case BT_DIGIT: case BT_NAME: case BT_MINUS: ptr += MINBPC; break; default: return ptr - start; } } } static const char *PREFIX(skipS)(const ENCODING *enc, const char *ptr) { for (;;) { switch (BYTE_TYPE(enc, ptr)) { case BT_LF: case BT_CR: case BT_S: ptr += MINBPC; break; default: return ptr; } } } static void PREFIX(updatePosition)(const ENCODING *enc, const char *ptr, const char *end, POSITION *pos) { while (ptr != end) { switch (BYTE_TYPE(enc, ptr)) { #define LEAD_CASE(n) \ case BT_LEAD ## n: \ ptr += n; \ break; LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4) #undef LEAD_CASE case BT_LF: pos->columnNumber = (unsigned)-1; pos->lineNumber++; ptr += MINBPC; break; case BT_CR: pos->lineNumber++; ptr += MINBPC; if (ptr != end && BYTE_TYPE(enc, ptr) == BT_LF) ptr += MINBPC; pos->columnNumber = (unsigned)-1; break; default: ptr += MINBPC; break; } pos->columnNumber++; } } #undef DO_LEAD_CASE #undef MULTIBYTE_CASES #undef INVALID_CASES #undef CHECK_NAME_CASE #undef CHECK_NAME_CASES #undef CHECK_NMSTRT_CASE #undef CHECK_NMSTRT_CASES 1.1 apache-1.3/src/lib/expat/xmltok_impl.h Index: xmltok_impl.h =================================================================== /* The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/ Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is expat. The Initial Developer of the Original Code is James Clark. Portions created by James Clark are Copyright (C) 1998 James Clark. All Rights Reserved. Contributor(s): */ enum { BT_NONXML, BT_MALFORM, BT_LT, BT_AMP, BT_RSQB, BT_LEAD2, BT_LEAD3, BT_LEAD4, BT_TRAIL, BT_CR, BT_LF, BT_GT, BT_QUOT, BT_APOS, BT_EQUALS, BT_QUEST, BT_EXCL, BT_SOL, BT_SEMI, BT_NUM, BT_LSQB, BT_S, BT_NMSTRT, BT_HEX, BT_DIGIT, BT_NAME, BT_MINUS, BT_OTHER, /* known not to be a name or name start character */ BT_NONASCII, /* might be a name or name start character */ BT_PERCNT, BT_LPAR, BT_RPAR, BT_AST, BT_PLUS, BT_COMMA, BT_VERBAR }; #include <stddef.h>