[ https://issues.apache.org/jira/browse/XERCESC-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Scott Cantor updated XERCESC-2054:
----------------------------------
    Remaining Estimate:     (was: 4h)
     Original Estimate:     (was: 4h)

> Grammar serialization not portable (integer size / alignment issue)
> -------------------------------------------------------------------
>
>                 Key: XERCESC-2054
>                 URL: https://issues.apache.org/jira/browse/XERCESC-2054
>             Project: Xerces-C++
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.1.4
>         Environment: Linux CentOS-7 (64bit), Windows 7 (64bit)
>            Reporter: Oliver Moeller
>
> Apologies if this is a known issue, but I have not found it by conventional
> means (i.e., Googling and searching through the bug database here).
>
> I found that the serialisation/deserialisation (here: of grammars) is not as
> portable as it (IMHO) should be.
>
> The problem happens in XSerializeEngine::readString() when the length of the
> string is read from the associated BinInputStream as an "unsigned long":
>
>     /***
>      * Check if any data written
>      ***/
>     unsigned long tmp;
>     *this>>tmp;
>
> On Windows 7 x64 (MSVS2012), this takes 4 bytes off the head of the stream,
> but on CentOS 7 x64 (g++ 4.8.3), it takes 8 bytes.
>
> As a consequence, a BinInputStream carefully encoded on Windows (e.g. by
> putting it into a char array with
>     examples/cxx/tree/embedded/grammar-input-stream.cxx
> which is a common xsd example) will fail when "reading" it on the Linux box,
> because everything from the first string on is garbage.
>
> Moreover, this will (probably) give no meaningful error message, just a
> thrown "XSerializationException", because at some point it will (probably)
> misinterpret wchar data as length information and try to read a next string
> that is millions of bytes long (according to the misunderstood
> BinInputStream). The BinInputStream will then run out of bytes.
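To make the width mismatch described above concrete, here is a minimal, self-contained sketch. It does not use the Xerces-C++ API; `writeWith32BitLength`, `readLengthAs32Bit` and `readLengthAs64Bit` are hypothetical helpers that only imitate a writer whose "unsigned long" is 4 bytes wide and a reader whose "unsigned long" is 8 bytes wide.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a writer whose "unsigned long" is 4 bytes wide:
// a native-endian 32-bit length followed by the raw string bytes.
std::vector<unsigned char> writeWith32BitLength(const std::string& s) {
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    std::vector<unsigned char> out(sizeof len + s.size());
    std::memcpy(out.data(), &len, sizeof len);
    std::memcpy(out.data() + sizeof len, s.data(), s.size());
    return out;
}

// Reader that agrees with the writer: consumes 4 bytes as the length.
// Returns 0 if the buffer is too short to hold a length.
std::uint32_t readLengthAs32Bit(const std::vector<unsigned char>& in) {
    std::uint32_t len = 0;
    if (in.size() >= sizeof len) std::memcpy(&len, in.data(), sizeof len);
    return len;
}

// Hypothetical stand-in for a reader whose "unsigned long" is 8 bytes wide:
// it consumes 8 bytes, so the first 4 bytes of string data end up inside
// the "length". Returns 0 if the buffer is too short.
std::uint64_t readLengthAs64Bit(const std::vector<unsigned char>& in) {
    std::uint64_t len = 0;
    if (in.size() >= sizeof len) std::memcpy(&len, in.data(), sizeof len);
    return len;
}
```

For a buffer produced by `writeWith32BitLength("grammar")`, the 32-bit reader recovers the length 7, while the 64-bit reader folds the first four characters into the length and returns a very large value, which matches the "millions of bytes" symptom in the report.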
>
> A similar issue concerns the *alignment* of the data according to its data
> type, which is applied in all >> operations: this is (necessarily) very
> platform dependent.
>
> It would be a big improvement if Xerces encoded the (de)serialization in a
> platform/compiler independent manner. The purpose after all *IS* to be
> portable, right?
>
> E.g., the serialisation engine could always use integers of known byte width
> (e.g.: #include <inttypes.h> -> use uint32_t) instead of "unsigned long".
>
> Also, the alignment issue should be addressed; it is hard to predict which
> restrictions apply for the compiler (or even processor) in use here, and
> some cannot read an integer from a memory address that is not 4-byte
> aligned. E.g., the data could be copied (to a properly aligned,
> zero-initialized item) before doing the cast to an integer type.
>
> In any case, the number of bytes to be read next from the BinInputStream
> should always be platform-independent.
> (Of course, the write operations have to follow the same business logic.)
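One way the suggestion above could look in practice is sketched below. This is not the Xerces-C++ implementation or a proposed patch; `putU32` and `getU32` are hypothetical names, and the little-endian byte order is an arbitrary assumption made for the example. Assembling the value byte by byte fixes the width at exactly 4 bytes and never dereferences a misaligned integer pointer; copying into an aligned local with `memcpy`, as the report suggests, would address the alignment part as well.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Append v as exactly 4 bytes, least significant byte first
// (byte order chosen for this sketch only).
void putU32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFFu));
}

// Read exactly 4 bytes starting at pos and advance pos by 4.
// Works at any offset because only single bytes are accessed.
std::uint32_t getU32(const std::vector<unsigned char>& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < 4)
        throw std::out_of_range("truncated stream");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return v;
}
```

With this scheme the reader always knows that a length field occupies 4 bytes, regardless of what `sizeof(unsigned long)` is on the platform that wrote or reads the stream, and a truncated stream is reported explicitly instead of being misread.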