[jira] [Updated] (XERCESC-2054) Grammar serialization not portable (integer size / alignment issue)

Scott Cantor (JIRA) Wed, 12 Jul 2017 09:43:28 -0700

     [ 
https://issues.apache.org/jira/browse/XERCESC-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Scott Cantor updated XERCESC-2054:
----------------------------------
    Remaining Estimate:     (was: 4h)
     Original Estimate:     (was: 4h)

> Grammar serialization not portable (integer size / alignment issue)
> -------------------------------------------------------------------
>
>                 Key: XERCESC-2054
>                 URL: https://issues.apache.org/jira/browse/XERCESC-2054
>             Project: Xerces-C++
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.1.4
>         Environment: Linux CentOS-7 (64bit), Windows 7 (64bit)
>            Reporter: Oliver Moeller
>
> Apologies if this is a known issue, but I have not found it by conventional
> means (i.e., googling and searching through the bug database here).
> I found that the serialisation/deserialisation (here: of grammars) is not as
> portable as it (IMHO) should be.
> The problem happens in XSerializeEngine::readString() when
> the length of the string is taken from the associated BinInputStream as
> "unsigned long":
>     /***
>      * Check if any data written
>      ***/
>     unsigned long tmp;
>     *this>>tmp;
> On Windows 7 x64 (MSVS2012), this takes 4 bytes off the head of the stream,
> but on CentOS 7 x64 (g++ 4.8.3), it takes 8 bytes.
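
To make the effect concrete, here is a minimal sketch of what happens when a
reader consumes 8 bytes for a length field that was written with 4. The stream
layout, the little-endian byte order and the names readLengthLE and
sampleStream are assumptions made up for this illustration; they are not the
actual Xerces-C++ serialization format or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: read a little-endian unsigned integer of `width`
// bytes from the start of `buf`. `width` stands in for sizeof(unsigned long)
// on the reading platform (4 on 64-bit Windows, 8 on 64-bit Linux).
std::uint64_t readLengthLE(const std::vector<unsigned char>& buf,
                           std::size_t width) {
    assert(width <= 8 && width <= buf.size());
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < width; ++i) {
        value |= static_cast<std::uint64_t>(buf[i]) << (8 * i);
    }
    return value;
}

// Hypothetical stream: a 4-byte length (3), followed by three 16-bit
// characters 'a', 'b', 'c'.
std::vector<unsigned char> sampleStream() {
    return {0x03, 0x00, 0x00, 0x00,   // length = 3, written as 4 bytes
            0x61, 0x00,               // 'a'
            0x62, 0x00,               // 'b'
            0x63, 0x00};              // 'c'
}
```

With a width of 4 the reader recovers the intended length of 3. With a width
of 8 it folds the first two characters into the length and ends up with a
value far beyond the size of the stream, which matches the failure described
in this report.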
> As a consequence, a BinInputStream carefully encoded on Windows (e.g. by
> putting it into a char array with
>   examples/cxx/tree/embedded/grammar-input-stream.cxx
> which is a common xsd example)
> will fail when "reading" it on the Linux box, because everything from the
> first string on is garbage.
> Moreover, this will (probably) give no meaningful error message, just a
> thrown "XSerializationException", because at some point it will (probably)
> misinterpret wchar data as length information and try to read a next string
> that is millions of bytes long (according to the misread BinInputStream).
> The BinInputStream will then run out of bytes.
> A similar issue concerns the *alignment* of the data according to its data
> type, which is applied in all >> operations: this is (necessarily) very
> platform dependent.
> It would be a big improvement if Xerces encoded the (de)serialization
> in a platform/compiler independent manner. The purpose after all *IS* to be
> portable, right?
> E.g., the serialisation engine could always use integers of known byte width
> (e.g.: #include <inttypes.h> -> use uint32_t) instead of "unsigned long".
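
A minimal sketch of the fixed-width idea, assuming a plain byte vector in
place of the real stream classes. The names writeLength and readLength and
the big-endian byte order are choices made for this illustration only; they
are not part of Xerces-C++.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical writer: always emit the length as exactly 4 bytes, most
// significant byte first, regardless of sizeof(unsigned long).
void writeLength(std::vector<unsigned char>& out, std::size_t len) {
    if (len > 0xFFFFFFFFu) {
        throw std::length_error("length does not fit into 32 bits");
    }
    const std::uint32_t v = static_cast<std::uint32_t>(len);
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFFu));
    }
}

// Hypothetical reader: always consume exactly 4 bytes and advance `pos`.
std::size_t readLength(const std::vector<unsigned char>& in,
                       std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < 4) {
        throw std::out_of_range("truncated length field");
    }
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v = (v << 8) | in[pos + i];
    }
    pos += 4;
    return v;
}
```

Because the bytes are assembled with shifts instead of a cast, the result
depends neither on the width of "unsigned long" nor on the byte order of the
host.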
> Also, the alignment issue should be addressed; it is hard to predict
> what restrictions apply for the compiler (or even processor) in use; some
> are not capable of reading an integer from a memory address that is not
> 4-byte aligned.
> E.g., the data could be copied (to a properly aligned item initialized by 0s)
> before doing the cast to an integer type.
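
The copy-before-use approach could look roughly like the sketch below. The
function name loadU32 is hypothetical, and the value is returned in host byte
order, so a portable format would still need to fix the byte order separately.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read 4 bytes from an arbitrarily aligned address by
// copying them into a properly aligned, zero-initialized local variable
// instead of casting the pointer to std::uint32_t*.
std::uint32_t loadU32(const unsigned char* p) {
    assert(p != nullptr);
    std::uint32_t value = 0;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

std::memcpy places no alignment requirement on its source, so this also works
on processors that fault on misaligned integer loads.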
> In any case, the number of bytes to be read next from the BinInputStream
> should always be platform-independent.
> (Of course, the write operations have to follow the same business logic.)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
