Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Steve Lawrence Fri, 25 Jan 2019 12:25:21 -0800

The behavior of the test suite depends on the defaultRoundTrip and
roundTrip attributes of the tdml:testSuite and tdml:parserTestCase
elements, respectively. The different values and how they affect round
trip testing are described at the end of our TDML page [1].


What you describe we call roundTrip="onePass", which is the same as
"true". It looks like you have defaultRoundTrip="true", so each TDML
test is parsed and compared with the expected infoset. If they match,
the infoset is "unparsed" (serialized) and compared with the original
input data. If either the infoset or the unparsed data does not match
exactly, the TDML runner reports a test failure.
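A rough Python model of the onePass flow described above. It is a sketch, not the TDML runner itself: `parse`, `unparse`, and `run_one_pass` are hypothetical stand-ins for Daffodil's parser and unparser, and the toy format (one header byte followed by user data) is assumed purely for illustration.

```python
# Hypothetical model of a onePass round-trip test. parse()/unparse() stand in
# for Daffodil; the toy format is 1 header byte followed by userData bytes.


def parse(data: bytes) -> dict:
    """Toy parser: first byte is the header, the rest is userData (as hex)."""
    return {"header": data[0], "userData": data[1:].hex().upper()}


def unparse(infoset: dict) -> bytes:
    """Toy unparser: the inverse of parse()."""
    return bytes([infoset["header"]]) + bytes.fromhex(infoset["userData"])


def run_one_pass(data: bytes, expected_infoset: dict) -> None:
    """Parse and compare to the expected infoset, then unparse and compare to the input."""
    actual = parse(data)
    if actual != expected_infoset:
        raise AssertionError(f"infoset mismatch: {actual} != {expected_infoset}")
    reserialized = unparse(actual)
    if reserialized != data:
        raise AssertionError(f"unparse mismatch: {reserialized.hex()} != {data.hex()}")


# Passes silently: both comparisons match.
run_one_pass(bytes.fromhex("AABBCCDD"), {"header": 0xAA, "userData": "BBCCDD"})
```

A mismatch at either comparison corresponds to one of the two kinds of failure mentioned above.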

It's exciting to see the progress you're making! Your schemas look
really well written--for sure a good example of how to write DFDL
schemas. I'm definitely looking forward to seeing things progress with
the PLC4X project.

- Steve

[1] https://daffodil.apache.org/tdml/#round-trip-testing

On 1/25/19 11:26 AM, Christofer Dutz wrote:
> Hi Steve,
> 
> It was a busy two days for me ... but now I finally got back to the fun stuff.
> 
> So I think I've now finished both the S7 schema and the test suite.
> I added the byte data of several packet captures and the parsing seems to be 
> doing its job nicely.
> 
> It even helped diagnose a bug in our code by being able to adjust the format 
> to another assumption and checking if it worked.
> 
> And thanks for your patience and continued assistance with this. But I think 
> this is going to be a huge thing for PLC4X :-)
> 
> Regarding the performance question ... I got the numbers from the test-suite
> execution ... where every parsing operation is done exactly once.
> So I guess they're not quite representative.
> 
> One question, however:
> How does a test in the test suite work ... does it take the binary input,
> parse it and compare it with the XML version, and then take the XML version,
> serialize it and compare it to the byte version?
> Initially I got errors while parsing, and later I once got an error
> "Nope" when "unparsing" (I guess that's serializing) ... it would be great
> to know if it does that, as then I would feel much more
> confident it's doing 100% what I want.
> 
> My next step would be to generate a new version of the S7 driver, that 
> utilizes Daffodil for the serialization and deserialization ... then I'll 
> probably do some benchmarks and compare to the hand-written code.
> 
> Nevertheless I think this will be a great way to implement new protocols as 
> it's simply a lot faster to write such a schema (if you know how to do that).
> 
> Thanks again to you all,
> Chris
> 
> 
> 
> On 22.01.19, 23:02, "Steve Lawrence" <[email protected]> wrote:
> 
>     If merged schemas allow you to access other fields to calculate the
>     length of the userData field instead of using delimited hexBinary, I
>     suspect you would see a noticeable performance increase.
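As a sketch of what such a length calculation could look like, here is a Python version assuming the usual ISO-on-TCP layout (RFC 1006 TPKT, ISO 8073 COTP): the TPKT length field is the total packet length including its 4-byte header, and the COTP headerLength byte counts the header bytes that follow it. Those layout details and the helper names are assumptions, not taken from the PLC4X schemas; in a merged DFDL schema the same arithmetic would go into a dfdl:length expression.

```python
# Sketch: derive the COTP userData length from the enclosing TPKT header
# instead of scanning to the end of the data.
# Assumed layout (RFC 1006 / ISO 8073, not taken from the PLC4X schemas):
#   TPKT: version (1 byte), reserved (1 byte), length (2 bytes, big-endian,
#         total packet length including this 4-byte header)
#   COTP: headerLength (1 byte, counts the header bytes that follow it)

TPKT_HEADER_SIZE = 4


def user_data_length(tpkt_length: int, cotp_header_length: int) -> int:
    """Bytes left for userData after the TPKT and COTP headers."""
    cotp_header_size = 1 + cotp_header_length  # the length byte itself + the rest
    return tpkt_length - TPKT_HEADER_SIZE - cotp_header_size


def split_user_data(packet: bytes) -> bytes:
    """Return exactly user_data_length() bytes of userData (hypothetical helper)."""
    tpkt_length = int.from_bytes(packet[2:4], "big")
    cotp_header_length = packet[TPKT_HEADER_SIZE]
    start = TPKT_HEADER_SIZE + 1 + cotp_header_length
    return packet[start:start + user_data_length(tpkt_length, cotp_header_length)]
```

Because the length is known up front, the userData bytes can be sliced out directly, and any trailing bytes beyond the TPKT length are left alone.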
>     
>     Delimited hexBinary is implemented as encoding the input bytes into
>     ISO-8859-1 characters and building up a string until a delimiter or end
>     of data is found. The resulting string is then decoded to get the hex
>     binary byte array. It's not terribly slow, but is inefficient compared
>     to how we normally get hexBinary bytes with an explicit length. In the
>     explicit length case, we know exactly how many bits to read and can read
>     the source bytes directly into a hexBinary array, avoiding all the
>     encoding/decoding/delimiter scanning complexity.
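The difference between the two strategies can be sketched in Python. This models the idea only; it is not Daffodil's code, and the function names are hypothetical.

```python
def delimited_hex_binary(data: bytes, start: int) -> bytes:
    """Delimited style: turn each byte into an ISO-8859-1 character, collect
    characters until the end of data (no delimiter here), then convert the
    collected string back into bytes."""
    chars = []
    for i in range(start, len(data)):
        ch = data[i:i + 1].decode("iso-8859-1")
        # A real scanner would also check here whether a delimiter starts.
        chars.append(ch)
    return "".join(chars).encode("iso-8859-1")


def explicit_hex_binary(data: bytes, start: int, length: int) -> bytes:
    """Explicit-length style: the byte count is known, so copy the bytes directly."""
    return data[start:start + length]


data = bytes(range(256))
assert delimited_hex_binary(data, 1) == explicit_hex_binary(data, 1, 255)
```

Both return the same bytes; the delimited version does a per-byte decode, scan, and re-encode where the explicit version does a single slice.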
>     
>     - Steve
>     
>     On 1/22/19 3:48 PM, Christofer Dutz wrote:
>     > Hi Steve
>     > 
>     > Yup ... couldn't wait till tomorrow and yes ...
>     > your option worked (I wonder what I had done differently).
>     > 
>     > Performance-wise ... would it be better to join the schemas?
>     > 
>     > Since I will always parse all 3 schemas and use them for serialization,
>     > I could imagine a merged schema (where I can, for example, get the
>     > length for COTP from the TPKT and use that for the userData).
>     > 
>     > Chris
>     > 
>     > 
>     > On 22.01.19, 18:44, "Steve Lawrence" <[email protected]> wrote:
>     > 
>     >     Yep, I think hexBinary with dfdl:lengthKind="delimited" should work for
>     >     your case. I've modified the userData element to look like this:
>     >     
>     >       <xs:element name="userData" type="xs:hexBinary"
>     >         dfdl:byteOrder="bigEndian" dfdl:lengthKind="delimited"
>     >         dfdl:encoding="ISO-8859-1" dfdl:textTrimKind="none" />
>     >     
>     >     This will cause the userData field to consume all data until the end of
>     >     the input. Note that delimited hexBinary is treated like string data, so
>     >     the encoding and textTrimKind properties need to be specified--it might
>     >     make sense to move them to the cotpFormat.
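A short Python illustration of why those two properties matter. ISO-8859-1 maps all 256 byte values to characters and back without loss, while UTF-8 cannot decode arbitrary bytes; and if trimming were enabled, trailing bytes that happen to equal the pad character could be dropped. The space pad character is an assumption here (in DFDL the pad character is whatever the format defines), and `round_trips` and `trim_pad` are hypothetical helpers.

```python
def round_trips(data: bytes, encoding: str) -> bool:
    """True if bytes -> text -> bytes gives back the original bytes."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeDecodeError:
        return False


def trim_pad(data: bytes, pad: str = " ") -> bytes:
    """Hypothetical right-trim of pad characters, as padChar trimming might do."""
    return data.decode("iso-8859-1").rstrip(pad).encode("iso-8859-1")


all_bytes = bytes(range(256))
print(round_trips(all_bytes, "iso-8859-1"))  # every byte value survives
print(round_trips(all_bytes, "utf-8"))       # fails: 0x80 is not a valid UTF-8 start byte
print(trim_pad(bytes.fromhex("01022020")).hex())  # the two trailing 0x20 data bytes are lost
```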
>     >     
>     >     I'm guessing the test you're talking about is "scenarioDataTpdu". With
>     >     the above change to the schema and using the data from that test:
>     >     
>     >       02F080320700000300000800080001120411440100ff09000401320004
>     >     
>     >     The resulting infoset is:
>     >     
>     >       <cotp:CoTpTPDU xmlns:cotp="http://plc4x.apache.org/cotp">
>     >         <headerLength>2</headerLength>
>     >         <type>240</type>
>     >         <cotp:CotpTpduData>
>     >           <endOfTransmission>1</endOfTransmission>
>     >           <tpduRef>0</tpduRef>
>     >         </cotp:CotpTpduData>
>     >         <userData>320700000300000800080001120411440100FF09000401320004</userData>
>     >       </cotp:CoTpTPDU>
>     >     
>     >     Three bytes in total are consumed for the headerLength, type, and
>     >     CotpTpduData fields, and the remaining bytes end up in the userData field
>     >     as hexBinary. If there is no remaining data in the input, then the
>     >     <userData> element is just empty (i.e. <userData />).
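To see where the three header bytes go, here is a Python sketch that splits the scenarioDataTpdu input by hand. The bit layout of the third byte (high bit endOfTransmission, low seven bits tpduRef) is an assumption inferred from the infoset values above and the ISO 8073 data TPDU layout, not read from the schema; `parse_cotp_data_tpdu` is a hypothetical helper.

```python
def parse_cotp_data_tpdu(data: bytes) -> dict:
    """Hypothetical hand-written split of a COTP data TPDU."""
    flags = data[2]
    return {
        "headerLength": data[0],           # 0x02 -> 2
        "type": data[1],                   # 0xF0 -> 240
        "endOfTransmission": flags >> 7,   # assumed: high bit of byte 3
        "tpduRef": flags & 0x7F,           # assumed: low 7 bits of byte 3
        # Everything after the three header bytes, like delimited hexBinary
        # consuming up to the end of the input.
        "userData": data[3:].hex().upper(),
    }


tpdu = parse_cotp_data_tpdu(
    bytes.fromhex("02F080320700000300000800080001120411440100ff09000401320004"))
print(tpdu["userData"])
```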
>     >     
>     >     - Steve
>     >     
>     >     
>     >     
>     >     On 1/22/19 11:58 AM, Christofer Dutz wrote:
>     >     > Hi Steve,
>     >     > 
>     >     > The code is in the plc4x repo I have posted several times now.
>     >     > Unfortunately I'm sitting on a train without my laptop. It's the COTP
>     >     > protocol. There's a matching TDML test with a commented-out binary
>     >     > payload. That's what I'm trying to read.
>     >     > 
>     >     > Could probably post the links some time this evening.
>     >     > 
>     >     > Chris
>     >     > 
>     >     > --------------------------------------------------------------------------------
>     >     > *From:* Steve Lawrence <[email protected]>
>     >     > *Sent:* Tuesday, January 22, 2019 5:17:24 PM
>     >     > *To:* [email protected]; Christofer Dutz
>     >     > *Subject:* Re: How to achieve lengthKind=”endOfParent” without using endOfParent?
>     >     > There isn't a concept of a global length of input since some inputs
>     >     > could be streaming and so we don't actually know the length until the
>     >     > end of data is reached.
>     >     > 
>     >     > I guess it isn't clear to me what your data looks like. I /think/
>     >     > delimited hexBinary should work. If the parent element does not have a
>     >     > length, delimited hexBinary should consume all available data up until
>     >     > the end. Could you provide a little more detail on what your data looks
>     >     > like (e.g. what has a known length, headers, user data, etc.)?
>     >     > 
>     >     > As far as implementing lengthKind="endOfParent", I don't think the current
>     >     > Daffodil devs have the resources to implement it right now.
>     >     > Most of us are focused on other tasks at the moment. Though, it's
>     >     > definitely possible to implement it--there aren't any real technical
>     >     > limitations that I know of with the current code base--but it probably
>     >     > would be a decent amount of work and would be an ambitious task for a
>     >     > first-time Daffodil contributor. Such a feature touches a lot of
>     >     > different parts of Daffodil, so there's a lot to learn. We're more than
>     >     > happy to provide guidance if you do want to contribute this feature, and
>     >     > it probably could be done in reasonably sized chunks, but I'd first want
>     >     > to confirm that there isn't an alternative.
>     >     > 
>     >     > - Steve
>     >     > 
>     >     > 
>     >     > On 1/22/19 10:35 AM, Christofer Dutz wrote:
>     >     >> Hi Steve,
>     >     >> 
>     >     >> well, the problem is that I don't have the parent length in the current context.
>     >     >> 
>     >     >> Without it, it doesn't seem to work.
>     >     >> 
>     >     >> If there was some sort of global variable providing the total length of the entire input, that would be awesome.
>     >     >> As I mentioned, the length information is in the surrounding protocol; I wanted to model them all as separately as possible.
>     >     >> 
>     >     >> Would it be possible to implement lengthKind="endOfParent"? Would it be a lot of work? Could I help with it?
>     >     >> 
>     >     >> Chris
>     >     >> 
>     >     >> 
>     >     >> 
>     >     >> On 22.01.19, 15:48, "Steve Lawrence" <[email protected]> wrote:
>     >     >> 
>     >     >>     Correct, lengthKind="endOfParent" has not been implemented yet.
>     >     >>     
>     >     >>     As an alternative that we do support, you should be able to use
>     >     >>     dfdl:lengthKind="delimited" for the hexBinary user data. In this case,
>     >     >>     there's no delimiter, but the parent length sort of acts like one. For example:
>     >     >>     
>     >     >>       <xs:element name="Parent"
>     >     >>         dfdl:lengthKind="explicit" dfdl:length="4"
>     >     >>         dfdl:lengthUnits="bytes">
>     >     >>         <xs:complexType>
>     >     >>           <xs:sequence>
>     >     >>             <xs:element name="Header" type="xs:hexBinary"
>     >     >>               dfdl:lengthKind="explicit" dfdl:length="1"
>     >     >>               dfdl:lengthUnits="bytes" />
>     >     >>             <xs:element name="UserData" type="xs:hexBinary"
>     >     >>               dfdl:lengthKind="delimited" dfdl:encoding="ISO-8859-1"/>
>     >     >>           </xs:sequence>
>     >     >>         </xs:complexType>
>     >     >>       </xs:element>
>     >     >>     
>     >     >>     So the parent element is 4 bytes and the header is 1 byte. If we parse
>     >     >>     the data:
>     >     >>     
>     >     >>       0xAA BB CC DD
>     >     >>     
>     >     >>     We get the following infoset:
>     >     >>     
>     >     >>       <Parent>
>     >     >>         <Header>AA</Header>
>     >     >>         <UserData>BBCCDD</UserData>
>     >     >>       </Parent>
>     >     >>     
>     >     >>     And the UserData is the remaining three bytes. Using
>     >     >>     lengthKind="endOfParent" would probably have better performance if we
>     >     >>     implemented it, but this should give the same result for the hexBinary
>     >     >>     blob at the end.
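The parent-bounded behavior can be sketched in Python: the explicit 4-byte length of Parent limits how far the "delimited" UserData can reach, so any bytes after the parent are left for whatever follows it. `parse_parent` is a hypothetical helper, not a Daffodil API.

```python
def parse_parent(stream: bytes, parent_length: int = 4) -> tuple:
    """Hypothetical model of the Parent element above."""
    parent = stream[:parent_length]   # Parent: explicit length, 4 bytes
    header = parent[:1]               # Header: explicit length, 1 byte
    user_data = parent[1:]            # UserData: "delimited", runs to the end of Parent
    rest = stream[parent_length:]     # not consumed by Parent
    infoset = {"Header": header.hex().upper(), "UserData": user_data.hex().upper()}
    return infoset, rest


infoset, rest = parse_parent(bytes.fromhex("AABBCCDDEE"))
print(infoset, rest.hex())  # the trailing EE byte stays outside Parent
```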
>     >     >>     
>     >     >>     - Steve
>     >     >>     
>     >     >>     
>     >     >>     On 1/22/19 4:16 AM, Christofer Dutz wrote:
>     >     >>     > Hi all,
>     >     >>     > 
>     >     >>     > I am stuck with a little problem … I am reading a packet which is usually contained inside another one. Therefore it doesn’t provide any means of specifying its length.
>     >     >>     > So the packet is just a small header + binary data … now I want to read “all the rest” after the header into a field “userData”.
>     >     >>     > In the DFDL documentation at IBM I read that lengthKind=”endOfParent” would be what I’m looking for.
>     >     >>     > 
>     >     >>     > Unfortunately this doesn’t seem to be supported … so how can I achieve the same with the implemented options?
>     >     >>     > 
>     >     >>     > Chris
>     >     >>     > 
>     >     >>     
>     >     >>     
>     >     >> 
>     >     > 
>     >     
>     >     
>     > 
>     
>     
>
