Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Steve Lawrence Wed, 23 Jan 2019 11:57:24 -0800

Yep, that's a reasonable way to handle what would otherwise be an empty
choice branch. Zero-length numbers will cause a failure, as you saw. An
alternative might be to use xs:hexBinary with dfdl:length="0" for the
S7RequestPayloadReadVar element. You aren't allowed to have
integers/shorts/bytes/etc. with zero lengths, but you are allowed to
have zero-length hexBinary data. Then you do not need the minOccurs or
the complex type. Makes things just a little simpler.
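
A minimal sketch of that alternative, assuming the element keeps the
explicit-length properties from your original xs:byte attempt and that
lengthUnits is bytes (I haven't run this exact snippet against your
schema):

  <xs:element name="S7RequestPayloadReadVar" type="xs:hexBinary"
    dfdl:lengthKind="explicit" dfdl:length="0"
    dfdl:lengthUnits="bytes" />

I'd expect this to consume no data and to show up in the infoset as an
empty element, so it can still be the root of the choice branch.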



As far as performance goes, I'm seeing about what I would expect. To
test, I used the data in the tpktPacketContainingCotpConnectResponse
test in tpkit-protocol.tdml. I tested performance with 300,000
iterations. I also needed to subtract 4 from the occursCount expression
in the CotpMessageType to account for the TPKT header. I'm not sure
whether the data in that test is meant to work in S7-full, but I doubt
it affects performance since the packet doesn't have any S7 data (I
think).

Parsing this data with just the TPKT schema, parse speed maxes out at
about 179000 parses/second, or 1.676 seconds for the 300,000 iterations.

Extracting the COTP from that data (i.e. removing the first four
bytes), I can parse using the COTP schema about 45700 times/second, or
6.565 seconds. Quite a bit slower, but COTP is more complex so not
totally unexpected.

The combined time is about 8.241 seconds.

Parsing the original data with the S7-full schema, I get about 42400
times/second, or 7.075 seconds. So it's a little slower than just the
COTP but faster than the combined time, which I think makes sense. COTP
doesn't need to parse the TPKT header like S7-full does, so S7-full
should be slower. But S7-full also doesn't double count the userData
parse time, which is what combining the TPKT+COTP times effectively
does, so it should be faster than the combined times.

My guess is that maybe your JVM just isn't warmed up enough? I think I
needed to get above 100,000 iterations before reaching the maximum parse
speed.

FYI, to get these numbers I used the daffodil performance subcommand in
the CLI, e.g.

  daffodil performance -N 300000 -s schemaPath testData.bin

- Steve

On 1/23/19 12:12 PM, Christofer Dutz wrote:
> Hi all,
> 
> ok so I solved this one myself.
> 
> During my search I stumbled over DFDL-1355 
> (https://opensource.ncsa.illinois.edu/jira/browse/DFDL-1355)
> Where they say: " DFDL spec says "The Root of the Branch MUST NOT be 
> optional. That is XSDL minOccurs MUST BE greater than 0.""
> 
> So I thought: Oh well, then just let me create a complex type element with a 
> sequence of one empty element and that seems to have solved this problem:
> 
> So now my type looks like this:
> 
>     <xs:element name="S7RequestPayloadReadVar">
>         <xs:complexType>
>             <xs:sequence>
>                 <xs:element name="payload" type="s7f:byte" minOccurs="0"/>
>             </xs:sequence>
>         </xs:complexType>
>     </xs:element>
> 
> And now it's working :-)
> 
> 
> Chris
> 
> 
> 
> On 23.01.19, 17:25, "Christofer Dutz" <[email protected]> wrote:
> 
>     Hi Steve,
>     
>     Now I've created a merged version of my 3 schemas in order to see if the
>     performance is better.
>     I did notice that when I run tests, parsing usually takes about twice
>     as long with the merged schema.
>     The tests run on inputs targeting the first 2 levels. I know that when
>     parsing a level-2 input I'm parsing a TPKT packet with an included COTP
>     payload, so I'm actually parsing two levels. However, if I add the
>     parsing time of TPKT to that of simple COTP, the sum is quite a bit
>     lower than that of the combined schema. What could be causing this?
>     
>     And I ran into some problems again :/ ... in my S7 schema I have the case
>     where I need to output an (empty) payload element to match a parameter
>     element.
>     
>     Unfortunately doing this:
>     
>         <xs:element name="S7RequestPayloadReadVar" type="xs:byte"
>             dfdl:lengthKind="explicit" dfdl:length="0"/>
>     
>     Doesn't seem to work and I get the following error:
>     
>     Expression Evaluation Error: Element s7f:S7RequestPayloadReadVar does not
>     have a value.
>     Schema context: element reference s7f:S7RequestPayloadReadVar Location
>     line 423 column 46 in
>     file:/Users/christofer.dutz/Projects/Apache/PLC4X/protocols/target/classes/org/apache/plc4x/protocols/s7-full-stack-protocol.dfdl.xsd
>     
>     How can I achieve this?
>     
>     Chris
>     
>     
>     
>     
>     On 22.01.19, 23:02, "Steve Lawrence" <[email protected]> wrote:
>     
>         If merged schemas allow you to access other fields to calculate the
>         length of the userData field instead of using delimited hexBinary, I
>         suspect you would see a noticeable performance increase.
>         
>         Delimited hexBinary is implemented as encoding the input bytes into
>         ISO-8859-1 characters and building up a string until a delimiter or
>         end of data is found. The resulting string is then decoded to get the
>         hex binary byte array. It's not terribly slow, but is inefficient
>         compared to how we normally get hexBinary bytes with an explicit
>         length. In the explicit length case, we know exactly how many bits to
>         read and can read the source bytes directly into a hexBinary array,
>         avoiding all the encoding/decoding/delimiter scanning complexity.
>         
>         - Steve
>         
>         On 1/22/19 3:48 PM, Christofer Dutz wrote:
>         > Hi Steve
>         > 
>         > Yup ... couldn't wait till tomorrow and yes ... 
>         > your option worked (Wonder what I had different)
>         > 
>         > Performance-wise ... would it be better to join the schemas?
>         > 
>         > As I will always parse all 3 schemas and use them for serialization.
>         > I could imagine a merged schema (where I can for example get the 
>         > length for COTP from the TPKT and use that for the userData)
>         > 
>         > Chris
>         > 
>         > 
>         > On 22.01.19, 18:44, "Steve Lawrence" <[email protected]> wrote:
>         > 
>         >     Yep, I think hexBinary with dfdl:lengthKind="delimited" should
>         >     work for your case. I've modified the userData element to look
>         >     like this:
>         >     
>         >       <xs:element name="userData" type="xs:hexBinary"
>         >         dfdl:byteOrder="bigEndian" dfdl:lengthKind="delimited"
>         >         dfdl:encoding="ISO-8859-1" dfdl:textTrimKind="none" />
>         >     
>         >     This will cause the userData field to consume all data until the
>         >     end of the input. Note that delimited hexBinary is treated like
>         >     string data, so the encoding and textTrimKind properties need to
>         >     be specified--it might make sense to move them to the cotpFormat.
>         >     
>         >     I'm guessing the test you're talking about is "scenarioDataTpdu".
>         >     With the above change to the schema and using the data from that
>         >     test:
>         >     
>         >       02F080320700000300000800080001120411440100ff09000401320004
>         >     
>         >     The resulting infoset is:
>         >     
>         >       <cotp:CoTpTPDU xmlns:cotp="http://plc4x.apache.org/cotp">
>         >         <headerLength>2</headerLength>
>         >         <type>240</type>
>         >         <cotp:CotpTpduData>
>         >           <endOfTransmission>1</endOfTransmission>
>         >           <tpduRef>0</tpduRef>
>         >         </cotp:CotpTpduData>
>         >         <userData>320700000300000800080001120411440100FF09000401320004</userData>
>         >       </cotp:CoTpTPDU>
>         >     
>         >     Three bytes total are consumed for the headerLength, type, and
>         >     CotpTpduData fields, and the remaining bytes end up in the
>         >     userData field as hexBinary. If there is no remaining data in the
>         >     input, then the <userData> element is just empty (i.e.
>         >     <userData />).
>         >     
>         >     - Steve
>         >     
>         >     
>         >     
>         >     On 1/22/19 11:58 AM, Christofer Dutz wrote:
>         >     > Hi Steve,
>         >     > 
>         >     > The code is in the plc4x repo I posted several times now.
>         >     > Unfortunately I'm sitting in a train without my laptop. It's the
>         >     > COTP protocol. There's a matching tdml test with commented out
>         >     > binary payload. That's what I'm trying to read.
>         >     > 
>         >     > Could probably post the links some time this evening.
>         >     > 
>         >     > Chris
>         >     > 
>         >     > 
>         >     > 
> --------------------------------------------------------------------------------
>         >     > *From:* Steve Lawrence <[email protected]>
>         >     > *Sent:* Tuesday, January 22, 2019 5:17:24 PM
>         >     > *To:* [email protected]; Christofer Dutz
>         >     > *Subject:* Re: How to achieve lengthKind=”endOfParent” without using endOfParent?
>         >     > There isn't a concept of a global length of input since some
>         >     > inputs could be streaming and so we don't actually know the
>         >     > length until the end of data is reached.
>         >     > 
>         >     > I guess it isn't clear to me what your data looks like. I /think/
>         >     > delimited hexBinary should work. If the parent element does not
>         >     > have a length, delimited hexBinary should consume all available
>         >     > data up until the end. Could you provide a little more detail on
>         >     > what your data looks like (e.g. what has known lengths, headers,
>         >     > user data, etc.)?
>         >     > 
>         >     > As far as implementing lengthKind="endOfParent" goes, I don't
>         >     > think the current Daffodil devs have the resources to implement
>         >     > it right now. Most of us are focused on other tasks at the
>         >     > moment. Though, it's definitely possible to implement it--there
>         >     > aren't any real technical limitations that I know of with the
>         >     > current code base--but it probably would be a decent amount of
>         >     > work and would be an ambitious task for a first time Daffodil
>         >     > contributor. Such a feature touches a lot of different parts of
>         >     > Daffodil so there's a lot to learn. We're more than happy to
>         >     > provide guidance if you do want to contribute this feature, and
>         >     > it probably could be done in reasonably sized chunks, but I'd
>         >     > first want to confirm that there isn't an alternative.
>         >     > 
>         >     > - Steve
>         >     > 
>         >     > 
>         >     > On 1/22/19 10:35 AM, Christofer Dutz wrote:
>         >     >> Hi Steve,
>         >     >> 
>         >     >> well the problem is that I don't have the parent length in the
>         >     >> current context.
>         >     >> 
>         >     >> Without it, it doesn't seem to work.
>         >     >> 
>         >     >> If there was some sort of global variable providing the total
>         >     >> length of the entire input, that would be awesome.
>         >     >> As I mentioned, the length information is in the surrounding
>         >     >> protocol; I wanted to model them all as separately as possible.
>         >     >> 
>         >     >> Would it be possible to implement lengthKind="endOfParent"?
>         >     >> Would it be a lot of work? Could I help with it?
>         >     >> 
>         >     >> Chris
>         >     >> 
>         >     >> 
>         >     >> 
>         >     >> On 22.01.19, 15:48, "Steve Lawrence" <[email protected]> wrote:
>         >     >> 
>         >     >>     Correct, lengthKind="endOfParent" has not been implemented yet.
>         >     >>     
>         >     >>     As an alternative that we do support, you should be able to
>         >     >>     use dfdl:lengthKind="delimited" for the hexBinary user data.
>         >     >>     In this case, there's no delimiter, but the parent length sort
>         >     >>     of acts like one. For example:
>         >     >>     
>         >     >>       <xs:element name="Parent"
>         >     >>         dfdl:lengthKind="explicit" dfdl:length="4"
>         >     >>         dfdl:lengthUnits="bytes">
>         >     >>         <xs:complexType>
>         >     >>           <xs:sequence>
>         >     >>             <xs:element name="Header" type="xs:hexBinary"
>         >     >>               dfdl:lengthKind="explicit" dfdl:length="1"
>         >     >>               dfdl:lengthUnits="bytes" />
>         >     >>             <xs:element name="UserData" type="xs:hexBinary"
>         >     >>               dfdl:lengthKind="delimited"
>         >     >>               dfdl:encoding="ISO-8859-1"/>
>         >     >>           </xs:sequence>
>         >     >>         </xs:complexType>
>         >     >>       </xs:element>
>         >     >>     
>         >     >>     So the parent element is 4 bytes and the header is 1 byte. If
>         >     >>     we parse the data:
>         >     >>     
>         >     >>       0xAA BB CC DD
>         >     >>     
>         >     >>     We get the following infoset
>         >     >>     
>         >     >>       <Parent>
>         >     >>         <Header>AA</Header>
>         >     >>         <UserData>BBCCDD</UserData>
>         >     >>       </Parent>
>         >     >>     
>         >     >>     And the UserData is the remaining three bytes. Using
>         >     >>     lengthKind="endOfParent" would probably have better
>         >     >>     performance if we implemented it, but this should give the
>         >     >>     same result for the hexBinary blob at the end.
>         >     >>     
>         >     >>     - Steve
>         >     >>     
>         >     >>     
>         >     >>     On 1/22/19 4:16 AM, Christofer Dutz wrote:
>         >     >>     > Hi all,
>         >     >>     > 
>         >     >>     > I am stuck with a little problem … I am reading a packet
>         >     >>     > which is usually contained inside another. Therefore it
>         >     >>     > doesn’t provide any means of determining its length.
>         >     >>     > So the packet is just a small header + binary data … now I
>         >     >>     > want to read “all the rest” after the header into a field
>         >     >>     > “userData”.
>         >     >>     > In the DFDL documentation at IBM I read that
>         >     >>     > lengthKind=”endOfParent” would be what I’m looking for.
>         >     >>     > 
>         >     >>     > Unfortunately this doesn’t seem to be supported … so how
>         >     >>     > can I achieve the same with implemented options?
>         >     >>     > 
>         >     >>     > Chris
>         >     >>     > 
>         >     >>     
>         >     >>     
>         >     >> 
>         >     > 
>         >     
>         >     
>         > 
>         
>         
>     
>     
>
