stevedlawrence commented on code in PR #1603:
URL: https://github.com/apache/daffodil/pull/1603#discussion_r2627353381
##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/StringLengthParsers.scala:
##########
@@ -86,8 +87,25 @@ trait StringOfSpecifiedLengthMixin extends PaddingRuntimeMixin with CaptureParsi
protected final def parseString(start: PState): String = {
val dis = start.dataInputStream
- val maxLen = start.tunable.maximumSimpleElementSizeInCharacters
val startBitPos0b = dis.bitPos0b
+ val bitLimit0b = dis.bitLimit0b
+
+ // We want to limit the maximum length passed into getSomeString since that function can
+ // pre-allocate a buffer that size even if it won't find that many characters. So we
+ // calculate the maximum number of characters that we could possibly decode based on the
+ // number of a available bits and the character set. Note that some character sets have
+ // variable width encodings, so we can't always know for sure how many characters will be
+ // found. But we can calculate a maximum based on the smallest possible code unit for the
Review Comment:
The code isn't calculating the maximum width; it's calculating the maximum
number of characters that the available bits/bytes could decode to. With
UTF-8, we get the maximum number of characters when every character has the
minimum (single-byte) length, so N bytes could potentially decode to N
characters. It might decode to fewer characters if there are any multibyte
characters, but it can't possibly decode to more than N characters.

I'll update this comment to make this clearer.
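
To illustrate the bound described above, here is a minimal sketch. It is not the code from this PR: `MaxCharsSketch`, `maxPossibleChars`, and `effectiveMaxLen` are hypothetical names, and the minimum character width is assumed to be supplied by the caller rather than looked up from a Daffodil charset.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper for illustration only; not Daffodil's implementation.
object MaxCharsSketch {

  // Upper bound on the number of characters that `availableBits` could decode
  // to, assuming every character uses the encoding's smallest width.
  def maxPossibleChars(availableBits: Long, minCharWidthInBits: Int): Long = {
    require(availableBits >= 0, "availableBits must be non-negative")
    require(minCharWidthInBits > 0, "minCharWidthInBits must be positive")
    availableBits / minCharWidthInBits
  }

  // Cap a configured limit (for example a tunable) by what the data can hold,
  // so a decoder is never asked to pre-allocate more than it could ever fill.
  def effectiveMaxLen(
    configuredMaxChars: Long,
    availableBits: Long,
    minCharWidthInBits: Int
  ): Long =
    math.min(configuredMaxChars, maxPossibleChars(availableBits, minCharWidthInBits))

  def main(args: Array[String]): Unit = {
    // "h\u00e9llo" contains one two-byte character in UTF-8.
    val text = "h\u00e9llo"
    val bytes = text.getBytes(StandardCharsets.UTF_8)
    val bound = maxPossibleChars(bytes.length * 8L, 8)
    // The actual character count never exceeds the bound.
    println(s"bytes=${bytes.length} chars=${text.length} bound=$bound")
  }
}
```

The bound is tight only when every character is a single byte; multibyte characters make the real count smaller, never larger.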