stevedlawrence commented on code in PR #1603:
URL: https://github.com/apache/daffodil/pull/1603#discussion_r2627353381


##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/StringLengthParsers.scala:
##########
@@ -86,8 +87,25 @@ trait StringOfSpecifiedLengthMixin extends PaddingRuntimeMixin with CaptureParsi
 
   protected final def parseString(start: PState): String = {
     val dis = start.dataInputStream
-    val maxLen = start.tunable.maximumSimpleElementSizeInCharacters
     val startBitPos0b = dis.bitPos0b
+    val bitLimit0b = dis.bitLimit0b
+
+    // We want to limit the maximum length passed into getSomeString since that function can
+    // pre-allocate a buffer that size even if it won't find that many characters. So we
+    // calculate the maximum number of characters that we could possibly decode based on the
+    // number of a available bits and the character set. Note that some character sets have
+    // variable width encodings, so we can't always know for sure how many characters will be
+    // found. But we can calculate a maximum based on the smallest possible code unit for the

Review Comment:
   The code isn't calculating the maximum width; it's calculating the maximum 
number of characters that the available bits/bytes could decode to. With UTF-8, 
the maximum number of characters is reached when every character has the 
minimum single-character length. So N bytes could potentially decode to N 
characters. It might decode to fewer characters if there are any multibyte 
chars, but it can't possibly decode to more than N characters.
   
   I'll update this comment to make this more clear.
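
To make the bound concrete, here is a minimal sketch of the calculation described above. It is not the code from this PR: `MaxCharsSketch`, `maxDecodableChars`, and all of its parameter names are hypothetical, and it assumes the caller already knows the smallest code unit width (in bits) for the charset.

```scala
import java.nio.charset.StandardCharsets

object MaxCharsSketch {
  // Hypothetical helper, not part of Daffodil's API.
  // availableBits:   bits remaining before the bit limit
  // minCodeUnitBits: smallest possible width of one character in the charset
  //                  (assumed to be 8 for UTF-8)
  // tunableMaxChars: configured ceiling on simple element size, in characters
  def maxDecodableChars(
    availableBits: Long,
    minCodeUnitBits: Int,
    tunableMaxChars: Long
  ): Long = {
    require(minCodeUnitBits > 0, "minimum code unit width must be positive")
    // The most characters we can get is when every character uses the
    // narrowest encoding, so integer division gives an upper bound.
    val upperBound = availableBits / minCodeUnitBits
    // Never exceed the configured ceiling.
    math.min(upperBound, tunableMaxChars)
  }

  def main(args: Array[String]): Unit = {
    val text = "héllo" // 'é' takes two bytes in UTF-8
    val bytes = text.getBytes(StandardCharsets.UTF_8)
    val bound = maxDecodableChars(bytes.length * 8L, 8, 1000L)
    // The actual character count can be lower than the bound, never higher.
    println(s"bytes=${bytes.length} chars=${text.length} bound=$bound")
  }
}
```

With a multibyte character in the input, the decoded character count falls below the bound, which is the "fewer than N" case mentioned above; the bound itself only sizes the buffer request and does not predict the final length.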



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
