mbeckerle commented on code in PR #1603:
URL: https://github.com/apache/daffodil/pull/1603#discussion_r2627152505


##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/NilParsers.scala:
##########
@@ -40,21 +40,40 @@ abstract class LiteralNilOfSpecifiedLengthParserBase(erd: 
ElementRuntimeData)
   def isFieldNilLit(field: String): Boolean
 
   override def parse(start: PState): Unit = {
-
-    val field = parseString(start)
-
-    val isFieldEmpty = field.length() == 0
-
-    if (isFieldEmpty && isEmptyAllowed) {
-      // Valid! Success ParseResult indicates nilled
-    } else if (isFieldEmpty && !isEmptyAllowed) {
-      // Fail!
-      PE(start, "%s - Empty field found but not allowed!", eName)
-    } else if (isFieldNilLit(field)) {
-      // Contains a nilValue, Success ParseResult indicates nilled
+    if (erd.isComplexType) {
+      // nillable complex types must have a nilValue of %ES;. For a literal 
nil specified length
+      // complex to be nilled, that means either there must be a specified 
length that is zero
+      // or there isn't a specified length and we have reached the end of the 
data. If neither
+      // of these conditions are true, then there is non-empty data for this 
complex element and
+      // it cannot be nilled.
+      val bitLimit0b = start.bitLimit0b
+      val hasSpecifiedLength = bitLimit0b.isDefined
+      if (
+        (hasSpecifiedLength && (bitLimit0b.get - start.bitPos0b) > 0) ||
+        (!hasSpecifiedLength && start.dataInputStream.hasData)
+      ) {
+        // Fail!
+        PE(start, "%s - Does not contain a nil literal!", eName)

Review Comment:
   I suggest remove the "!" exclamation point. No need for this emphasis. 



##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/StringLengthParsers.scala:
##########
@@ -86,8 +87,25 @@ trait StringOfSpecifiedLengthMixin extends 
PaddingRuntimeMixin with CaptureParsi
 
   protected final def parseString(start: PState): String = {
     val dis = start.dataInputStream
-    val maxLen = start.tunable.maximumSimpleElementSizeInCharacters
     val startBitPos0b = dis.bitPos0b
+    val bitLimit0b = dis.bitLimit0b
+
+    // We want to limit the maximum length passed into getSomeString since 
that function can
+    // pre-allocate a buffer that size even if it won't find that many 
characters. So we
+    // calculate the maximum number of characters that we could possibly 
decode based on the
+    // number of a available bits and the character set. Note that some 
character sets have
+    // variable width encodings, so we can't always know for sure how many 
characters will be
+    // found. But we can calculate a maximum based on the smallest possible 
code unit for the

Review Comment:
   Feels backward to me. Don't you need the maximum width so you can allocate 
larger when there are variable-width characters possible and the string may 
contain N of those? I.e., so for UTF8 you want to allow the worst-case of 
4-bytes per character, not 1?



##########
daffodil-core/src/main/scala/org/apache/daffodil/io/LocalBuffer.scala:
##########
@@ -33,7 +33,13 @@ abstract class LocalBuffer[T <: java.nio.Buffer] {
   def getBuf(length: Long) = {
     Assert.usage(length <= Int.MaxValue)
     if (tempBuf.isEmpty || tempBuf.get.capacity < length) {
-      tempBuf = Maybe(allocate(length.toInt))
+      // allocate a buffer that can store the required length, but with a 
minimum size. The
+      // majority of LocalBuffers should be smaller than this minimum size and 
so should avoid
+      // costly reallocations, while still being small enough that the JVM 
should have no
+      // problem quickly allocating it
+      val minBufferSize = 1024
+      val allocationSize = math.max(length.toInt, minBufferSize)
+      tempBuf = Maybe(allocate(allocationSize))

Review Comment:
   I am not getting why this improves performance. 
   
   It sounds to me like the reuse mechanism in this code was not working since 
if it was, your improvement here would not have helped, and that suggests that 
there is more improvement on the table if we get the reuse of these buffers to 
work better, or work at all, since I suspect it's not working at all. 
   
   I suggest: first take out all the reuse - just allocate and let the GC 
reclaim exactly-sized buffers and see if that's faster/slower. (checkpoint on 
the reuse mechanism overhead vs. just allocate+GC)
   
   Second, instrument the reuse of these buffers so we count exactly how many 
of these are allocated vs. reused from the stack/pool. My bet is the reuse 
isn't working. 



##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/NilParsers.scala:
##########
@@ -40,21 +40,40 @@ abstract class LiteralNilOfSpecifiedLengthParserBase(erd: 
ElementRuntimeData)
   def isFieldNilLit(field: String): Boolean
 
   override def parse(start: PState): Unit = {
-
-    val field = parseString(start)
-
-    val isFieldEmpty = field.length() == 0
-
-    if (isFieldEmpty && isEmptyAllowed) {
-      // Valid! Success ParseResult indicates nilled
-    } else if (isFieldEmpty && !isEmptyAllowed) {
-      // Fail!
-      PE(start, "%s - Empty field found but not allowed!", eName)
-    } else if (isFieldNilLit(field)) {
-      // Contains a nilValue, Success ParseResult indicates nilled
+    if (erd.isComplexType) {
+      // nillable complex types must have a nilValue of %ES;. For a literal 
nil specified length

Review Comment:
   Code looks fine. 
   
   I just ask does logic about '%ES;' and complex nil values exist elsewhere as 
well? 
   I ask because there is a longstanding enhancement request for DFDL to allow 
more than just '%ES;' as nil values for complex types.
    
   The DFDL Workgroup was uncertain where this restriction originated, but some 
members suspected there *is* a reason for it that has just been lost to 
history. Hence, if this feature were to be added it would be experimental until 
proven ok.
   
   In any case this code is fine as it was already doing complex type nil 
logic, so it's not like you added another place to the code that reasons about 
this. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to