mbeckerle commented on code in PR #1603:
URL: https://github.com/apache/daffodil/pull/1603#discussion_r2627152505
##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/NilParsers.scala:
##########
@@ -40,21 +40,40 @@ abstract class LiteralNilOfSpecifiedLengthParserBase(erd:
ElementRuntimeData)
def isFieldNilLit(field: String): Boolean
override def parse(start: PState): Unit = {
-
- val field = parseString(start)
-
- val isFieldEmpty = field.length() == 0
-
- if (isFieldEmpty && isEmptyAllowed) {
- // Valid! Success ParseResult indicates nilled
- } else if (isFieldEmpty && !isEmptyAllowed) {
- // Fail!
- PE(start, "%s - Empty field found but not allowed!", eName)
- } else if (isFieldNilLit(field)) {
- // Contains a nilValue, Success ParseResult indicates nilled
+ if (erd.isComplexType) {
+ // nillable complex types must have a nilValue of %ES;. For a literal
nil specified length
+ // complex to be nilled, that means either there must be a specified
length that is zero
+ // or there isn't a specified length and we have reached the end of the
data. If neither
+ // of these conditions are true, then there is non-empty data for this
complex element and
+ // it cannot be nilled.
+ val bitLimit0b = start.bitLimit0b
+ val hasSpecifiedLength = bitLimit0b.isDefined
+ if (
+ (hasSpecifiedLength && (bitLimit0b.get - start.bitPos0b) > 0) ||
+ (!hasSpecifiedLength && start.dataInputStream.hasData)
+ ) {
+ // Fail!
+ PE(start, "%s - Does not contain a nil literal!", eName)
Review Comment:
I suggest remove the "!" exclamation point. No need for this emphasis.
##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/StringLengthParsers.scala:
##########
@@ -86,8 +87,25 @@ trait StringOfSpecifiedLengthMixin extends
PaddingRuntimeMixin with CaptureParsi
protected final def parseString(start: PState): String = {
val dis = start.dataInputStream
- val maxLen = start.tunable.maximumSimpleElementSizeInCharacters
val startBitPos0b = dis.bitPos0b
+ val bitLimit0b = dis.bitLimit0b
+
+ // We want to limit the maximum length passed into getSomeString since
that function can
+ // pre-allocate a buffer that size even if it won't find that many
characters. So we
+ // calculate the maximum number of characters that we could possibly
decode based on the
+ // number of a available bits and the character set. Note that some
character sets have
+ // variable width encodings, so we can't always know for sure how many
characters will be
+ // found. But we can calculate a maximum based on the smallest possible
code unit for the
Review Comment:
Feels backward to me. Don't you need the maximum width so you can allocate
larger when there are variable-width characters possible and the string may
contain N of those? I.e., so for UTF8 you want to allow the worst-case of
4-bytes per character, not 1?
##########
daffodil-core/src/main/scala/org/apache/daffodil/io/LocalBuffer.scala:
##########
@@ -33,7 +33,13 @@ abstract class LocalBuffer[T <: java.nio.Buffer] {
def getBuf(length: Long) = {
Assert.usage(length <= Int.MaxValue)
if (tempBuf.isEmpty || tempBuf.get.capacity < length) {
- tempBuf = Maybe(allocate(length.toInt))
+ // allocate a buffer that can store the required length, but with a
minimum size. The
+ // majority of LocalBuffers should be smaller than this minimum size and
so should avoid
+ // costly reallocations, while still being small enough that the JVM
should have no
+ // problem quickly allocating it
+ val minBufferSize = 1024
+ val allocationSize = math.max(length.toInt, minBufferSize)
+ tempBuf = Maybe(allocate(allocationSize))
Review Comment:
I am not getting why this improves performance.
It sounds to me like the reuse mechanism in this code was not working since
if it was, your improvement here would not have helped, and that suggests that
there is more improvement on the table if we get the reuse of these buffers to
work better, or work at all, since I suspect it's not working at all.
I suggest: first take out all the reuse - just allocate and let the GC
reclaim exactly-sized buffers and see if that's faster/slower. (checkpoint on
the reuse mechanism overhead vs. just allocate+GC)
Second, instrument the reuse of these buffers so we count exactly how many
of these are allocated vs. reused from the stack/pool. My bet is the reuse
isn't working.
##########
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/NilParsers.scala:
##########
@@ -40,21 +40,40 @@ abstract class LiteralNilOfSpecifiedLengthParserBase(erd:
ElementRuntimeData)
def isFieldNilLit(field: String): Boolean
override def parse(start: PState): Unit = {
-
- val field = parseString(start)
-
- val isFieldEmpty = field.length() == 0
-
- if (isFieldEmpty && isEmptyAllowed) {
- // Valid! Success ParseResult indicates nilled
- } else if (isFieldEmpty && !isEmptyAllowed) {
- // Fail!
- PE(start, "%s - Empty field found but not allowed!", eName)
- } else if (isFieldNilLit(field)) {
- // Contains a nilValue, Success ParseResult indicates nilled
+ if (erd.isComplexType) {
+ // nillable complex types must have a nilValue of %ES;. For a literal
nil specified length
Review Comment:
Code looks fine.
I just ask does logic about '%ES;' and complex nil values exist elsewhere as
well?
I ask because there is a longstanding enhancement request for DFDL to allow
more than just '%ES;' as nil values for complex types.
The DFDL Workgroup was uncertain where this restriction originated, but some
members suspected there *is* a reason for it that has just been lost to
history. Hence, if this feature were to be added it would be experimental until
proven ok.
In any case this code is fine as it was already doing complex type nil
logic, so it's not like you added another place to the code that reasons about
this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]