Brandon,
Thank you so much for the useful information! It is a huge help!
I have a follow up question:
You mention " I would suggest either using daffodil in stream mode, or
using it as a library as part of a long-lived process "
By stream mode, do you mean using the Java API rather than
the CLI implementation? That is, using input/output streams instead of the
command-line interface?
Second question:
Is the "parse time" value measured as the total amount of
time it takes to compile the parser, parse the data according to the schema,
and output the data to the console or file?
Thanks so much again!
Rob
-----Original Message-----
From: Sloane, Brandon <[email protected]>
Sent: Friday, November 8, 2019 11:35 AM
To: [email protected]
Cc: Hanna, Maria <[email protected]>
Subject: Re: CLI Performance usage...
I am not familiar with how daffodil's performance stats are reported
(particularly how the average rate is faster than the max rate).
However, the biggest bottleneck for Daffodil performance is schema
compilation. If performance is a concern, I would recommend pre-compiling your
parser using the `daffodil save-parser` command. You can then use the
pre-compiled parser using the '-P' flag instead of '-s'. Note that Daffodil
does not have a stable format for pre-compiled parsers, so the Daffodil version
used to save the parser would need to match the version used to run it.
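As a minimal sketch of the pre-compile workflow described above, the two steps could be wrapped as shown here. The function names and the DAFFODIL variable are hypothetical helpers introduced for illustration; the `save-parser` subcommand and the `-s`/`-P` flags are the ones named in this message, and `-N`/`-t` are taken from the commands in the original report.

```shell
#!/bin/sh
# Path to the daffodil launcher; override with DAFFODIL=/path/to/daffodil.
# (Assumption: the launcher sits in the current directory, as in the report.)
DAFFODIL="${DAFFODIL:-./daffodil}"

# Compile a DFDL schema once and write the resulting parser to a file.
# $1 = schema (.dfdl.xsd), $2 = output file (the name is arbitrary)
save_parser() {
    "$DAFFODIL" save-parser -s "$1" "$2"
}

# Run the performance test against a saved parser (-P) instead of a schema (-s).
# $1 = saved parser, $2 = iterations (-N), $3 = threads (-t), $4 = input file
perf_with_saved_parser() {
    "$DAFFODIL" performance -P "$1" -N "$2" -t "$3" "$4"
}
```

For the files in the original report, this would be called as `save_parser demo/csv.dfdl.xsd csv.parser.bin` followed by `perf_with_saved_parser csv.parser.bin 100 5 demo/test_file.csv`, where `csv.parser.bin` is a made-up file name. The saved parser has to be regenerated whenever the Daffodil version changes.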
A similar issue (which wouldn't be captured by daffodil performance) is startup
time. Since Daffodil runs on the JVM, just starting it takes a substantial
amount of time (`time daffodil --help` is about 800ms on my development
system). On your actual system, I would suggest either using daffodil in stream
mode, or using it as a library as part of a long-lived process. If you do
either of these, then pre-compiling would help reduce your startup time, but
would not offer any additional benefits to throughput.
________________________________
From: Rose, Rob P <[email protected]>
Sent: Friday, November 8, 2019 10:45 AM
To: [email protected] <[email protected]>
Cc: Hanna, Maria <[email protected]>
Subject: CLI Performance usage...
All,
I am trying to port the Apache Daffodil libraries onto a
cross-domain guard that runs in a very small form factor.
We have cross compiled OpenJDK 12 for the aarch64 (ARM
processor) and loaded into memory.
I have built the source using sbt (sbt daffodil-cli/stage) and
loaded the necessary jars into memory on the board.
Here are some of the specifics of the hardware platform running
on this guard:
* 2 GB DDR RAM
o Memory Management Unit (MMU) Page Tables used in this system are one-to-one
mapping.
* ARM Cortex A53 4 Core Processor
Here are some of the specifics for the software components:
* SELinux
* Busybox
Here are some of the performance numbers we are seeing from the performance
testing:
NOTE: These tests were run using the attached csv file and the
attached schema
# ./daffodil performance -s demo/csv.dfdl.xsd -N 100 -t 5 demo/test_file.csv
total parse time (sec): 2.443824
* What does the total parse time value mean?
* How is it calculated?
* Is this poor performance?
min rate (files/sec): 1.535568
* What is the min rate (files/sec)? What does this mean?
max rate (files/sec): 29.460340
* What is the max rate (files/sec)? What does this mean?
avg rate (files/sec): 40.919485
* What is the avg rate (files/sec)? What does this mean?
* Do you have any suggestions on how to improve parse/unparse speed on an
ARM processor?
* Any suggestions are greatly appreciated!
# ./daffodil performance -s demo/csv.dfdl.xsd -N 200 -t 5 demo/test_file.csv
total parse time (sec): 3.175893
min rate (files/sec): 1.520884
max rate (files/sec): 107.223428
avg rate (files/sec): 62.974409
# ./daffodil performance -s demo/csv.dfdl.xsd -N 300 -t 5 demo/test_file.csv
total parse time (sec): 3.656587
min rate (files/sec): 1.551273
max rate (files/sec): 180.155186
avg rate (files/sec): 82.043712
# ./daffodil performance -s demo/csv.dfdl.xsd -N 1000 -t 5 demo/test_file.csv
total parse time (sec): 5.602554
min rate (files/sec): 1.459977
max rate (files/sec): 301.144046
avg rate (files/sec): 178.490026
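As a sanity check on the four runs above, the reported avg rate appears to be simply the -N value divided by the total parse time. This is an observation from the printed figures only, not something confirmed against the Daffodil source, and the `avg_rate` helper is a hypothetical name used for illustration.

```shell
#!/bin/sh
# Aggregate rate in files/sec: iterations divided by total wall-clock seconds.
# $1 = iterations (the -N value), $2 = "total parse time (sec)" from the report
avg_rate() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", n / t }'
}

avg_rate 100  2.443824   # prints 40.92  (reported avg rate: 40.919485)
avg_rate 200  3.175893   # prints 62.97  (reported avg rate: 62.974409)
avg_rate 300  3.656587   # prints 82.04  (reported avg rate: 82.043712)
avg_rate 1000 5.602554   # prints 178.49 (reported avg rate: 178.490026)
```

If that reading is right, one possible explanation for the -N 100 run showing an avg rate above the max rate is that min and max are per-file rates while avg is the aggregate over all 5 threads. That is a guess and would need to be checked against how the performance subcommand actually computes its statistics.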
Sincerely,
Rob Rose
Sr. Principal Software Engineer
General Dynamics Mission Systems
Office: 508-880-1866
Cell: 508-341-5216