Hi Rob, I don't think you are subscribed to the dev list. I'd recommend you subscribe so you don't miss any responses if someone forgets to reply all.
Before answering your questions, I'll reiterate that the -N option says how many times to repeat the parse and the -t option says how many threads to use. So the performance command will parse the test_file.csv file N times. If t is not 1, it will parallelize those N parses across t threads.

Total parse time is just the wall clock time from when the first parse starts to when the last parse finishes. So in your example it took about 2.4 seconds to parse test_file.csv 100 times with 5 threads.

The min rate is determined by finding the parse that took the longest of those N parses and calculating how fast you could parse if it always took that long. This is just 1 / longest_parse_time. The max rate is the same, but uses the shortest parse time.

The average rate can really be thought of as throughput. It is calculated as number_of_parses / wall_clock_time. Increasing the number of threads will usually lower the max rate but raise the average rate (and decreasing it the reverse), since more threads generally means more throughput (until it doesn't and makes things worse ;).

Note that min and max rates are often very different (usually orders of magnitude) because the first bunch of parses take a while as the JVM does its JIT compiling and optimizations. So the max rate is what you are more likely to see in production once the JVM is warmed up and a bunch of parses have completed. You can see this in your results: parsing 200 files gave you a max rate of 107 files/second, but parsing 1000 gave you a max of 300. At some point, as you increase the number of files, the max rate will stop getting better, which essentially means the JVM is fully warm and optimized, and that's the fastest rate you'll get.

As to whether your numbers are good or not, they don't seem particularly good.
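
(As an aside, to make the three rate definitions above concrete, here is a rough Python sketch. It is not the Daffodil CLI's actual code; the function name and the sample per-parse timings are made up for illustration. Only the -N 100 total of 2.443824 seconds comes from your output.)

```python
def rates(parse_times_sec, wall_clock_sec):
    """Return (min_rate, max_rate, avg_rate) in files/sec.

    parse_times_sec: duration of each individual parse, in seconds
    wall_clock_sec:  time from the start of the first parse to the
                     end of the last one, in seconds
    """
    min_rate = 1.0 / max(parse_times_sec)   # slowest parse sets the min rate
    max_rate = 1.0 / min(parse_times_sec)   # fastest parse sets the max rate
    avg_rate = len(parse_times_sec) / wall_clock_sec  # throughput
    return min_rate, max_rate, avg_rate


# Hypothetical timings: a slow first parse (cold JVM), then faster ones.
sample_times = [0.65, 0.10, 0.05, 0.04]
sample_min, sample_max, sample_avg = rates(sample_times, wall_clock_sec=0.5)

# Average rate recomputed from the reported -N 100 run:
# 100 parses in 2.443824 seconds of wall clock time.
reported_avg = 100 / 2.443824
```

The last line reproduces the avg rate in your first run (about 40.9 files/sec). It also shows why the avg rate can exceed the max rate when t is greater than 1: the avg is measured across all threads, while the max is for a single parse.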
On my laptop, with the csv schema and a small csv file, at about -N 100,000 and a single thread I get to around 12,000 files/sec, so many times faster than yours.

Unfortunately, I don't have any suggestions specific to ARM. I will say that Daffodil can be pretty memory hungry, so giving more memory to the JVM usually helps performance. But the Daffodil CLI defaults to 1024MB, so you might not be able to bump it much more with your limited RAM.

The other suggestion I have is to decrease the number of threads and see how things improve. Some libraries we use in Daffodil are not thread safe, so we have to use things like ThreadLocals, which likely incur some overhead and slow things down.

- Steve

On 11/8/19 10:45 AM, Rose, Rob P wrote:
> All,
>
> I am trying to port the Apache daffodil libraries onto an cross domain guard
> that runs in a very small form factor.
>
> We have cross compiled OpenJDK 12 for the aarch64 (ARM processor) and loaded
> into memory.
>
> I have built the source using sbt (sbt daffodil-cli/stage) and loaded the
> necessary jars into memory on the board.
>
> Here are some of the specifics of the hardware platform running on this guard:
>
> - 2 GB DDR RAM
>   - Memory Management Unit (MMU) Page Tables used in this system are
>     one-to-one mapping.
> - ARM Cortex A53 4 Core Processor
>
> Here are some the specifics for the software components
>
> - SELinux
> - Busybox
>
> Here is some of the performance numbers we are seeing from the performance
> testing:
>
> *NOTE: These tests were run using the attached csv file and the attached
> schema*
>
> # ./daffodil performance -s demo/csv.dfdl.xsd -N 100 -t 5 demo/test_file.csv
>
> total parse time (sec): 2.443824
>
> - What does the total parse time value mean ?
> - How is it calculated ?
> - Is this poor performance?
>
> min rate (files/sec): 1.535568
>
> - What is the min rate (files/sec) What does this mean ?
>
> max rate (files/sec): 29.460340
>
> - What is the max rate (files/sec) What does this mean ?
>
> avg rate (files/sec): 40.919485
>
> - What is the avg rate (files/sec) What does this mean ?
> - Do you have any suggestions how to improve parse/unparsed speed on an ARM
>   processor?
> - Any suggestions are greatly appreciated!
>
> # ./daffodil performance -s demo/csv.dfdl.xsd -N 200 -t 5 demo/test_file.csv
>
> total parse time (sec): 3.175893
> min rate (files/sec): 1.520884
> max rate (files/sec): 107.223428
> avg rate (files/sec): 62.974409
>
> # ./daffodil performance -s demo/csv.dfdl.xsd -N 300 -t 5 demo/test_file.csv
>
> total parse time (sec): 3.656587
> min rate (files/sec): 1.551273
> max rate (files/sec): 180.155186
> avg rate (files/sec): 82.043712
>
> # ./daffodil performance -s demo/csv.dfdl.xsd -N 1000 -t 5 demo/test_file.csv
>
> total parse time (sec): 5.602554
> min rate (files/sec): 1.459977
> max rate (files/sec): 301.144046
> avg rate (files/sec): 178.490026
>
> Sincerely,
>
> Rob Rose
> Sr. Principal Software Engineer
> General Dynamics Mission Systems
> Office: 508-880-1866
> Cell: 508-341-5216
>
> /This message and/or attachments may include information subject to GD
> Corporate Policies 07-103 and 07-105 and is intended to be accessed only by
> authorized recipients. Use, storage and transmission are governed by General
> Dynamics and its policies. Contractual restrictions apply to third parties.
> Recipients should refer to the policies or contract to determine proper
> handling. Unauthorized review, use, disclosure or distribution is prohibited.
> If you are not an intended recipient, please contact the sender and destroy
> all copies of the original message./