Hi Rob,

I don't think you are subscribed to the dev list. I'd recommend you
subscribe so you don't miss any responses if someone forgets to reply all.

Before answering your questions, I'll reiterate that the -N option says
how many times to repeat the parse, and the -t option says how many
threads to use. So the performance command will parse the test_file.csv
file N times. If t is not 1, it will parallelize those N parses across t
threads.

Total parse time is just the wall clock time from the time the first
parse starts to the time the last parse finishes. So in your example it
took about 2.4 seconds to parse test_file.csv 100 times with 5 threads.

The min rate is determined by finding the parse that took the longest of
those N parses and calculating how fast you could parse if it always
took that long. This is just 1 / longest_parse_time.

The max rate is the same, but uses the shortest parse time.

The average rate can really be thought of as throughput. It is
calculated as number_of_parses / wall_clock_time. Usually, increasing
the number of threads will slow down the max rate but increase the
average rate, since more threads generally means more throughput (until
it doesn't and makes things worse ;).
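
To make the three rates concrete, here is a minimal Python sketch of the
arithmetic described above. It is not Daffodil's actual code; the function
name and the sample durations are made up for illustration.

```python
def summarize_rates(parse_times_sec, wall_clock_sec):
    """Return (min_rate, max_rate, avg_rate) in files/sec.

    parse_times_sec: duration of each individual parse, in seconds
    wall_clock_sec:  time from the first parse starting to the last finishing
    """
    if not parse_times_sec or wall_clock_sec <= 0:
        raise ValueError("need at least one parse and a positive wall-clock time")
    min_rate = 1.0 / max(parse_times_sec)              # rate if every parse were the slowest
    max_rate = 1.0 / min(parse_times_sec)              # rate if every parse were the fastest
    avg_rate = len(parse_times_sec) / wall_clock_sec   # throughput
    return min_rate, max_rate, avg_rate


# Made-up durations: one slow (cold) parse followed by faster ones.
print(summarize_rates([0.5, 0.1, 0.02, 0.02], wall_clock_sec=0.4))
```

With the numbers from your first run, 100 parses / 2.443824 seconds comes
out to roughly 40.92 files/sec, which matches the reported avg rate. It
also shows why the avg rate can be higher than the max rate when several
threads are parsing at once.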

Note that min and max rates are often very different (usually orders of
magnitude) because the first bunch of parses take a while as the JVM
does JIT compilation and optimization. So the max rate is closer to what
you are likely to see in production once the JVM is warmed up and a
bunch of parses have completed.

And you can see this in your results. Parsing 200 files gave you a max
rate of 107 files/second, but parsing 1000 gave you a max of 300. At
some point, as you increase the number of files the max rate will stop
getting better, which essentially means the JVM is fully warm and
optimized, and that's the fastest rate you'll get.
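
One rough way to judge whether the max rate has stopped improving is to
look at the relative gain between the last two runs. The sketch below is
only an illustration: the helper name and the 5% threshold are
assumptions, not anything Daffodil provides.

```python
def has_plateaued(max_rates, tolerance=0.05):
    """Return True if the last run improved on the previous one by less
    than `tolerance` (a hypothetical 5% cutoff by default)."""
    if len(max_rates) < 2:
        return False  # not enough runs to compare
    previous, latest = max_rates[-2], max_rates[-1]
    gain = (latest - previous) / previous
    return gain < tolerance


# Max rates reported for -N 100, 200, 300 and 1000 with -t 5.
reported = [29.460340, 107.223428, 180.155186, 301.144046]
print(has_plateaued(reported))
```

For the four runs you posted, the last step still gains about 67%, so
the JVM is probably not fully warm yet and a larger -N is worth trying.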

As to whether your numbers are good or not, they don't seem particularly
good. On my laptop, with the csv schema and a small csv file, at about
-N 100,000 and a single thread I get to around 12000 files/sec, so many
times faster than yours.

Unfortunately, I don't have any suggestions specific to ARM. I will say
that Daffodil can be pretty memory hungry, so usually giving more memory
to the JVM helps performance. But the Daffodil CLI defaults to 1024MB,
so you might not be able to bump it much more with your limited RAM.

The other suggestion I have is to decrease the number of threads and see
how things improve. Some libraries we use in Daffodil are not
thread-safe, so we have to use things like ThreadLocals, which likely
incur some overhead and slow things down.
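
If it helps, a small script can run the same command at several thread
counts and collect the rates for comparison. This is a sketch under a few
assumptions: the paths are the ones from your commands, the output lines
look exactly like the ones you pasted, and the function names are made up.
I read both stdout and stderr since I'm not assuming which one the CLI
prints the rates to.

```python
import re
import subprocess

# Matches lines such as "avg rate (files/sec): 40.919485".
RATE_LINE = re.compile(
    r"^\s*(total parse time|min rate|max rate|avg rate)"
    r" \((?:sec|files/sec)\):\s*([0-9.]+)\s*$",
    re.MULTILINE,
)


def parse_performance_output(text):
    """Turn the printed performance summary into a {label: float} dict."""
    return {label: float(value) for label, value in RATE_LINE.findall(text)}


def build_command(threads, repeats,
                  schema="demo/csv.dfdl.xsd", data="demo/test_file.csv"):
    """Build the same command line used in the original runs."""
    return ["./daffodil", "performance", "-s", schema,
            "-N", str(repeats), "-t", str(threads), data]


def sweep_threads(thread_counts, repeats=1000):
    """Run the performance command once per thread count and collect rates."""
    results = {}
    for threads in thread_counts:
        done = subprocess.run(build_command(threads, repeats),
                              capture_output=True, text=True, check=True)
        results[threads] = parse_performance_output(done.stdout + done.stderr)
    return results
```

Calling sweep_threads([1, 2, 3, 4, 5]) on the board and comparing the
"avg rate" entries should show whether fewer threads actually helps.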

- Steve




On 11/8/19 10:45 AM, Rose, Rob P wrote:
> All,
> 
>                  I am trying to port the Apache daffodil libraries onto an 
> cross 
> domain guard that runs in a very small form factor.
> 
>                  We have cross compiled OpenJDK 12 for the aarch64 (ARM 
> processor) and loaded into memory.
> 
>                  I have built the source using sbt (sbt daffodil-cli/stage) 
> and 
> loaded the necessary jars into memory on the board.
> 
>                  Here are some of the specifics of the hardware platform 
> running 
> on this guard:
> 
> ·2 GB DDR RAM
> 
> oMemory Management Unit (MMU) Page Tables used in this system are one-to-one 
> mapping.
> 
> ·ARM Cortex A53 4 Core Processor
> 
> Here are some the specifics for the software components
> 
> ·SELinux
> 
> ·Busybox
> 
> Here is some of the performance numbers we are seeing from the performance 
> testing:
> 
> *NOTE:  These tests were run using the attached csv file and the attached 
> schema*
> 
> # ./daffodil performance -s demo/csv.dfdl.xsd -N 100 -t 5 demo/test_file.csv
> 
> total parse time (sec): 2.443824
> 
> ·What does the total parse time value mean ?
> 
> ·How is it calculated ?
> 
> ·Is this poor performance?
> 
> min rate (files/sec): 1.535568
> 
> ·What is the min rate (files/sec)  What does this mean ?
> 
> max rate (files/sec): 29.460340
> 
> ·What is the max rate (files/sec)  What does this mean ?
> 
> avg rate (files/sec): 40.919485
> 
> ·What is the avg rate (files/sec)  What does this mean ?
> 
> ·Do you have any suggestions how to improve parse/unparsed speed on an ARM 
> processor?
> 
> ·Any suggestions are greatly appreciated!
> 
> # ./daffodil performance -s demo/csv.dfdl.xsd -N 200 -t 5 demo/test_file.csv
> 
> total parse time (sec): 3.175893
> 
> min rate (files/sec): 1.520884
> 
> max rate (files/sec): 107.223428
> 
> avg rate (files/sec): 62.974409
> 
> # ./daffodil performance -s demo/csv.dfdl.xsd -N 300 -t 5 demo/test_file.csv
> 
> total parse time (sec): 3.656587
> 
> min rate (files/sec): 1.551273
> 
> max rate (files/sec): 180.155186
> 
> avg rate (files/sec): 82.043712
> 
> # ./daffodil performance -s demo/csv.dfdl.xsd -N 1000 -t 5 demo/test_file.csv
> 
> total parse time (sec): 5.602554
> 
> min rate (files/sec): 1.459977
> 
> max rate (files/sec): 301.144046
> 
> avg rate (files/sec): 178.490026
> 
> Sincerely,
> 
> Rob Rose
> 
> Sr. Principal Software Engineer
> 
> General Dynamics Mission Systems
> 
> Office: 508-880-1866
> 
> Cell:      508-341-5216
> 
> /This message and/or attachments may include information subject to GD 
> Corporate 
> Policies 07-103 and 07-105 and is intended to be accessed only by authorized 
> recipients.  Use, storage and transmission are governed by General Dynamics 
> and 
> its policies. Contractual restrictions apply to third parties.  Recipients 
> should refer to the policies or contract to determine proper handling.  
> Unauthorized review, use, disclosure or distribution is prohibited.  If you 
> are 
> not an intended recipient, please contact the sender and destroy all copies 
> of 
> the original message./
> 
