Steve,
Thank you so much for the information! It is a huge help in understanding
the results!
I am going to perform similar tests using the Java API.
Rob
On 2019/11/08 16:36:46, Steve Lawrence <[email protected]> wrote:
> Hi Rob,
>
> I don't think you are subscribed to the dev list. I'd recommend you
> subscribe so you don't miss any responses if someone forgets to reply all.
>
> Before answering your questions, I'll reiterate that the -N option says
> how many times to repeat the parse, and the -t option says how many
> threads to use. So the performance command will parse the test_file.csv
> file N times. If t is not 1, it will parallelize those N parses across
> t threads.
>
> Total parse time is just the wall clock time from the time the first
> parse starts to the time the last parse finishes. So in your example it
> took about 2.4 seconds to parse test_file.csv 100 times with 5 threads.
>
> The min rate is determined by finding the parse that took the longest of
> those N parses and calculating how fast you could parse if it always
> took that long. This is just 1 / longest_parse_time.
>
> The max rate is the same, but uses the shortest parse time.
>
> The average rate can really be thought of as throughput. It is
> calculated as number_of_parses / wall_clock_time. Usually,
> increasing the number of threads will slow down the max rate
> but increase the average rate, since more threads generally means more
> throughput (until it doesn't and makes things worse ;).
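
As a rough illustration of the three rates described above, here is a minimal Python sketch. It is not Daffodil's actual implementation: the function name summarize_rates and its parameters are made up for this example, and it assumes you already have the duration of each individual parse plus the overall wall clock time.

```python
def summarize_rates(parse_times_sec, wall_clock_sec):
    """Return (min_rate, max_rate, avg_rate) in files/sec.

    parse_times_sec: duration of each individual parse (hypothetical input).
    wall_clock_sec: time from the first parse starting to the last finishing.
    """
    if not parse_times_sec or wall_clock_sec <= 0:
        raise ValueError("need at least one parse and a positive wall clock time")
    min_rate = 1.0 / max(parse_times_sec)  # the slowest parse sets the min rate
    max_rate = 1.0 / min(parse_times_sec)  # the fastest parse sets the max rate
    avg_rate = len(parse_times_sec) / wall_clock_sec  # throughput
    return min_rate, max_rate, avg_rate


# Four parses spread over two threads: one thread runs the slow 0.5 s parse,
# the other runs the remaining three, so the wall clock time is 0.5 s.
min_rate, max_rate, avg_rate = summarize_rates(
    [0.5, 0.25, 0.125, 0.125], wall_clock_sec=0.5
)
print(min_rate, max_rate, avg_rate)  # prints 2.0 8.0 8.0
```

Because the average is based on wall clock time rather than on the sum of the individual parse times, it reflects the benefit of running parses in parallel.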
>
> Note that min and max rates are often very different (usually orders of
> magnitude) because the first bunch of parses take a while as the JVM
> does JIT compilation and optimization. So the max rate is closer to what
> you are likely to see in production once the JVM is warmed up and a
> bunch of parses have completed.
>
> And you can see this in your results. Parsing 200 files gave you a max
> rate of 107 files/second, but parsing 1000 gave you a max of 300. At
> some point, as you increase the number of files the max rate will stop
> getting better, which essentially means the JVM is fully warm and
> optimized, and that's the fastest rate you'll get.
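
The four runs quoted further down can be cross-checked against these definitions. The Python sketch below uses only the values reported in the original message (the variable names are mine). It recomputes the average rate as N / total parse time and derives the fastest single parse time implied by each max rate.

```python
# (N, total parse time in sec, reported max rate, reported avg rate),
# copied from the four runs in the original message.
runs = [
    (100, 2.443824, 29.460340, 40.919485),
    (200, 3.175893, 107.223428, 62.974409),
    (300, 3.656587, 180.155186, 82.043712),
    (1000, 5.602554, 301.144046, 178.490026),
]

for n, total_sec, max_rate, reported_avg in runs:
    avg_rate = n / total_sec         # throughput: parses per wall clock second
    shortest_ms = 1000.0 / max_rate  # fastest single parse implied by the max rate
    # The recomputed average should agree with the reported one up to rounding.
    assert abs(avg_rate - reported_avg) < 1e-3
    print(f"N={n}: avg {avg_rate:.2f} files/sec, fastest parse ~{shortest_ms:.1f} ms")
```

The implied fastest parse drops from roughly 34 ms at -N 100 to about 3.3 ms at -N 1000, which is consistent with the JVM still warming up in the shorter runs. In the -N 100 run the average rate (about 40.9) is higher than the max rate (about 29.5), which makes sense given that 5 threads were parsing in parallel.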
>
> As to whether your numbers are good or not, they don't seem particularly
> good. On my laptop, with the csv schema and a small csv file, at about
> -N 100000 and a single thread I get to around 12000 files/sec, so
> many times faster than yours.
>
> Unfortunately, I don't have any suggestions specific to ARM. I will say
> that Daffodil can be pretty memory hungry, so usually giving more memory
> to the JVM helps performance. But the Daffodil CLI defaults to 1024MB,
> so you might not be able to bump it much more with your limited RAM.
>
> The other suggestion I have is to decrease the number of threads and see
> how things improve. Some libraries we use in Daffodil are not thread
> safe, so we have to use things like ThreadLocals, which likely incur
> some overhead and slow things down.
>
> - Steve
>
>
>
>
> On 11/8/19 10:45 AM, Rose, Rob P wrote:
> > All,
> >
> > I am trying to port the Apache Daffodil libraries onto a cross
> > domain guard that runs in a very small form factor.
> >
> > We have cross compiled OpenJDK 12 for aarch64 (ARM processor)
> > and loaded it into memory.
> >
> > I have built the source using sbt (sbt daffodil-cli/stage) and
> > loaded the necessary jars into memory on the board.
> >
> > Here are some of the specifics of the hardware platform running
> > on this guard:
> >
> > · 2 GB DDR RAM
> >
> >   o The Memory Management Unit (MMU) page tables used in this system
> >     are a one-to-one mapping.
> >
> > · ARM Cortex A53 4 Core Processor
> >
> > Here are some of the specifics for the software components:
> >
> > · SELinux
> >
> > · Busybox
> >
> > Here are some of the performance numbers we are seeing from the
> > performance testing:
> >
> > *NOTE: These tests were run using the attached csv file and the attached
> > schema*
> >
> > # ./daffodil performance -s demo/csv.dfdl.xsd -N 100 -t 5 demo/test_file.csv
> >
> > total parse time (sec): 2.443824
> >
> > · What does the total parse time value mean?
> >
> > · How is it calculated?
> >
> > · Is this poor performance?
> >
> > min rate (files/sec): 1.535568
> >
> > · What is the min rate (files/sec)? What does this mean?
> >
> > max rate (files/sec): 29.460340
> >
> > · What is the max rate (files/sec)? What does this mean?
> >
> > avg rate (files/sec): 40.919485
> >
> > · What is the avg rate (files/sec)? What does this mean?
> >
> > · Do you have any suggestions on how to improve parse/unparse speed on
> > an ARM processor?
> >
> > · Any suggestions are greatly appreciated!
> >
> > # ./daffodil performance -s demo/csv.dfdl.xsd -N 200 -t 5 demo/test_file.csv
> >
> > total parse time (sec): 3.175893
> >
> > min rate (files/sec): 1.520884
> >
> > max rate (files/sec): 107.223428
> >
> > avg rate (files/sec): 62.974409
> >
> > # ./daffodil performance -s demo/csv.dfdl.xsd -N 300 -t 5 demo/test_file.csv
> >
> > total parse time (sec): 3.656587
> >
> > min rate (files/sec): 1.551273
> >
> > max rate (files/sec): 180.155186
> >
> > avg rate (files/sec): 82.043712
> >
> > # ./daffodil performance -s demo/csv.dfdl.xsd -N 1000 -t 5 demo/test_file.csv
> >
> > total parse time (sec): 5.602554
> >
> > min rate (files/sec): 1.459977
> >
> > max rate (files/sec): 301.144046
> >
> > avg rate (files/sec): 178.490026
> >
> > Sincerely,
> >
> > Rob Rose
> >
> > Sr. Principal Software Engineer
> >
> > General Dynamics Mission Systems
> >
> > Office: 508-880-1866
> >
> > Cell: 508-341-5216
> >
> > /This message and/or attachments may include information subject to GD
> > Corporate Policies 07-103 and 07-105 and is intended to be accessed only
> > by authorized recipients. Use, storage and transmission are governed by
> > General Dynamics and its policies. Contractual restrictions apply to
> > third parties. Recipients should refer to the policies or contract to
> > determine proper handling. Unauthorized review, use, disclosure or
> > distribution is prohibited. If you are not an intended recipient, please
> > contact the sender and destroy all copies of the original message./
> >
>
>