Steve,
Thank you so much for the information! It is a huge help in understanding
the results!

I am going to perform similar tests using the Java API.

Rob

On 2019/11/08 16:36:46, Steve Lawrence <slawre...@apache.org> wrote: 
> Hi Rob,
> 
> I don't think you are subscribed to the dev list. I'd recommend you
> subscribe so you don't miss any responses if someone forgets to reply all.
> 
> Before answering your questions, I'll reiterate that the -N option says
> how many times to repeat the parse, the -t option says how many threads
> to use. So the performance command will parse the test_file.csv file N
> times. If t is not 1, it will parallelize those N parses across t threads.
> 
> Total parse time is just the wall clock time from the time the first
> parse starts to the time the last parse finishes. So in your example it
> took about 2.4 seconds to parse test_file.csv 100 times with 5 threads.
> 
> The min rate is determined by finding the parse that took the longest of
> those N parses and calculating how fast you could parse if it always
> took that long. This is just 1 / longest_parse_time.
> 
> The max rate is the same, but uses the shortest parse time.
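
A minimal sketch of the two definitions above. The function name and the sample durations are made up for illustration and are not taken from Daffodil's source:

```python
def min_max_rate(parse_times_sec):
    """Return (min_rate, max_rate) in files/sec from per-parse durations."""
    if not parse_times_sec:
        raise ValueError("need at least one parse time")
    longest = max(parse_times_sec)    # slowest parse drives the min rate
    shortest = min(parse_times_sec)   # fastest parse drives the max rate
    return 1.0 / longest, 1.0 / shortest

# Hypothetical durations: one slow cold parse, then faster warm ones.
times = [0.5, 0.1, 0.05, 0.04]
min_rate, max_rate = min_max_rate(times)
print(f"min rate: {min_rate:.1f} files/sec")
print(f"max rate: {max_rate:.1f} files/sec")
```

Read the other way, the reported min rate of 1.535568 files/sec in the -N 100 run corresponds to a slowest parse of about 0.65 seconds (1 / 1.535568), and the max rate of 29.460340 files/sec to a fastest parse of about 0.034 seconds.
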
> 
> The average rate can really be thought of as throughput. It is
> calculated as number_of_parses / wall_clock_time. Usually, increasing
> the number of threads will lower the max rate but raise the average
> rate, since more threads generally means more throughput (until it
> doesn't and makes things worse ;).
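
The average-rate formula can be checked against the four runs quoted further down in this thread. This is only a sketch: the numbers are copied from that output, and `avg_rate` is a hypothetical helper name, not part of Daffodil.

```python
# (N, total parse time in seconds, reported avg rate in files/sec)
runs = [
    (100, 2.443824, 40.919485),
    (200, 3.175893, 62.974409),
    (300, 3.656587, 82.043712),
    (1000, 5.602554, 178.490026),
]

def avg_rate(num_parses, wall_clock_sec):
    """Throughput: number of parses divided by wall clock time."""
    return num_parses / wall_clock_sec

for n, total, reported in runs:
    print(f"N={n}: computed {avg_rate(n, total):.3f}, reported {reported:.3f}")
```

The computed values should match the reported ones to within rounding, which is consistent with the average rate being N divided by the total parse time.
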
> 
> Note that min and max rates are often very different (usually orders of
> magnitude) because the first bunch of parses take a while as the JVM
> does JIT compilation and optimization. So the max rate is what you are
> more likely to see in production once the JVM is warmed up and a bunch
> of parses have completed.
> 
> And you can see this in your results. Parsing 200 files gave you a max
> rate of 107 files/second, but parsing 1000 gave you a max of 300. At
> some point, as you increase the number of files the max rate will stop
> getting better, which essentially means the JVM is fully warm and
> optimized, and that's the fastest rate you'll get.
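
One way to see the warm-up effect in your own measurements is to drop the first few parses before computing a rate. The sketch below is an assumption-laden illustration, not something the Daffodil CLI reports: the function name and timings are hypothetical, and it assumes single-threaded, back-to-back parses so that the sum of the durations equals the elapsed time.

```python
def steady_state_rate(parse_times_sec, warmup):
    """Files/sec over the parses left after dropping the first `warmup`."""
    warm = parse_times_sec[warmup:]
    if not warm:
        raise ValueError("warmup discards every measurement")
    # Sequential parses: elapsed time is the sum of the durations.
    return len(warm) / sum(warm)

# Hypothetical single-threaded timings: slow at first, then settling.
times = [0.5, 0.25, 0.125, 0.125, 0.125, 0.125]
print(steady_state_rate(times, warmup=0))  # includes the cold parses
print(steady_state_rate(times, warmup=2))  # warm parses only
```

With timings shaped like these, the rate that excludes the cold parses is noticeably higher, which is the same pattern as the max rate climbing as N grows.
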
> 
> As to whether your numbers are good or not, they don't seem particularly
> good. On my laptop, with the csv schema and a small csv file, at about
> -N 100,000 files and a single thread I get to around 12000 files/sec, so
> many times faster than yours.
> 
> Unfortunately, I don't have any suggestions specific to ARM. I will say
> that Daffodil can be pretty memory hungry, so usually giving more memory
> to the JVM helps performance. But the Daffodil CLI defaults to 1024MB,
> so you might not be able to bump it much more with your limited RAM.
> 
> The other suggestion I have is to decrease the number of threads and see
> how things improve. Some libraries we use in Daffodil are not thread
> safe, so we have to use things like ThreadLocals, which likely incur
> some overhead and slow things down.
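
A small sketch for setting up that thread sweep. It only builds and prints the command lines, using the same -s, -N and -t options as the runs in this thread; the helper name and the choice of N = 1000 are assumptions for illustration.

```python
import shlex

def performance_commands(schema, data_file, num_parses, thread_counts):
    """Build one `daffodil performance` argument list per thread count."""
    return [
        ["./daffodil", "performance", "-s", schema,
         "-N", str(num_parses), "-t", str(t), data_file]
        for t in thread_counts
    ]

cmds = performance_commands(
    "demo/csv.dfdl.xsd", "demo/test_file.csv", 1000, [1, 2, 3, 4, 5]
)
for cmd in cmds:
    print(shlex.join(cmd))  # copy each line to the board and run it there
```

Keeping N fixed while varying only -t makes the avg rate numbers directly comparable between runs.
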
> 
> - Steve
> 
> 
> 
> 
> On 11/8/19 10:45 AM, Rose, Rob P wrote:
> > All,
> > 
> > I am trying to port the Apache Daffodil libraries onto a cross-domain
> > guard that runs in a very small form factor.
> > 
> > We have cross-compiled OpenJDK 12 for aarch64 (ARM processor) and
> > loaded it into memory.
> > 
> > I have built the source using sbt (sbt daffodil-cli/stage) and loaded
> > the necessary jars into memory on the board.
> > 
> > Here are some of the specifics of the hardware platform running on
> > this guard:
> > 
> > - 2 GB DDR RAM
> >   - The Memory Management Unit (MMU) page tables in this system are
> >     mapped one-to-one.
> > - ARM Cortex A53 4 Core Processor
> > 
> > Here are some of the specifics for the software components:
> > 
> > - SELinux
> > - Busybox
> > 
> > Here are some of the performance numbers we are seeing from the
> > performance testing:
> > 
> > *NOTE: These tests were run using the attached csv file and the attached schema*
> > 
> > # ./daffodil performance -s demo/csv.dfdl.xsd -N 100 -t 5 demo/test_file.csv
> > 
> > total parse time (sec): 2.443824
> > 
> > - What does the total parse time value mean?
> > - How is it calculated?
> > - Is this poor performance?
> > 
> > min rate (files/sec): 1.535568
> > 
> > - What is the min rate (files/sec)? What does it mean?
> > 
> > max rate (files/sec): 29.460340
> > 
> > - What is the max rate (files/sec)? What does it mean?
> > 
> > avg rate (files/sec): 40.919485
> > 
> > - What is the avg rate (files/sec)? What does it mean?
> > - Do you have any suggestions on how to improve parse/unparse speed on
> >   an ARM processor?
> > - Any suggestions are greatly appreciated!
> > 
> > # ./daffodil performance -s demo/csv.dfdl.xsd -N 200 -t 5 demo/test_file.csv
> > 
> > total parse time (sec): 3.175893
> > 
> > min rate (files/sec): 1.520884
> > 
> > max rate (files/sec): 107.223428
> > 
> > avg rate (files/sec): 62.974409
> > 
> > # ./daffodil performance -s demo/csv.dfdl.xsd -N 300 -t 5 demo/test_file.csv
> > 
> > total parse time (sec): 3.656587
> > 
> > min rate (files/sec): 1.551273
> > 
> > max rate (files/sec): 180.155186
> > 
> > avg rate (files/sec): 82.043712
> > 
> > # ./daffodil performance -s demo/csv.dfdl.xsd -N 1000 -t 5 demo/test_file.csv
> > 
> > total parse time (sec): 5.602554
> > 
> > min rate (files/sec): 1.459977
> > 
> > max rate (files/sec): 301.144046
> > 
> > avg rate (files/sec): 178.490026
> > 
> > Sincerely,
> > 
> > Rob Rose
> > 
> > Sr. Principal Software Engineer
> > 
> > General Dynamics Mission Systems
> > 
> > Office: 508-880-1866
> > 
> > Cell:      508-341-5216
> > 
> > /This message and/or attachments may include information subject to GD
> > Corporate Policies 07-103 and 07-105 and is intended to be accessed only
> > by authorized recipients.  Use, storage and transmission are governed by
> > General Dynamics and its policies. Contractual restrictions apply to
> > third parties.  Recipients should refer to the policies or contract to
> > determine proper handling.  Unauthorized review, use, disclosure or
> > distribution is prohibited.  If you are not an intended recipient, please
> > contact the sender and destroy all copies of the original message./
> > 
> 
> 
