Re: [Liblas-devel] Performance hit from 1.21 to 1.6b3

Howard Butler Fri, 17 Dec 2010 09:23:23 -0800

On Dec 17, 2010, at 10:16 AM, Mauricio Terneus wrote:

> Anyone notice a performance hit going from 1.21 to 1.6b3?
>  
> With 1.21 I can load roughly 10 million points in 3 seconds using c api
> In 1.6b3 I load the same data set in 5 seconds using the c++ api


There have been some changes that will negatively affect performance, 
especially if the API is used in ways I didn't anticipate.  I'll outline what 
those changes were, and some things you can do to hopefully mitigate them.  I 
suspect there are also more places we might be able to squeeze more juice out 
of the lemon, as it were.

libLAS 1.2.1 and below utilized a liblas::Point that was kind of fat.  It 
carried around interpreted data members for all of the dimensions on the point 
-- x, y, z, intensity, etc -- and if you asked for one of these, it just 
returned it to you directly.  The interpretation of those data happened as the 
data were read, and again as the data were written (back into raw bytes). 

libLAS 1.6+ has changed liblas::Point in a number of important ways.  
liblas::Point now only carries along the raw bytes for the point, and if you 
ask for one of the dimensions, it interprets it on-the-fly.  For example, a 
GetX() call now requires going into the liblas::Point byte array, pulling the 
first four bytes off of it, asking the point's header for scaling information, 
and rescaling the integer data into double data.  If you only call GetX() 
once, things are roughly equivalent to what we were doing before -- 
interpreting and caching interpreted data directly on the liblas::Point -- but 
every subsequent call to GetX() pays this interpretation cost again.  If you 
reuse a value, call the accessor once and cache the result yourself.
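
A minimal sketch of the caching advice, using a hypothetical stand-in type 
rather than libLAS itself.  RawPoint, make_point and the two counting 
functions are made up for illustration; only the idea (X is decoded from the 
first four bytes and rescaled on every call) comes from the description above.  
The little-endian byte order is what the LAS format specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a point that only carries raw bytes.
// This is NOT the liblas::Point class, just a model of the behaviour.
struct RawPoint {
    std::vector<std::uint8_t> bytes;  // X is a little-endian int32 in bytes 0..3
    double scale_x;
    double offset_x;

    // Decodes and rescales on every call; nothing is cached on the point.
    double GetX() const {
        assert(bytes.size() >= 4);
        std::uint32_t u = static_cast<std::uint32_t>(bytes[0])
                        | static_cast<std::uint32_t>(bytes[1]) << 8
                        | static_cast<std::uint32_t>(bytes[2]) << 16
                        | static_cast<std::uint32_t>(bytes[3]) << 24;
        std::int32_t raw;
        std::memcpy(&raw, &u, sizeof raw);
        return raw * scale_x + offset_x;
    }
};

// Hypothetical helper that packs a raw integer X into a RawPoint.
RawPoint make_point(std::int32_t raw_x, double scale, double offset) {
    std::uint32_t u;
    std::memcpy(&u, &raw_x, sizeof u);
    RawPoint p;
    for (int shift = 0; shift < 32; shift += 8) {
        p.bytes.push_back(static_cast<std::uint8_t>((u >> shift) & 0xFFu));
    }
    p.scale_x = scale;
    p.offset_x = offset;
    return p;
}

// Slow pattern: X can be decoded twice per point.
std::size_t count_in_range_uncached(const std::vector<RawPoint>& pts,
                                    double lo, double hi) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (pts[i].GetX() >= lo && pts[i].GetX() <= hi) ++n;
    }
    return n;
}

// Cached pattern: decode once per point, then reuse the local value.
std::size_t count_in_range_cached(const std::vector<RawPoint>& pts,
                                  double lo, double hi) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const double x = pts[i].GetX();
        if (x >= lo && x <= hi) ++n;
    }
    return n;
}
```

Both functions return the same count; the cached one just does one decode per 
point instead of up to two.  With the real API the same pattern would be a 
local `double const x = p.GetX();` at the top of your loop body.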

The rationale for moving to this approach was three-fold.  First, the LAS 
committee continually adds new dimensions onto the specification, and I wanted 
an extendable way to add them to libLAS without causing a full re-engineering 
of the liblas::Point class every time they do. Second, liblas::Point now has a 
schema attached to it (based on the list of dimensions that a LAS file's point 
format defines plus any custom dimensions you wish to add to the point record). 
 The schema allows you to extend the point format and add your own dimensions 
and it provides generic descriptive information about what exists in the file.  
You can see the description of these schemas in the new lasinfo output from 
libLAS.  Lastly, previous versions of libLAS did not allow you to work with raw 
data, and did not allow the user to transform the data (coordinate data, 
especially) with perfect fidelity.  The new approach explicitly supports this 
out-of-the-box.  Here's something that is now possible with the new (C++) API 
that was not possible previously:

liblas::Point const& p = reader.GetPoint();
std::vector<uint8_t> data = p.GetData();
// ... do something with the raw data, like stuff it into a database.

One area of the code that I suspect is performance-sensitive, but that I 
haven't profiled much, is the part of liblas::Point that determines where in 
the point's byte array to fetch each dimension from.  Currently, for 
fixed-schema point formats, this is likely doing unnecessary work (the 
dimensions in those formats are fixed and cannot change position).  I will 
look into that and see if my raw i/o timings match yours and if I can improve 
them in any way.  If you have call stacks or other profiling information, I 
would be excited to see what they show.  Find me on IRC to discuss some more 
today if you're interested.
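
To make the fixed-schema point concrete, here is a rough sketch of the 
difference between looking a dimension up by name on every access and 
resolving its byte offset once.  Everything here (Dimension, find_offset, 
Format0Offsets, the dimension names) is hypothetical and is not how libLAS 
is actually written; only the field sizes of LAS point format 0 (20 bytes) 
are taken from the LAS specification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical schema entry: a named field at a byte offset in the record.
struct Dimension {
    std::string name;
    std::size_t offset;
    std::size_t size;
};

const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Core fields of LAS point format 0, in record order.  "Flags" stands for
// the one-byte bit field (return number, number of returns, scan direction,
// edge of flight line).  The names themselves are illustrative.
std::vector<Dimension> make_format0_schema() {
    const char* names[] = {"X", "Y", "Z", "Intensity", "Flags",
                           "Classification", "ScanAngleRank", "UserData",
                           "PointSourceId"};
    const std::size_t sizes[] = {4, 4, 4, 2, 1, 1, 1, 1, 2};
    std::vector<Dimension> schema;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < 9; ++i) {
        Dimension d;
        d.name = names[i];
        d.offset = offset;
        d.size = sizes[i];
        schema.push_back(d);
        offset += sizes[i];
    }
    return schema;
}

// Generic lookup: a linear search by name, repeated on every access.
std::size_t find_offset(const std::vector<Dimension>& schema,
                        const std::string& name) {
    for (std::size_t i = 0; i < schema.size(); ++i) {
        if (schema[i].name == name) return schema[i].offset;
    }
    return kNotFound;
}

// Fixed-format shortcut: resolve the offsets once and reuse them for
// every point read from the same file.
struct Format0Offsets {
    std::size_t x, y, z, intensity;
};

Format0Offsets resolve_format0(const std::vector<Dimension>& schema) {
    Format0Offsets o;
    o.x = find_offset(schema, "X");
    o.y = find_offset(schema, "Y");
    o.z = find_offset(schema, "Z");
    o.intensity = find_offset(schema, "Intensity");
    assert(o.x != kNotFound && o.y != kNotFound &&
           o.z != kNotFound && o.intensity != kNotFound);
    return o;
}
```

The idea is that resolve_format0 would run once per file, so the per-point 
accessors only index into the byte array at a known offset.  Whether that 
matters in practice is exactly what profiling would need to show.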

Thanks,

Howard

_______________________________________________
Liblas-devel mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/liblas-devel
