Thanks everyone for your help. I use a 32-bit machine and my swap is the
"standard" 2x the memory. I could indeed increase it, but then it would
only be good until I try a bigger dataset...
I do not need a temporary workaround; I was just testing the
capabilities of lasblock, wanted to understand what you've done, and
ran into that problem.
Splitting by scan order is indeed likely to fail, but I'm curious
about lasindex. What index are you planning to use? And when do you
reckon it'll be ready?
Thanks,
Hugo
On 10-09-10 4:12 PM, Howard Butler wrote:
On Sep 10, 2010, at 8:28 AM, Michael Smith wrote:
Howard,
Would it be easy to process in chunks when the file size exceeds memory?
Essentially do internally what you have shown externally?
las2las2 --split isn't such a desirable approach because it splits the file
apart in its existing order (usually scan order), which means that the blocks
from bigone_1 and bigone_2 would very likely overlap.
Another potential option that's not quite ready for prime time would be to use
lasindex to build an index on the file, *use the index for --split instead of
scan order*, and then run lasblock on those pieces. But, as I said, it's not
quite ready for general use.
Here's what Andrew (who wrote the organization algorithm in chipper.cpp that
lasblock uses) replied to me this morning with:
As far as this problem goes, the easiest thing for most people is
probably to create a swap file to provide sufficient memory. IMO, the
normal swap file recommendation (2x physical memory) is too small.
I'm not sure I see the benefit of being short on memory when you have so
much disk and it's so easy to make a swap file.
It looks like the algorithm needs 48 bytes/point plus minimal
additional overhead. For 280 million points, this is about 13.5 gig.
Add a 20 gig swap file and you should be fine. The algorithm IS
limited to about 4 billion points, as that is the max unsigned int value on
most machines (of course less on a 32-bit machine, as you're going to
run out of address space long before you get there). I guess the
other limitation is that you need the memory to be available
contiguously, but on, say, a 64-bit machine (and OS), I can't
imagine that's a problem.
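
For a concrete sense of those numbers, here is a quick back-of-the-envelope
check as a standalone C++ sketch; the 48 bytes/point figure is just the
estimate above, not something measured from chipper.cpp:

#include <cstdint>
#include <iostream>

int main()
{
    // Rough working-set estimate using the 48 bytes/point figure quoted
    // above (an estimate, not a number read out of chipper.cpp).
    const std::uint64_t points = 280000000ULL;   // ~280 million points
    const std::uint64_t bytes_per_point = 48;
    const std::uint64_t total = points * bytes_per_point;

    std::cout << total / 1e9 << " GB\n";         // prints ~13.44

    // The ~4 billion point ceiling mentioned above is UINT32_MAX
    // (4294967295); a 32-bit process hits its 2-4 GB address-space
    // limit long before that, no matter how large the swap file is.
    return 0;
}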
As far as using less memory in general, yes, it can be done, but of
course there are tradeoffs. The question is how much less would be
enough. There are lots of options. Still, spending effort to reduce memory
usage and still bumping into people's machine limits doesn't seem
fruitful. Of course, an algorithm could be made that uses disk more
and could be pretty much guaranteed to work, but it would necessarily
be as slow as the disk access (which might not be that bad if you have
the memory to cache).
Bumping your swap up temporarily does seem like a simple fix.
Is it simply that allocating 2 arrays of 280M is too much and then it aborts?
Yep. It's trying to reserve 3*280M though, and depending on the underlying STL
implementation of std::vector::reserve, actually trying to allocate it.
I'm wrong. It's 2*280M.
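
To make that failure concrete, here is a minimal sketch of the kind of
reserve that blows up; the PtRef layout below is illustrative only, not
copied from chipper.cpp:

#include <cstddef>
#include <cstdint>
#include <exception>
#include <iostream>
#include <vector>

// Illustrative stand-in for the per-point record the chipper keeps in
// memory; the real structure in chipper.cpp may differ in size/layout.
struct PtRef
{
    double        x;
    double        y;
    std::uint32_t index;
};  // typically 24 bytes after padding

int main()
{
    const std::size_t count = 280000000;  // ~280 million points

    try
    {
        std::vector<PtRef> xvec;
        std::vector<PtRef> yvec;
        // reserve() requests one contiguous block per vector -- roughly
        // 6.7 GB each with a 24-byte record, i.e. about the 13.5 gig
        // estimate above combined. A 32-bit process cannot satisfy this.
        xvec.reserve(count);
        yvec.reserve(count);
    }
    catch (const std::exception& e)
    {
        // On 32-bit this is typically std::length_error (count exceeds
        // vector::max_size()) or std::bad_alloc once address space runs out.
        std::cerr << "reserve failed: " << e.what() << '\n';
        return 1;
    }
    return 0;
}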
Howard
_______________________________________________
Liblas-devel mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/liblas-devel