On Oct 20, 2010, at 7:45 PM, Mateusz Loskot wrote:

> On 20/10/10 23:20, Gary Huber wrote:
>> Howard,
>>
>> I've run into a problem trying to read a billion-point LAS file to test
>> the index. It looks like I run out of memory in
>> CachedReaderImpl::m_mask while it is adding vector members trying to
>> reach the number of points in the file. It's in the loop on line 88 of
>> cachedreader.cpp.
>
> Gary,
>
> I think I found where the problem is, and I hope I've fixed it.
>
> http://trac.liblas.org/changeset/5a272a57945c3e2383a63b87cd9356f1e402e2f6
Thanks, Mateusz!

> Your use allocates a mask of 700 million bytes.
> Now, if my finding is correct and the mask was sometimes incorrectly
> doubled, that makes 1400 million.
>
> The allocation failure does not surprise me here, as it requests a
> contiguous memory block of 1.5 GB for the mask array alone.
> I'm pretty sure the failure happens due to heap fragmentation.
>
> Anyway, as I've said, I believe there was a bug, which has been fixed.
> I'd appreciate it if you could rebuild and check (I don't have such a
> huge LAS file).
>
>> Is there a way to override point caching, or will I just hit a similar
>> wall somewhere else if I do?
>
> Practically speaking, the reader implementation is pluggable.
> The cached reader is used by default, but it would be possible to add a
> LASReader constructor that selects the strategy at run-time.

Yes, I meant to (re)implement the ReaderFactory to alternate between the
CachedReader, ReaderImpl, and custom ReaderI* implementations, but I never
got around to it. I think this should be revisited before the next beta,
as the CachedReader as it stands is not that useful and is a potential
bottleneck with its default settings.

> In the meantime, you can try to replace this line:
>
> http://trac.liblas.org/browser/src/lasreader.cpp#L66
>
> with:
>
> m_pimpl(new detail::ReaderImpl(ifs)),
>
> It is obvious to me that for such large datasets, caching is pointless
> due to memory constraints - it's not really feasible to find a
> contiguous memory block of 2-3 GB :-)
>
> An alternative solution could be to partition the cache: instead of 1
> array, manage it as N arrays (all of equal size) and calculate the
> index of a mask entry as:
>
> (index of array * size of array) + index of mask in array

Indeed, my implementation is quite naive. I would be excited for someone
more capable than myself to revisit its silliness :)

> This would allow some degree of random access.

>> I made the change here and, like I thought, it's much faster and I
>> don't hit my memory limit.
>> this would seem to be a way to speed up the reading of any large LAS
>> file if you want me to check it in so you can look at it.
>
> Yes, it's obvious that caching makes the engine run slower, but it has
> some benefits... you don't have to read records from disk more than
> once.

My intent with the CachedReaderImpl is that it is for applications that
expect memory-resident access to reasonably sized (for some value of
reasonably sized :) files. We need to revisit the ReaderFactory, use the
base reader by default, and allow applications to opt into the
CachedReaderImpl as they need.

Howard

_______________________________________________
Liblas-devel mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/liblas-devel
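A rough sketch of the partitioned-mask idea Mateusz describes above (N equal-size arrays, with the global index decomposed as chunk index times chunk size plus offset). The class and method names here are illustrative only, not the actual libLAS API; this is one possible shape under the assumption that the mask stores one byte-sized flag per point:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Instead of one std::vector<uint8_t> holding a flag per point (which needs
// a single contiguous allocation of ~1 byte per point, and can hit heap
// fragmentation at the 1 GB+ scale), the mask is split into fixed-size
// chunks, each allocated independently.
class PartitionedMask
{
public:
    // chunk_size: number of mask entries per chunk (default ~16 M entries,
    // i.e. ~16 MB per chunk for byte-sized flags).
    explicit PartitionedMask(std::size_t num_points,
                             std::size_t chunk_size = std::size_t(1) << 24)
        : size_(num_points), chunk_size_(chunk_size)
    {
        // Round up so the last (possibly partially used) chunk exists too.
        std::size_t num_chunks = (num_points + chunk_size - 1) / chunk_size;
        chunks_.reserve(num_chunks);
        for (std::size_t i = 0; i < num_chunks; ++i)
            // make_unique value-initializes, so every flag starts at 0.
            chunks_.push_back(std::make_unique<std::uint8_t[]>(chunk_size));
    }

    // Decompose the global index, mirroring the formula from the thread:
    // global index == (index of array * size of array) + index in array.
    std::uint8_t get(std::size_t i) const
    {
        assert(i < size_);
        return chunks_[i / chunk_size_][i % chunk_size_];
    }

    void set(std::size_t i, std::uint8_t v)
    {
        assert(i < size_);
        chunks_[i / chunk_size_][i % chunk_size_] = v;
    }

    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    std::size_t chunk_size_;
    std::vector<std::unique_ptr<std::uint8_t[]>> chunks_;
};
```

Lookup stays O(1) (one division and one modulus, which the compiler reduces to shifts and masks when the chunk size is a power of two), so random access is preserved while the largest single allocation drops from the whole mask to one chunk.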
