For LAS 1.0-1.2 we will use a calculated point count if the header's value does not match the expected point count *and* the actual point data contains the exact number of bytes required to completely contain points (ie, point_data % point_format == 0). For LAS 1.3 data, we're going to just blindly believe the header, and do no checking. If the modulo function fails, an exception is going to be thrown with some numbers that someone could do some simple math to maybe have a chance at figuring out what's going on.

Hello everyone,

If I may, I think it would be preferable to have lasinfo explicitly enforce the specifications and return descriptive errors for where the data fails; over time, it will be more valuable to the community to have a standard to test by. As others have suggested, it may be valuable to have an option to ignore inconsistencies between the header metadata and the data itself, so long as the data segment remains identifiable. Putting the above suggestion into, say, the libLAS library code or lasinfo would force the user to understand the very situational nature of what is being checked and when; over time it may become difficult to remember why this code was inserted, but removing it would inevitably hurt compatibility.

I would suggest a new utility (as others have also proposed) such as LASRepair or LASValidate that could be a repository for all the situational code that would be needed to bring versions into alignment. As part of the OpenTopography group here at SDSC, I'd be happy to help write this utility if volunteers are needed.
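
To make the idea of "descriptive errors" concrete, here is a minimal sketch of what one check in such a utility might look like. It is not libLAS code: `validate_las` is a hypothetical name, the sketch covers LAS 1.0-1.2 only, and it assumes the 227-byte public header layout of those versions (offset to point data at byte 96, point data record length at byte 105, number of point records at byte 107).

```python
import struct

LAS_12_HEADER_SIZE = 227  # assumed public header size for LAS 1.0-1.2


def validate_las(data):
    """Return a list of descriptive problems found in a LAS 1.0-1.2 file.

    `data` is the whole file as bytes. An empty list means none of these
    (deliberately few) checks found a problem. Hypothetical helper.
    """
    if len(data) < LAS_12_HEADER_SIZE:
        return [f"file is {len(data)} bytes, shorter than the "
                f"{LAS_12_HEADER_SIZE}-byte public header"]

    problems = []
    if data[:4] != b"LASF":
        problems.append(f"file signature is {data[:4]!r}, expected b'LASF'")

    # Little-endian fields read at their assumed public header offsets.
    data_offset, = struct.unpack_from("<I", data, 96)
    record_length, header_count = struct.unpack_from("<HI", data, 105)

    if not LAS_12_HEADER_SIZE <= data_offset <= len(data):
        problems.append(f"offset to point data is {data_offset}, outside "
                        f"the range {LAS_12_HEADER_SIZE}..{len(data)}")
        return problems
    if record_length == 0:
        problems.append("point data record length is 0")
        return problems

    # Compare the header's point count with what the file can hold.
    point_bytes = len(data) - data_offset
    calculated, remainder = divmod(point_bytes, record_length)
    if remainder != 0 or calculated != header_count:
        problems.append(
            f"header reports {header_count} points, but {point_bytes} bytes "
            f"of point data at {record_length} bytes per record gives "
            f"{calculated} records with {remainder} bytes left over")
    return problems
```

Returning a list instead of raising on the first failure lets the tool report every inconsistency in one pass, which seems closer to what a validation utility would want.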

Best Regards,

Charles Cowart
OpenTopography.org

On Nov 4, 2010, at 11:54 AM, [email protected] wrote:

Today's Topics:

  1. Dealing with "bad" data (Howard Butler)
  2. Re: Dealing with "bad" data (Mateusz Loskot)
  3. Re: Dealing with "bad" data (Andrew Bell)
  4. Re: Dealing with "bad" data (Volker Wichmann)
  5. Re: Dealing with "bad" data (Mateusz Loskot)
  6. Re: Dealing with "bad" data (Volker Wichmann)
  7. Re: Dealing with "bad" data (Mike Grant)
  8. Re: LAS 1.3 point support working for upcoming libLAS 1.6 (Mike Grant)


----------------------------------------------------------------------

Message: 1
Date: Thu, 4 Nov 2010 11:07:45 -0500
From: Howard Butler <[email protected]>
Subject: [Liblas-devel] Dealing with "bad" data
To: "[email protected]" <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=us-ascii

All,

There are a number of software packages that are quite lax in how they write LAS files. Some of the things I've found them doing include:

* miswriting and generally screwing things up in the header, but having a legitimate offset so you could read points
* writing invalid point counts in the header (very common)
* following the extremely broken LAS 1.3 R10 specification that had a 7*long return count in the header instead of the required and expected 5*long

This email asks what our default stance should be in the face of bad data. Some things, like an invalid point count, are partially recoverable, but attempts to reconcile many others will often result in proliferating bad data. Should we be hard asses and always throw an error? Do our best to recover on a case-by-case basis?

The most common case of bad data that I've seen is invalid point counts in the header. An accurate point count isn't so important for LAS 1.0-1.2 data because you can provide a calculated point count by measuring the size of the file, removing the header, and dividing that value by the number of bytes each point takes. It is very important for LAS 1.3 data because waveform data can exist after the point data.

In this most common case, I propose the following:

For LAS 1.0-1.2 we will use a calculated point count if the header's value does not match the expected point count *and* the actual point data contains the exact number of bytes required to completely contain points (ie, point_data % point_format == 0). For LAS 1.3 data, we're going to just blindly believe the header, and do no checking. If the modulo function fails, an exception is going to be thrown with some numbers that someone could do some simple math to maybe have a chance at figuring out what's going on.
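
The proposal above can be read as the following decision procedure. This is a minimal sketch, not libLAS code: the function name `reconcile_point_count` and all of its parameters are hypothetical, and it assumes the caller has already read the file size, the offset to point data, and the point record length.

```python
def reconcile_point_count(file_size, data_offset, record_length,
                          header_count, version_minor):
    """Pick the point count to use, following the proposed policy.

    All names here are illustrative assumptions, not libLAS API.
    """
    if version_minor >= 3:
        # LAS 1.3: waveform data may follow the points, so the file size
        # says nothing reliable about the point count. Trust the header.
        return header_count

    point_bytes = file_size - data_offset
    if record_length <= 0 or point_bytes < 0:
        raise ValueError(
            f"unusable header: file_size={file_size}, "
            f"data_offset={data_offset}, record_length={record_length}")

    if header_count * record_length == point_bytes:
        return header_count  # the header agrees with the data

    calculated, remainder = divmod(point_bytes, record_length)
    if remainder != 0:
        # Report the raw numbers so the user can redo the arithmetic.
        raise ValueError(
            f"{point_bytes} bytes of point data is not a multiple of the "
            f"{record_length}-byte record length ({remainder} bytes left "
            f"over); header claims {header_count} points")
    return calculated  # header was wrong, but the data is well-formed
```

For example, a LAS 1.2 file with 280 bytes of point data and 28-byte records whose header claims 7 points would be read as 10 points, while the same header on a file with 283 bytes of point data would raise.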

Sound good?

Howard




------------------------------

Message: 2
Date: Thu, 04 Nov 2010 16:29:51 +0000
From: Mateusz Loskot <[email protected]>
Subject: Re: [Liblas-devel] Dealing with "bad" data
To: Howard Butler <[email protected]>
Cc: "[email protected]" <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 04/11/10 16:07, Howard Butler wrote:
Should we be hard asses and always throw an error?

We could provide two modes for processing LAS files: strict and transitional.

Do our best to recover on a case-by-case basis?

Sounds like GDAL's approach to WKT and such.

The most common case of bad data that I've seen is invalid point
counts in the header.  An accurate point count isn't so important for
LAS 1.0-1.2 data because you can provide a calculated point count by
measuring the size of the file, removing the header, and dividing
that value by the number of bytes each point takes.

If number of points in header is invalid
   Read until one of the following is true
      End of file
      Number of consumed points equals number reported by header
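
The loop described above might look like this. It is a minimal sketch, assuming fixed-length point records and a binary stream already positioned at the first point; `read_points` and its parameters are hypothetical names, not libLAS API.

```python
import io


def read_points(stream, record_length, header_count):
    """Read raw point records until EOF or until header_count is reached."""
    points = []
    while len(points) < header_count:
        record = stream.read(record_length)
        if len(record) < record_length:
            # End of file (or a truncated trailing record): stop here.
            break
        points.append(record)
    return points


# Usage: the header claims 5 points but only 3 full records are present.
stream = io.BytesIO(bytes(20 * 3))
points = read_points(stream, record_length=20, header_count=5)
```

Note that this only guards against a header count that is too large; a count that is too small silently leaves the remaining records unread.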

In this most common case, I propose the following:

For LAS 1.0-1.2 we will use a calculated point count if the header's
value does not match the expected point count *and* the actual point
data contains the exact number of bytes required to completely
contain points (ie, point_data % point_format == 0).

This kind of implicit fixing of broken data conflicts with
performance requirements.

Could be applied in transitional. In strict mode, just give up.
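
One way to picture the two modes is a single switch that decides whether the recalculated point count may be used. This is a minimal sketch under assumptions: `Mode`, `NonCompliantFileError`, and `resolve_point_count` are hypothetical names, and `record_length` is assumed to be positive.

```python
from enum import Enum


class Mode(Enum):
    STRICT = "strict"
    TRANSITIONAL = "transitional"


class NonCompliantFileError(Exception):
    """Raised when a file cannot be accepted in the selected mode."""


def resolve_point_count(header_count, point_bytes, record_length, mode):
    """Return the point count to use, or raise if the mode forbids a fix."""
    calculated, remainder = divmod(point_bytes, record_length)
    if remainder == 0 and calculated == header_count:
        return header_count  # compliant: nothing to fix in either mode

    # Strict mode gives up on any mismatch; transitional mode gives up
    # only when the data is not a whole number of records.
    if mode is Mode.STRICT or remainder != 0:
        raise NonCompliantFileError(
            f"header claims {header_count} points; {point_bytes} bytes at "
            f"{record_length} bytes per record gives {calculated} records "
            f"with {remainder} bytes left over")
    return calculated
```

With this shape, the repair logic lives in one place and a strict caller never executes it.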

For LAS 1.3 data, we're going to just blindly believe the header, and
do no checking.  If the modulo function fails, an exception is going
to be thrown with some numbers that someone could do some simple math
to maybe have a chance at figuring out what's going on.

Sound good?

I don't know. Broken standards always suck as standards.

However, see XHTML: it's not always die-hard, but allows users to
consciously choose between strict and transitional mode, and validate
their data against the selected mode.

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net
Charter Member of OSGeo, http://osgeo.org
Member of ACCU, http://accu.org


------------------------------

Message: 3
Date: Thu, 4 Nov 2010 11:43:10 -0500
From: Andrew Bell <[email protected]>
Subject: Re: [Liblas-devel] Dealing with "bad" data
To: Howard Butler <[email protected]>
Cc: "[email protected]" <[email protected]>
Message-ID:
        <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Nov 4, 2010 at 11:07 AM, Howard Butler <[email protected]> wrote:
All,

There are a number of software packages that are quite lax in how they write LAS files. Some of the things I've found them doing include:

* miswriting and generally screwing things up in the header, but having a legitimate offset so you could read points
* writing invalid point counts in the header (very common)
* following the extremely broken LAS 1.3 R10 specification that had a 7*long return count in the header instead of the required and expected 5*long

This email asks what our default stance should be in the face of bad data. Some things, like an invalid point count, are partially recoverable, but attempts to reconcile many others will often result in proliferating bad data. Should we be hard asses and always throw an error? Do our best to recover on a case-by-case basis?

If you are detecting the problem anyway, why not just throw the
exception and if someone has the energy, write a utility that will try
to coerce the bad data into good data.  That way you don't have to
clutter the code with non-conforming crap but would still have a way
to get to something useful.  This is pretty typical for DBs.

--
Andrew Bell
[email protected]


------------------------------

Message: 4
Date: Thu, 04 Nov 2010 18:06:09 +0100
From: Volker Wichmann <[email protected]>
Subject: Re: [Liblas-devel] Dealing with "bad" data
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 04.11.2010 17:43, Andrew Bell wrote:
On Thu, Nov 4, 2010 at 11:07 AM, Howard Butler<[email protected]> wrote:
All,

There are a number of software packages that are quite lax in how they write LAS files. Some of the things I've found them doing include:

* miswriting and generally screwing things up in the header, but having a legitimate offset so you could read points
* writing invalid point counts in the header (very common)
* following the extremely broken LAS 1.3 R10 specification that had a 7*long return count in the header instead of the required and expected 5*long

This email asks what our default stance should be in the face of bad data. Some things, like an invalid point count, are partially recoverable, but attempts to reconcile many others will often result in proliferating bad data. Should we be hard asses and always throw an error? Do our best to recover on a case-by-case basis?

If you are detecting the problem anyway, why not just throw the
exception and if someone has the energy, write a utility that will try
to coerce the bad data into good data.  That way you don't have to
clutter the code with non-conforming crap but would still have a way
to get to something useful.  This is pretty typical for DBs.


I'm in favour of this approach too: provide a utility which tries to
fix broken files, but always throw an exception in case a file is not
compliant with the specification. This will allow users to have some
control over how to handle such files.

Volker


------------------------------

Message: 5
Date: Thu, 04 Nov 2010 17:22:18 +0000
From: Mateusz Loskot <[email protected]>
Subject: Re: [Liblas-devel] Dealing with "bad" data
To: Volker Wichmann <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 04/11/10 17:06, Volker Wichmann wrote:
provide a utility which tries to fix broken files but always throw an
exception in case a file is not compliant with the specification

For me, the two parts of this contradict each other.
Always throwing if a file is not compliant means never accepting broken
data. Never accepting broken data implies not trying to fix it.

Whatever it is called, strict and transitional mode (enable/disable with
single switch for libLAS utility) or a separate utility trying to
recover whatever is recoverable from broken data...
The job of implementing those utils repairing broken data produced by
companies loosely interpreting ASPRS LAS will be at the expense of libLAS.

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net
Charter Member of OSGeo, http://osgeo.org
Member of ACCU, http://accu.org


------------------------------

Message: 6
Date: Thu, 04 Nov 2010 18:31:44 +0100
From: Volker Wichmann <[email protected]>
Subject: Re: [Liblas-devel] Dealing with "bad" data
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 04.11.2010 18:22, Mateusz Loskot wrote:
On 04/11/10 17:06, Volker Wichmann wrote:
provide a utility which tries to fix broken files but always throw an
exception in case a file is not compliant with the specification

For me, the two parts of this contradict each other.
Always throwing if a file is not compliant means never accepting broken
data.

Yes, this is what I'm in favour of.

Never accepting broken data implies not trying to fix it.

Yes, I think this is not something which libLAS needs to deal with.


Whatever it is called, strict and transitional mode (enable/disable with
single switch for libLAS utility) or a separate utility trying to
recover whatever is recoverable from broken data...
The job of implementing those utils repairing broken data produced by
companies loosely interpreting ASPRS LAS will be at the expense of libLAS.

I think this is a contradiction: if you implement some fallback
mechanisms in libLAS, you already provide something like a "tool". I
fully agree that it is impossible to provide a utility which can
fix almost all broken files, but I think such a utility could be more
easily extended to catch specific errors than the libLAS core. Why not start
with a utility that catches the issues mentioned by Howard?

best regards,
Volker


------------------------------

Message: 7
Date: Thu, 04 Nov 2010 18:25:49 +0000
From: "Mike Grant" <[email protected]>
Subject: Re: [Liblas-devel] Dealing with "bad" data
To: "[email protected]" <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain;       charset="ISO-8859-1"

On 04/11/10 16:07, Howard Butler wrote:
This email asks what our default stance should be in the
face of bad data.  Some things, like an invalid point count, are
partially recoverable, but attempts to reconcile many others will
often result in proliferating bad data.  Should we be hard asses and
always throw an error?  Do our best to recover on a case-by-case
basis?

I'd just throw an exception when a LAS file isn't standards compliant
(+1 hard ass).  A separate tool can be written that catches these and
tries to fix up files.  This keeps things clean and gives direct
feedback on naughty LAS files.

If you want to include additional code to handle bad data, it would
definitely be nice to have a flag that enables or disables this
behaviour (the strict/loose interpretation Mateusz suggests).

Cheers,

Mike.

--------------------------------------------------------------------------------
Plymouth Marine Laboratory

Registered Office:
Prospect Place
The Hoe
Plymouth  PL1 3DH

Website: www.pml.ac.uk
Registered Charity No. 1091222
PML is a company limited by guarantee
registered in England & Wales
company number 4178503

--------------------------------------------------------------------------------
This e-mail, its content and any file attachments are confidential.

If you have received this e-mail in error please do not copy, disclose it to any third party or use the contents or attachments in any way. Please notify the sender by replying to this e-mail or e-mail [email protected] and then delete the email without making any copies or using it in any other way.

The content of this message may contain personal views which are not the views of Plymouth Marine Laboratory unless specifically stated.

You are reminded that e-mail communications are not secure and may contain viruses. Plymouth Marine Laboratory accepts no liability for any loss or damage which may be caused by viruses.
--------------------------------------------------------------------------------


------------------------------

Message: 8
Date: Thu, 04 Nov 2010 18:54:35 +0000
From: "Mike Grant" <[email protected]>
Subject: Re: [Liblas-devel] LAS 1.3 point support working for upcoming
        libLAS  1.6
To: "Howard Butler" <[email protected]>
Cc: "[email protected]" <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain;       charset="ISO-8859-1"

On 28/07/10 14:19, Mike Grant wrote:
On 27/07/10 20:53, Howard Butler wrote:
Your file is properly broken now :)

Perfect - that gives me more incentive to nag for the new processor
release ;)

It might be worth marking the sample I gave you as broken until we get a
new one, so as not to confuse people.

I'm reminded by the talk of bad files that we got an update to our LAS
1.3 processor :) See if this reprocessed sample blows anything else up..

http://arsf-dan.nerc.ac.uk/files/NERC-ARSF-LAS1_3-sample-release2.tar.bz2

Cheers,

Mike.



------------------------------

_______________________________________________
Liblas-devel mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/liblas-devel


End of Liblas-devel Digest, Vol 35, Issue 4
*******************************************