Background

I have been looking into using Parquet files for storing and working with
automotive data. One interesting thing about automotive data is that most
communication happens on the CAN bus, where bandwidth is extremely limited.
In order to encode "physical" values in a very space-efficient way, we
use linear conversion formulas of the form "phys = (raw * factor) + offset".
This implicitly limits range and resolution, but that is often just fine
when we are representing a physical property.

Example 1:

We have a throttle that can be anywhere from 0-100% and we want to fit that
value into 1 byte (raw values 0-255). The factor is then roughly 100/255, so
we would use a formula like:

    phys = (raw * 0.39215) + 0

Example 2:

We want to record ambient temperature of the vehicle. Resolution of 1 degree is
fine. Temperatures below -40 or above 215 degrees C are rare enough to be out
of scope, so the value again fits into 1 byte:

    phys = (raw * 1.0) - 40
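
As a rough sketch of how the two formulas behave, here is a small pure-Python
helper. The class name LinearScale and its methods are made up for this
illustration and do not come from any CAN or Arrow library; clamping
out-of-range values to the raw limits is also just an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearScale:
    """Hypothetical helper for phys = (raw * factor) + offset."""
    factor: float
    offset: float
    bits: int = 8  # width of the unsigned raw value

    def decode(self, raw: int) -> float:
        # Raw integer from the bus -> physical value.
        return raw * self.factor + self.offset

    def encode(self, phys: float) -> int:
        # Physical value -> nearest raw integer, clamped to the raw range.
        raw = round((phys - self.offset) / self.factor)
        return max(0, min(raw, (1 << self.bits) - 1))

    def phys_range(self):
        # Implicit physical range given by the raw width.
        return self.decode(0), self.decode((1 << self.bits) - 1)


throttle = LinearScale(factor=0.39215, offset=0.0)      # Example 1
ambient_temp = LinearScale(factor=1.0, offset=-40.0)    # Example 2
```

Here ambient_temp.phys_range() gives the -40 to 215 degree window from
Example 2, and the factor is also the resolution of each signal.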

So far, I have been converting the raw data into floating point before
writing to Arrow format, to make it easier for the analysts to use the
data. This of course means that I am converting to a less efficient format and
I am also losing inherent information about the raw signal (its range and
resolution). I would rather be able to store the raw data in an appropriately
sized unsigned integer and automatically convert to floating point when using
the data, similar to dictionary encoding.
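
To make the idea concrete, here is a minimal pure-Python sketch of "store raw,
convert on read". ScaledColumn is a hypothetical name for this illustration
only; it is not an Arrow or Parquet type, and it uses the standard library
array module just to show the storage difference.

```python
from array import array


class ScaledColumn:
    """Hypothetical column: raw uint8 storage, decoded to float on access."""

    def __init__(self, raw_values, factor, offset):
        self.raw = array("B", raw_values)  # "B" = unsigned 8-bit integers
        self.factor = factor
        self.offset = offset

    def __len__(self):
        return len(self.raw)

    def __getitem__(self, index):
        # Conversion happens only when a value is actually read.
        return self.raw[index] * self.factor + self.offset

    def to_floats(self):
        # Materialize the whole column as 64-bit floats.
        return array("d", (r * self.factor + self.offset for r in self.raw))


temps = ScaledColumn([0, 60, 255], factor=1.0, offset=-40.0)
raw_bytes = len(temps.raw) * temps.raw.itemsize            # 1 byte per value
float_bytes = len(temps) * temps.to_floats().itemsize      # 8 bytes per value
```

For a 1-byte signal the raw column is 8 times smaller than the float64
version, and the factor and offset stay attached to the data.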

Discussion

- How would people generally deal with this situation using the Arrow format?
- Is this something that other people are interested in?
- If this were to be added to the spec, what would be the best way to do it?

While I am coming from an automotive perspective, I think there are many other
areas of applicability (reading sensor data through an ADC, industrial
automation and monitoring, etc.).

I could see this working as either a new primitive type (similar to decimal), or
as an extension where we simply put the factor and offset as standard metadata
fields.

Best regards,
Elliot Morrison-Reed
