[jira] [Work logged] (IMAGING-251) Support TIFF standard floating point data

ASF GitHub Bot (Jira) Sun, 10 May 2020 08:09:09 -0700


     [ 
https://issues.apache.org/jira/browse/IMAGING-251?focusedWorklogId=432547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-432547
 ]


ASF GitHub Bot logged work on IMAGING-251:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/May/20 15:08
            Start Date: 10/May/20 15:08
    Worklog Time Spent: 10m 
      Work Description: gwlucastrig commented on a change in pull request #72:
URL: https://github.com/apache/commons-imaging/pull/72#discussion_r422657886



##########
File path: 
src/main/java/org/apache/commons/imaging/formats/tiff/datareaders/ImageDataReader.java
##########
@@ -14,6 +14,96 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+
+ /*
+ * Implementation Notes:
+ *
+ *   Additional implementation notes are given in:
+ *        DataReaderStrips.java
+ *        DataReaderTiled.java
+ *
+ * The TIFF Floating-Point Formats ----------------------------------
+ *    In addition to providing images, TIFF files can supply data in the
+ * form of numerical values. As of March 2020 the Commons Imaging library
+ * was extended to support some floating-point data formats.
+ *    Unfortunately, the floating-point format allows for a lot of different
+ * variations and only the most widely used of these are currently supported.
+ * At the time of implementation, only a small set of data products were
+ * available. Thus it is likely that developers will wish to extend this
+ * capability as additional test data become available. When implementing
+ * extensions to this logic, developers are reminded that image processing
+ * requires access to millions of pixels, so attention to performance
+ * is essential to a successful implementation (please see the notes in
+ * DataReaderStrips.java for more information).
+ *    The TIFF floating-point format is poorly documented, so these
+ * notes are included to provide clarification on at least
+ * some aspects of the format.
+ *
+ * The Predictor==3 Case
+ *   TIFF specifies an extension for a predictor that is intended to
+ * improve data compression ratios for floating-point values.  This
+ * predictor is specified using the TIFF predictor TAG with a value of 3
+ * (see TIFF Technical Note 3, April 8, 2005).  Consider a 4-byte floating
+ * point value given in IEEE-754 format.  Let f3 be the high-order byte,
+ * with f2 the next highest, followed by f1, and f0 for the
+ * low-order byte.  This designation should not be confused with the
+ * in-memory layout of the bytes (little-endian versus big-endian); it
+ * refers to their numerical significance. The sign bit and upper 7 bits of
+ * the exponent are given in the high-order byte, followed by the remaining
+ * exponent bit and the mantissa in the lower-order bytes.
+ *   In many real-valued raster data sets, the sign and magnitude (exponent)
+ * of the values change slowly while the contents of the mantissa vary in
+ * a semi-random manner, with the information entropy tending to increase
+ * in the lowest ordered bytes.  Thus, the high-order bytes have more
+ * redundancy than the low-order bytes and can compress more efficiently.
+ * To exploit this, the TIFF format splits the bytes into groups based on their
+ * order-of-magnitude.  This splitting process takes place on a ROW-BY-ROW
+ * basis (note the emphasis; this point is not clearly documented in the spec).
+ * For example, for a row length of 3 pixels -- A, B, and C -- the data
+ * for two rows would be given as shown below (again, ignoring endian issues):
+ *   Original:
+ *      A3 A2 A1 A0   B3 B2 B1 B0   C3 C2 C1 C0
+ *      D3 D2 D1 D0   E3 E2 E1 E0   F3 F2 F1 F0
+ *
+ *   Bytes split into groups by order-of-magnitude:
+ *      A3 B3 C3   A2 B2 C2   A1 B1 C1   A0 B0 C0
+ *      D3 E3 F3   D2 E2 F2   D1 E1 F1   D0 E0 F0
+ *
+ * To further improve the compression, the predictor takes the difference of
+ * each byte and its predecessor.  Again, the differences (deltas) are computed
+ * on a row-by-row basis.  For the most part, the differences combine
+ * bytes associated with the same order-of-magnitude, though there is
+ * a special transition at the end of each order-of-magnitude set (shown in
+ * parentheses):
+ *
+ *      A3, B3-A3, C3-B3, (A2-C3), B2-A2, C2-B2, (A1-C2), etc.
+ *      D3, E3-D3, F3-E3, (D2-F3), E2-D2, etc.
+ *
+ * Once the predictor transform is complete, the data is stored using
+ * conventional data compression techniques such as Deflate or LZW.
+ * In practice, floating point data does not compress especially well, but
+ * using the above technique, the TIFF process typically reduces the overall
+ * storage size by 20 to 30 percent (depending on the data).
+ *    The TIFF Technical Note 3 specifies 3 data size formats for
+ * storing floating point values:
+ *     32 bits    IEEE-754 single-precision standard
+ *     16 bits    IEEE-754 half-precision standard
+ *     24 bits    A non-standard representation
+ * At this time, we have not obtained data samples for the smaller
+ * representations used in combination with a predictor.
+ *
+ * Interleaved formats
+ *   TIFF Technical Note 3 also provides sample code for interleaved
+ * data, such as a real-valued vector or a complex pair.  At this time
+ * no samples of interleaved data were available. As a caveat, the
+ * specification that the document provides has disadvantages in terms of
+ * code complexity and performance.  Because the interleaved evaluation is
+ * embedded inside the pixel row and column loops, it puts a lot of redundant
+ * conditional evaluations inside the doubly nested loops. It is recommended
+ * that when interleaved data is implemented, it should get its own block of
+ * code so as not to interfere with the more common non-interleaved
+ * floating-point processing.
+ */

Review comment:
       Just double checking...  Is your preference that this be included in the
general class Javadoc?  Alternatively, it could go with
unpackFloatingPointSamples, either as a standard comment in the method body or
as Javadoc.  Which do you think is best?
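
As a rough illustration of the Predictor==3 steps described in the
implementation notes above (byte splitting followed by byte-wise differencing,
one row at a time), here is a minimal Java sketch. The class and method names
(FloatPredictorSketch, encodeRow, decodeRow) are hypothetical and are not part
of Commons Imaging or of this pull request. It assumes 32-bit samples, one
sample per pixel, and high-order byte planes first, matching the A3 B3 C3
layout shown in the notes.

```java
import java.util.Arrays;

/** Hypothetical sketch only; not the Commons Imaging implementation. */
final class FloatPredictorSketch {

    /** Forward transform for ONE row: split bytes into planes, then difference. */
    static byte[] encodeRow(float[] row) {
        int n = row.length;
        byte[] out = new byte[4 * n];
        for (int i = 0; i < n; i++) {
            int bits = Float.floatToRawIntBits(row[i]);
            out[i]         = (byte) (bits >>> 24); // f3 plane (sign, high exponent)
            out[n + i]     = (byte) (bits >>> 16); // f2 plane
            out[2 * n + i] = (byte) (bits >>> 8);  // f1 plane
            out[3 * n + i] = (byte) bits;          // f0 plane (low mantissa)
        }
        // Byte-wise differences across the whole row, including the
        // transitions between planes. Work back to front so each step
        // still sees the unmodified predecessor.
        for (int i = out.length - 1; i > 0; i--) {
            out[i] = (byte) (out[i] - out[i - 1]);
        }
        return out;
    }

    /** Inverse transform for ONE row of n pixels. */
    static float[] decodeRow(byte[] encoded, int n) {
        byte[] b = Arrays.copyOf(encoded, 4 * n);
        // A running sum (modulo 256) undoes the differences.
        for (int i = 1; i < b.length; i++) {
            b[i] = (byte) (b[i] + b[i - 1]);
        }
        // Recombine the four byte planes into IEEE-754 bit patterns.
        float[] row = new float[n];
        for (int i = 0; i < n; i++) {
            int bits = ((b[i] & 0xff) << 24)
                     | ((b[n + i] & 0xff) << 16)
                     | ((b[2 * n + i] & 0xff) << 8)
                     |  (b[3 * n + i] & 0xff);
            row[i] = Float.intBitsToFloat(bits);
        }
        return row;
    }
}
```

Each row is handled independently, so a reader would call decodeRow once per
row of the decompressed strip or tile. Whether this maps cleanly onto the
existing unpackFloatingPointSamples logic is not something the sketch tries to
answer.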




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 432547)
    Time Spent: 10.5h  (was: 10h 20m)

> Support TIFF standard floating point data
> -----------------------------------------
>
>                 Key: IMAGING-251
>                 URL: https://issues.apache.org/jira/browse/IMAGING-251
>             Project: Commons Imaging
>          Issue Type: New Feature
>          Components: Format: TIFF
>    Affects Versions: 1.x
>            Reporter: Gary Lucas
>            Priority: Major
>             Fix For: 1.x
>
>         Attachments: ArizonaHillshade.jpg, Imaging252_USGS_n38w077.jpg
>
>          Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Commons Imaging does not support the floating-point format included in the 
> TIFF specification. There are prominent data sources that issue products in 
> this format. The ability to support this information would open up new 
> application areas for Commons Imaging.
> TIFF is often used as a mechanism for distributing data from geophysical 
> applications in the form of GeoTIFF files.  Some of this is not imagery, but 
> data. For example, the US Geological Survey is currently releasing 
> high-resolution elevation data grids for the 3DEP program under the name 
> Cloud-Optimized GeoTIFF (COG). It is a substantial data set with significant 
> potential commercial and academic applications.
> Accessing this data means modifying the TIFF DataReaderStrips and 
> DataReaderTiled classes to recognize floating point data (which is typically 
> indicated using TIFF tag #339, SampleFormat). Also, returning the data in the 
> form of a BufferedImage makes no sense at all, so the API on the 
> TiffImageParser and supporting classes would need additional methods to 
> return arrays of floats.  The good news here is that that requirement would 
> mean adding new methods to the classes rather than making significant changes 
> to existing classes. So the probability of unintended consequences or new 
> bugs in existing code would be minimized.
> Specification details for floating-point are given in the main TIFF-6 
> documentation and Adobe Photoshop TIFF Technical Note 3.
>  
> I am willing to volunteer to make these changes provided that there is 
> interest and a high probability that my contributions would be evaluated and, 
> if suitable, integrated into the Commons Imaging code base. 
> Thank you for your attention in this matter.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
