On Thu, 8 Apr 2021 08:54:06 GMT, Alan Bateman <al...@openjdk.org> wrote:

>> Dear All,
>> May I ask your help to review this change? Thanks!
>> 
>> BRs,
>> Lin
>
>> Dear All,
>> May I ask your help to review this change? Thanks!
> 
> @LanceAndersen Do you have cycles to help Lin? This proposal will require 
> discussion, they may be case for the header to be a record for example. My 
> personal view is that the PR should be set aside until there is at least at 
> least some agreement on the API.

Dear @AlanBateman  and @LanceAndersen,

Thanks a lot for your review and comments!

> 
> We should look to see if it makes sense to use some of the more recent java 
> features such as Record.
> 
> If we are adding a specific class to hold the header fields, perhaps using 
> the builder pattern should be considered vs constructors.

I agree, we should finalize the API before moving forward. I am not quite 
fimilar with Record, I will do some investigation, trying to use it in this PR 
and discuss with you later. 

> If we are writing out these additional fields, what should we do with them 
> when reading a gzip file back?

IMO, generally there should be check of header crc, and some other checks like 
the header format, magic number etc. May be we could implement it like the gnu 
gzip tool, which ingore contents of most header fields but verified the general 
header info.

> Have you looked at other gzip api implementations to see what they provide in 
> this area?

I have looked at the gzip implementation at 
https://github.com/linzang/gzip/blob/distrotech-gzip/gzip.c#L1281, this method 
is use to generate header crc value for check. And there is some legal header 
check in this method.  And `file name` is used to save original file name in 
gzip when it is truncated. refer to 
https://github.com/linzang/gzip/blob/distrotech-gzip/zip.c#L37

In JDK, there is a usage of `file comment` in the gzip heap dump file generated 
by jcmd `jmap -dump` command.  at 
https://github.com/openjdk/jdk/blob/ff52f2989fd60ec8251eaf76f4c4b78f10d3e048/src/hotspot/share/services/heapDumperCompression.cpp#L127
 , and this info is used in testing like an hint for uncompression, please 
refer 
https://github.com/openjdk/jdk/blob/ff52f2989fd60ec8251eaf76f4c4b78f10d3e048/test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java#L280.

And IMHO the behavior of adding information in `file comments` seems like  a 
general way to transfer extra information between compressor and uncompressor. 


> Is there anyone who relies on this additional header info? As I mentioned in 
> an earlier comment, we should validate the changes against another 
> implementation to ensure that we can read the data back correctly and that 
> the additional header data that we write can be read by other tools.

May be we could have similar behavior? which just do some general header info 
check, calculate CRC of the header and ignore the real contents of most header 
fields. Do you think it is reasonable? 


Thanks!
Lin

-------------

PR: https://git.openjdk.java.net/jdk/pull/3072

Reply via email to