Hi Clemens,
Here's some file-by-file feedback, answers to questions, etc. I've attached 2
files:
* Deflater.c.reformat is by-and-large the same as the file that you sent,
except that it compiles on Solaris without warnings (it wouldn't compile there
w/o change; perhaps the linux compiler (I'm guessing) you used is more
lenient), and is formatted more in keeping with the rest of the file's style.
* Deflater.c has some further changes on my part, described below. I've run
all our regression tests on this one and it passes. I haven't run JCK tests,
nor our more extensive performance suite.
File-by-file commentary:
*** Deflate.c
Doesn't compile, at least not on Solaris. Several warnings. I fixed them.
See attached Deflater.c.reformat. (I have not tried compiling on other
platforms.)
Some stylistic issues need attention; see e.g. deflateBytes for brace
positioning, trailing whitespace, space between keyword and parens.
Should document fields in def_data (compare with zip_util.h).
Some field IDs no longer necessary, since they're passed in as params. I
removed them.
There's a slight change to the semantics of "finished". Current code sets
Deflater.finished only if setParams is false. Changed code may set it
regardlewss of setParams. I suspect this is OK: if client code changed
strategy or level and called Deflater.deflate(), it would invoke
deflateParams(), and not alter the value of finished. The client's subsequent
call to Deflater.deflate() would call deflate() which AFAICT would cause
finished to be set. What do you think?
Why fall through from Z_OK to Z_BUF_ERROR? In Z_BUF_ERROR and default cases,
why continue execution of loop instead of returning 0 as does original code?
I changed this; see attached Deflater.c
*** Deflater.java: OK
*** DeflaterOutputStream.java: OK
More inline below.
Clemens Eisserer wrote:
Hi Dave,
Thanks a lot for your reply.
To make it short: Of course I understand that this is low-priority
(also for me, its a fun-only fix because someone in forums.java.net
mentioned it) so don't hurry.
Sorry that I wasted your time with my messy files, they were taken
from my "playground" thats why they were in such a bad shape - they
were only intended to give an idea which "road" I was taking. I
attached the new files taken from the mercurial repositories and only
modified at the affected places.
With a change of this sort, we really do need tests along with a fix. Have
you started writing any test cases?
I completly agree - I have some simple test-cases which test more or
less only very basic functionality of Deflater and they work well
(also FlatterTest passes).
I'll write some more tests which test exotic use-cases like changing
compression-level, ... during compression.
Great, thanks. It would be a Good Idea to have a test that checks my
assumption re finished (see above).
I have some open questions:
1.) Is the seperate structure approach to hold the stride-buffers ok?
I think so.
2.) Any suggestions for the following names: 1. strm-field in class
(defAdr), 2. defAdr-parameter,3. defptr - long_to_ptr of defAdr, 4.
def_data - name of the structure
Those don't quite match what I see in the code; but what's in the code seems OK:
def_data for the struct,
def_adr as a param to deflateBytes, etc. etc.
defptr to reference a def_data in init and deflateBytes
3.) I am not really used to program in C. Are the adress-operations ok
which I used to get members of the new struct def_data?
It seems OK.
Thanks for your patience, lg Clemens
Some notes, and changes in ramdom order:
* Changed deflate-bytes to the old behaviour to return after the call
to deflateParams
Good; AFAICT at maintains the existing semantics.
* Verified that its ok to call deflateParams when there's not enough
space in the output-buffer to flush all "old" data out (thanks to Mark
Adler)
* I changed the method-signiture of the native method compared to
original, because some variables were read from JNI-code, whereas they
could have been passed simply down using method parameters. I think
its "cleaner" to pass it.
The long argument to deflateBytes is a bit cumbersome, but Ken Russell opined
that it provides better performance, so it's a Good Thing:
> Kenneth Russell wrote:
> I strongly agree with the contributor's suggestion. Not only is passing
> the argument from Java less code, it is also faster since field access
> from Java can be optimized by the HotSpot compiler, where field accesses
> through the JNI must go through the same set of boilerplate
> C/C++/assembly every time.
* Allocation of the stride-buffers together with the z_stream
structure. z_stream is really large, so the two stride-buffers should
not add that much overhead. However this has the advantage of not
mallocing/freeing and also beeing able to fill the input-stride-buffer
once for several calls of the native method.
Looks good.
* Renamed the strm-adress-parameter to defadr, because it no longer
really points to a strm. I did not rename the java field "strm"
because I did not have an idea for a proper name.
It should have had a different name from day one. I'm slightly loathe to make
a name change, nor do I have (Friday afternoon) a Really Good Idea.
* Removed striding from DeflaterOutputStream, (looked how code looked in 1.4.2).
Looks good.
From your other email:
> I also thought about implementing striding in the CRC32/Adler32
> classes which basically suffer from the same block-the-gc behaviour as
> Inflater/Deflater did before they were "fixed" ;)
I suspect these are not as much of an issue: presumably CRC32 and adler32
calculations are faster than deflation (caveat: I haven't measured them).
Other have pointed out / shamed us because, hey, shouldn't these be in Java
anyway?
> Furthermore do you have good ideas for regression tests?
> The usual compression/decompression works fine, can you imagine
> corner-cases which would be worth special testing?
> Should the tests written in the jtreg format?
I'd like to see tests where there's a possibility of semantics having changed;
as noted above re "finished".
I haven't done a performance analysis, but don't expect a regression. If
anything, since the striding is kept within native code, there should be fewer
Java -> native calls, and better performance (though that is perhaps not
measurable).
My last comment is about this change in general. It seems like a reasonable
fix, though the corresponding bug:
http://bugs.sun.com/view_bug.do?bug_id=6399199
is a low priority one for us, and this is code we generally feel is best left
alone when possible. We are still learning how best to work with
contributions from outside of Sun. I will check with other who've maintained
this code in the past, to get their opinion of making this kind of change.
While I'm currently responsible for jar/zip code, it's only one of the hats I
currently wear ;-)
Thanks,
Dave