Re: [math] "equals" in "Vector3D"

2009-04-29 Thread Wolfgang Glas
Gilles Sadowski schrieb:
> Hi.
>  
>> There is a very good article on ieee754 equality under
>>
>>   http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
>>
>> which contains an extensive discussion about various "equality" approaches.
>>
>> IMHO we should also consider including the "AlmostEqual2sComplement"
>> method of this paper in commons-math, if not done so far.
> 
> It would be a nice addition to "util.MathUtils".
> I'll open another JIRA issue.

Could you please CC me on this issue? I'm currently not involved in
commons-math, but I'm going to use it in a few weeks. Maybe I can contribute
code and/or test cases to this issue ;-)

  Regards,

Wolfgang

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] "equals" in "Vector3D"

2009-04-29 Thread Wolfgang Glas
Gilles Sadowski schrieb:
>>> L1 norm: equals(a, b, tolerance) = sum(abs(a-b)) < tolerance
>>> L2 norm: equals(a, b, tolerance) = sqrt(sum((a-b)^2)) < tolerance
>>> L-infinity norm: equals(a, b, tolerance) = max(abs(a-b)) < tolerance
>>>
>>> All are useful.
>> I'll guess for 3D vector the L2-norm is the more useful as it is
>> invariant when physical orthonormal frames are changed.
>> I'll check in the variant you prefer.
> 
> [Cf. my other reply.]
> Should I still create an issue?

I'm currently not involved in commons-math, but I've written a substantial amount
of numeric software.

My thought on this point is that it is generally more practical to implement a
bunch of distance methods like

  a.distance1(b)   ... L1-Norm
  a.distance(b)    ... L2-Norm (this is usually what the user wants as default)
  a.distanceInf(b) ... L∞-Norm

and have the user decide on what is equal in his particular situation.
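
A minimal sketch of the three distance methods listed above. The class name Vec3 is a hypothetical stand-in for illustration and is not the commons-math Vector3D API; only the method names follow the list above.

```java
/** Minimal immutable 3D vector; a hypothetical stand-in, not the commons-math Vector3D. */
final class Vec3 {
    final double x;
    final double y;
    final double z;

    Vec3(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** L1 norm of (this - b): sum of the absolute component differences. */
    double distance1(Vec3 b) {
        return Math.abs(x - b.x) + Math.abs(y - b.y) + Math.abs(z - b.z);
    }

    /** L2 (Euclidean) norm of (this - b). */
    double distance(Vec3 b) {
        double dx = x - b.x;
        double dy = y - b.y;
        double dz = z - b.z;
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** L-infinity norm of (this - b): largest absolute component difference. */
    double distanceInf(Vec3 b) {
        return Math.max(Math.abs(x - b.x),
                        Math.max(Math.abs(y - b.y), Math.abs(z - b.z)));
    }
}
```

With such methods the caller writes the comparison explicitly, for example a.distance(b) < tolerance, and picks both the norm and the tolerance that fit the application.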


There is a very good article on ieee754 equality under

  http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm

which contains an extensive discussion about various "equality" approaches.

IMHO we should also consider including the "AlmostEqual2sComplement"
method of this paper in commons-math, if not done so far.
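
To illustrate the idea of comparing floating-point values by their distance in representable steps (ULPs), here is a rough sketch for Java doubles. It is an adaptation, not the paper's code: the class and method names are made up for this example, and values of opposite sign are handled by a plain comparison instead of the paper's remapping of the bit patterns.

```java
/** Sketch of a ULP-based comparison for doubles; names are hypothetical. */
final class UlpCompare {
    private UlpCompare() {}

    /**
     * Returns true if a and b are at most maxUlps representable doubles apart.
     * NaN never compares equal. Note that Double.MAX_VALUE and infinity are
     * adjacent bit patterns, so they count as one step apart.
     */
    static boolean almostEqualUlps(double a, double b, long maxUlps) {
        if (Double.isNaN(a) || Double.isNaN(b)) {
            return false;
        }
        long aBits = Double.doubleToLongBits(a);
        long bBits = Double.doubleToLongBits(b);
        // Opposite signs: only +0.0 and -0.0 are treated as equal.
        if ((aBits < 0) != (bBits < 0)) {
            return a == b;
        }
        // Same sign: IEEE 754 bit patterns are ordered like the magnitudes,
        // so the difference of the raw bits is the distance in ULPs.
        return Math.abs(aBits - bBits) <= maxUlps;
    }
}
```

A call such as UlpCompare.almostEqualUlps(x, y, 4) then accepts values that differ only by a few rounding steps, independent of their magnitude.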

  Best regards,

   Wolfgang




Re: [compress] [PATCH] Refactoring of zip encoding support.

2009-03-04 Thread Wolfgang Glas
Stefan Bodewig schrieb:
> On 2009-03-04, Stefan Bodewig  wrote:
> 
>> On 2009-03-03, Wolfgang Glas  wrote:
> 
>>> The implementation should be straightforward, shall I prepare a
>>> patch or would you prefer to do it on your own?
> 
>> Will do it myself.
> 
> svn revisions 749906 and 749907

Hello Stefan, I reviewed your code and found that you did not strictly use the
same encoding for filenames and comments in one entry.

A patch that corrects this behaviour is attached.
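
The idea can be sketched independently of the commons-compress classes: decide once per entry which encoding applies, then use it for both the name and the comment. The sketch below uses plain java.nio charsets; the class and method names are hypothetical and only illustrate the approach.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: one encoding decision per entry, shared by name and comment. */
final class EntryEncodingSketch {
    private EntryEncodingSketch() {}

    /** Picks the single charset for an entry, based on whether the name is encodable. */
    static Charset chooseEntryCharset(String name, Charset configured, boolean fallbackToUtf8) {
        boolean encodable = configured.newEncoder().canEncode(name);
        return (!encodable && fallbackToUtf8) ? StandardCharsets.UTF_8 : configured;
    }

    /** Encodes name and comment with the same charset; index 0 is the name, 1 the comment. */
    static ByteBuffer[] encodeNameAndComment(String name, String comment,
                                             Charset configured, boolean fallbackToUtf8) {
        Charset entryCharset = chooseEntryCharset(name, configured, fallbackToUtf8);
        return new ByteBuffer[] { entryCharset.encode(name), entryCharset.encode(comment) };
    }
}
```

If the name forces the UTF-8 fallback, the comment is written as UTF-8 too, so a reader never has to guess two different encodings within one entry.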

  Regards,

Wolfgang
Index: src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveOutputStream.java
===
--- src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveOutputStream.java	(Revision 750123)
+++ src/main/java/org/apache/commons/compress/archivers/zip/ZipArchiveOutputStream.java	(Arbeitskopie)
@@ -629,12 +629,16 @@
 protected void writeLocalFileHeader(ZipArchiveEntry ze) throws IOException {
 
 boolean encodable = zipEncoding.canEncode(ze.getName());
-ByteBuffer name;
+
+final ZipEncoding entryEncoding;
+
 if (!encodable && fallbackToUTF8) {
-name = ZipEncodingHelper.UTF8_ZIP_ENCODING.encode(ze.getName());
+entryEncoding = ZipEncodingHelper.UTF8_ZIP_ENCODING;
 } else {
-name = zipEncoding.encode(ze.getName());
+entryEncoding = zipEncoding;
 }
+
+ByteBuffer name = entryEncoding.encode(ze.getName());
 
 if (createUnicodeExtraFields != UnicodeExtraFieldPolicy.NEVER) {
 
@@ -653,7 +657,7 @@
 
 if (createUnicodeExtraFields == UnicodeExtraFieldPolicy.ALWAYS
 || !commentEncodable) {
-ByteBuffer commentB = this.zipEncoding.encode(comm);
+ByteBuffer commentB = entryEncoding.encode(comm);
 ze.addExtraField(new UnicodeCommentExtraField(comm,
   commentB.array(),
   commentB.arrayOffset(),
@@ -779,12 +783,16 @@
 // CheckStyle:MagicNumber ON
 
 // file name length
-ByteBuffer name;
+final ZipEncoding entryEncoding;
+
 if (!encodable && fallbackToUTF8) {
-name = ZipEncodingHelper.UTF8_ZIP_ENCODING.encode(ze.getName());
+entryEncoding = ZipEncodingHelper.UTF8_ZIP_ENCODING;
 } else {
-name = zipEncoding.encode(ze.getName());
+entryEncoding = zipEncoding;
 }
+
+ByteBuffer name = entryEncoding.encode(ze.getName());
+
 writeOut(ZipShort.getBytes(name.limit()));
 written += SHORT;
 
@@ -798,12 +806,9 @@
 if (comm == null) {
 comm = "";
 }
-ByteBuffer commentB;
-if (!encodable && fallbackToUTF8) {
-commentB = ZipEncodingHelper.UTF8_ZIP_ENCODING.encode(comm);
-} else {
-commentB = zipEncoding.encode(comm);
-}
+
+ByteBuffer commentB = entryEncoding.encode(comm);
+
 writeOut(ZipShort.getBytes(commentB.limit()));
 written += SHORT;
 


Re: [compress] [PATCH] Refactoring of zip encoding support.

2009-03-04 Thread Wolfgang Glas
Stefan Bodewig schrieb:
> On 2009-03-04, Stefan Bodewig  wrote:
> 
>> On 2009-03-03, Wolfgang Glas  wrote:
> 
>>> The implementation should be straightforward, shall I prepare a
>>> patch or would you prefer to do it on your own?
> 
>> Will do it myself.
> 
> svn revisions 749906 and 749907

Thanks very much, Stefan ;-)

I will have a look at the new version and give you feedback in the evening.

  Wolfgang





Re: [compress] [PATCH] Refactoring of zip encoding support.

2009-03-03 Thread Wolfgang Glas
Stefan Bodewig schrieb:
> On 2009-03-03, Wolfgang Glas  wrote:
> 
>> Stefan Bodewig schrieb:
>>> On 2009-03-02, Wolfgang Glas  wrote:
> 
>>>> Stefan Bodewig schrieb:
>>>>> On 2009-03-01, Wolfgang Glas  wrote:
> 
>>>>>> 1) Unicode extra fields are written for all ZIP entries and not only
>>>>>> for entries, which are not encodable by the encoding set to
>>>>>> ZipArchiveOutputStream.
> 
>>>>> Maybe room for yet another flag?  Or an enum-like option
> 
>>>>> setCreateUnicodeExtraFields(NEVER | ALWAYS | NOT_ENCODABLE)
> 
>>> Consider the WinZIP case, WinZIP wouldn't recognize the EFS.  If you
>>> set the encoding to UTF-8 and use your code and only add extra fields
>>> for non-encodable paths, WinZIP will never see the correct path.
> 
>> According to my tests WinZip recognizes the EFS flag upon
>> reading.
> 
> Then my documentation is wrong 8-)

Sorry for not reading the documentation exactly, but I got stuck because the EFS
flag seemed not to be enough for me and I wanted to get this straight first.
But I think we've come a long way and the end is near ;-)

>> Secondly, if you set the encoding to UTF-8, there's no need for
>> unicode extra fields anyway.
> 
> Except when your client doesn't recognize the EFS flag and thinks you'd
> be using CP437 - but happily accepts the Unicode extra fields.  I
> thought this would be the case for WinZIP.

Yes, the EFS flag is of little use. It was added to the specs very late
and most implementations simply ignore it. Hence they introduced extra fields
and now we have to live with both 8-)

>>> but looking at the names we may be better off with two independent
>>> options.  Hmm, yes, right now I prefer two flags because they seem to
>>> be orthogonal.
> 
>> I think you should choose whichever approach better fits your needs in
>> ant ;-) At least you have to write an XML parser for these settings
> 
> You vastly overestimate the effort it takes to write an Ant task.
> 
> http://svn.apache.org/viewvc/ant/core/trunk/src/main/org/apache/tools/ant/taskdefs/Zip.java?r1=738330&r2=748593
> 
> is all I had to do for the two existing options.

That's nice. However, I hope that I can avoid adding ant to the list of
open-source projects I participate in ;-)

>> and the documentation, so you might choose the approach which may be
>> explained in brief words.
> 
>> I can live very well with two options ;-)
> 
> If you throw in "fallbacks" we are actually facing three concepts.
> 
> OK, this is what I feel makes most sense:
> 
> createUnicodeExtraFields = NEVER (default) | ALWAYS | NOT_ENCODABLE
> useLanguageEncodingFlag = true (default) | false
> fallbackToUtf8 = true | false

Agreed ;-)

> I'm not sure about the default for the latter, probably
> default fallbackToUtf8 = (createUnicodeExtraFields == NEVER)

The default for the latter should be false; it is a special option for people who
know what they are doing.
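
As a rough sketch of how the three agreed options could interact, assuming the defaults discussed above (createUnicodeExtraFields = NEVER, useLanguageEncodingFlag = true, fallbackToUtf8 = false). The class and the decision methods are hypothetical and are not the actual ZipArchiveOutputStream code.

```java
/** Sketch of the three independent options; not the real ZipArchiveOutputStream. */
final class ZipUnicodeOptions {
    enum UnicodeExtraFieldPolicy { NEVER, ALWAYS, NOT_ENCODABLE }

    UnicodeExtraFieldPolicy createUnicodeExtraFields = UnicodeExtraFieldPolicy.NEVER;
    boolean useLanguageEncodingFlag = true;
    boolean fallbackToUtf8 = false;

    /** Should a Unicode extra field be written for this entry? */
    boolean writesExtraField(boolean nameEncodable) {
        switch (createUnicodeExtraFields) {
            case ALWAYS:
                return true;
            case NOT_ENCODABLE:
                return !nameEncodable;
            default:
                return false;
        }
    }

    /** Is the entry name written as UTF-8, either by configuration or by fallback? */
    boolean encodesAsUtf8(boolean configuredIsUtf8, boolean nameEncodable) {
        return configuredIsUtf8 || (fallbackToUtf8 && !nameEncodable);
    }

    /** The EFS flag is only set for UTF-8 names, and only if the option allows it. */
    boolean setsEfsFlag(boolean configuredIsUtf8, boolean nameEncodable) {
        return useLanguageEncodingFlag && encodesAsUtf8(configuredIsUtf8, nameEncodable);
    }
}
```

Each option answers one question on its own, which is the orthogonality argument made earlier in the thread.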

The implementation should be straightforward, shall I prepare a patch or would
you prefer to do it on your own?

  Regards,

Wolfgang





Re: [compress] [PATCH] Refactoring of zip encoding support.

2009-03-03 Thread Wolfgang Glas
Stefan Bodewig schrieb:
> On 2009-03-02, Wolfgang Glas  wrote:
> 
>> Stefan Bodewig schrieb:
>>> On 2009-03-01, Wolfgang Glas  wrote:
> 
>>>> 1) Unicode extra fields are written for all ZIP entries and not only
>>>> for entries, which are not encodable by the encoding set to
>>>> ZipArchiveOutputStream.
> 
>>> Maybe room for yet another flag?  Or an enum-like option
> 
>>> setCreateUnicodeExtraFields(NEVER | ALWAYS | NOT_ENCODABLE)
> 
> Consider the WinZIP case, WinZIP wouldn't recognize the EFS.  If you
> set the encoding to UTF-8 and use your code and only add extra fields
> for non-encodable paths, WinZIP will never see the correct path.

According to my tests WinZip recognizes the EFS flag upon reading. Upon writing,
WinZip uses extra fields and encodes filenames as Cp437, which is really the
most useful variant these days.

Secondly, if you set the encoding to UTF-8, there's no need for unicode extra
fields anyway. But as mentioned above, the most portable tool-readable variant,
as requested by the reporter of the original SANDBOX-176 issue, is writing Cp437
and adding unicode extra fields. EFS support in the wild is not really
widespread, probably due to a mid-air collision between the writing of the
specification and the implementation of widespread ZIP tools.

>> I like the idea of a unicode policy flag ;-)
> 
> May be a better approach, agreed.  But only if we manage to cover all
> border cases.
> 
>> My suggestion is
> 
>> setUnicodePolicy(
>>   SURROGATES          | /* no extra fields, no utf-8 fallback, only %U surrogates */
>>   EXTRA_FIELDS        | /* extra fields for unencodable entries, no utf-8 fallback */
>>   EXTRA_FIELDS_ALWAYS | /* extra fields for all entries, no utf-8 fallback */
>>   UTF8_FALLBACK       | /* fall back to utf-8 plus EFS flag for unencodable entries */
>>   UTF8_FALLBACK_EXTRA_FIELDS | /* fall back to utf-8 plus EFS flag plus extra
>>                                   fields for unencodable entries */
>>   UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS /* fall back to utf-8 plus EFS flag for
>>                                        unencodable entries, extra fields for
>>                                        all entries */
>> )
> 
>> We might drop the last two options and we might choose a better
>> wording, however the direction should IMHO be as mentioned above...
> 
> This covers all permutations, agreed.
> 
> Names, names, I'm really bad at them.
> 
> EXTRA_FIELDS                      => ADD_EXTRA_FIELDS_FOR_UNENCODABLE
> EXTRA_FIELDS_ALWAYS               => ADD_EXTRA_FIELDS
> UTF8_FALLBACK                     => FALL_BACK_TO_UTF8
> UTF8_FALLBACK_EXTRA_FIELDS        => FALL_BACK_TO_UTF8_PLUS_EXTRA_FIELD
> UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS => FALL_BACK_TO_UTF8_ADD_EXTRA_FIELDS
> 
> but looking at the names we may be better off with two independent
> options.  Hmm, yes, right now I prefer two flags because they seem to
> be orthogonal.

I think you should choose whichever approach better fits your needs in ant ;-) At
least you have to write an XML parser for these settings and the documentation,
so you might choose the approach that can be explained in a few words.

I can live very well with two options ;-)

  Wolfgang




Re: [compress] [PATCH] Refactoring of zip encoding support.

2009-03-02 Thread Wolfgang Glas
Stefan Bodewig schrieb:
> On 2009-03-01, Wolfgang Glas  wrote:
> 
>> 1) Unicode extra fields are written for all ZIP entries and not only
>> for entries, which are not encodable by the encoding set to
>> ZipArchiveOutputStream.
> 
> Maybe room for yet another flag?  Or an enum-like option
> 
> setCreateUnicodeExtraFields(NEVER | ALWAYS | NOT_ENCODABLE)

I like the idea of a unicode policy flag ;-)

My suggestion is

setUnicodePolicy(
  SURROGATES          | /* no extra fields, no utf-8 fallback, only %U surrogates */
  EXTRA_FIELDS        | /* extra fields for unencodable entries, no utf-8 fallback */
  EXTRA_FIELDS_ALWAYS | /* extra fields for all entries, no utf-8 fallback */
  UTF8_FALLBACK       | /* fall back to utf-8 plus EFS flag for unencodable entries */
  UTF8_FALLBACK_EXTRA_FIELDS | /* fall back to utf-8 plus EFS flag plus extra
                                  fields for unencodable entries */
  UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS /* fall back to utf-8 plus EFS flag for
                                       unencodable entries, extra fields for all
                                       entries */
)

We might drop the last two options and we might choose a better wording, however
the direction should IMHO be as mentioned above...
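
Written as a Java enum, the suggestion could look roughly like the sketch below. The two query methods are additions for illustration only and show that every value is a combination of two independent choices: when to write extra fields, and whether to fall back to UTF-8.

```java
/** Sketch of the suggested policy values; the query methods are illustrative additions. */
enum UnicodePolicy {
    SURROGATES(false, false, false),
    EXTRA_FIELDS(true, false, false),
    EXTRA_FIELDS_ALWAYS(true, true, false),
    UTF8_FALLBACK(false, false, true),
    UTF8_FALLBACK_EXTRA_FIELDS(true, false, true),
    UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS(true, true, true);

    private final boolean extraFields;        // write extra fields at all
    private final boolean extraFieldsForAll;  // also for encodable entries
    private final boolean utf8Fallback;       // utf-8 plus EFS flag for unencodable entries

    UnicodePolicy(boolean extraFields, boolean extraFieldsForAll, boolean utf8Fallback) {
        this.extraFields = extraFields;
        this.extraFieldsForAll = extraFieldsForAll;
        this.utf8Fallback = utf8Fallback;
    }

    /** Should an extra field be written for an entry with the given encodability? */
    boolean writesExtraField(boolean nameEncodable) {
        return extraFields && (extraFieldsForAll || !nameEncodable);
    }

    /** Should the entry name be written as utf-8 with the EFS flag set? */
    boolean usesUtf8Fallback(boolean nameEncodable) {
        return utf8Fallback && !nameEncodable;
    }
}
```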

  Regards,

Wolfgang





Re: [compress] [PATCH] Refactoring of zip encoding support.

2009-03-02 Thread Wolfgang Glas
Stefan Bodewig schrieb:
> On 2009-03-02, Stefan Bodewig  wrote:
> 
>> some cosmetics and commented out the "only create Unicode field for
>> non-encodable paths" part - svn revision 749342.
> 
> and 749344 - you misspelt encoding in Simple8BitEncoding.java and I
> didn't see it in time.

...another small patch with even more javadoc typos and superfluous imports
fixed is attached. TIA for committing,

  Wolfgang
Index: src/main/java/org/apache/commons/compress/archivers/zip/ZipEncoding.java
===
--- src/main/java/org/apache/commons/compress/archivers/zip/ZipEncoding.java	(Revision 749398)
+++ src/main/java/org/apache/commons/compress/archivers/zip/ZipEncoding.java	(Arbeitskopie)
@@ -21,7 +21,6 @@
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
-import java.nio.charset.Charset;
 
 /**
  * An interface for encoders that do a pretty encoding of ZIP
@@ -35,12 +34,12 @@
  * The main reason for defining an own encoding layer comes from
  * the problems with {@link java.lang.String#getBytes(String)
  * String.getBytes}, which encodes unknown characters as ASCII
- * quotation marks ('?'), which is per definition an invalid filename
- * character under some operating systems (Windows, e.g.) leading to
- * ignored ZIP entries.
+ * quotation marks ('?'). Quotation marks are per definition an
+ * invalid filename on some operating systems  like Windows, which
+ * leads to ignored ZIP entries.
  * 
  * All implementations should implement this interface in a
- * reentrant way.<(p>
+ * reentrant way.
  */
 interface ZipEncoding {
 /**
Index: src/main/java/org/apache/commons/compress/archivers/zip/Simple8BitZipEncoding.java
===
--- src/main/java/org/apache/commons/compress/archivers/zip/Simple8BitZipEncoding.java	(Revision 749398)
+++ src/main/java/org/apache/commons/compress/archivers/zip/Simple8BitZipEncoding.java	(Arbeitskopie)
@@ -21,7 +21,6 @@
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
-import java.nio.charset.Charset;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
Index: src/main/java/org/apache/commons/compress/archivers/zip/NioZipEncoding.java
===
--- src/main/java/org/apache/commons/compress/archivers/zip/NioZipEncoding.java	(Revision 749398)
+++ src/main/java/org/apache/commons/compress/archivers/zip/NioZipEncoding.java	(Arbeitskopie)
@@ -32,7 +32,7 @@
  * java.nio.charset.Charset Charset} to encode names.
  *
  * This implementation works for all cases under java-1.5 or
- * later. However, in java-1.4, some charsets don't have a java-nio
+ * later. However, in java-1.4, some charsets don't have a java.nio
  * implementation, most notably the default ZIP encoding Cp437.
  * 
  * The methods of this class are reentrant.


Re: [compress] [PATCH] Refactoring of zip encoding support.

2009-03-02 Thread Wolfgang Glas
Stefan Bodewig schrieb:
> On 2009-03-02, Stefan Bodewig  wrote:
> 
>> some cosmetics and commented out the "only create Unicode field for
>> non-encodable paths" part - svn revision 749342.
> 
> and 749344 - you misspelt encoding in Simple8BitEncoding.java and I
> didn't see it in time.

Yes, thanks, that's why we have code reviewers ;-)

Hopefully we will get the flaggery in ZipArchiveOutputStream straight within the
next few days ;-)

Implementation of all kinds of policies should be quite easy, now that my ZipEncoding
engine is committed.

  Regards, Wolfgang





Re: [compress] State of encoding support in ZIP package

2009-03-02 Thread Wolfgang Glas
Stefan Bodewig schrieb:
> On 2009-03-01, Wolfgang Glas  wrote:
> 
>> My understanding from previous discussion was, that we need a mode,
>> where file names not encodable by the chosen encoding are encoded in
>> UTF-8, which is in turn indicated by setting the EFS flag on the
>> likewise ZIP entry. (That's the way 7-zip handles unicode
>> filenames...)
> 
> This is different from what we've currently implemented, but may still
> be useful.
> 
>> The current implementation of the useEFS flag simply allows
>> disabling the EFS flag in ZIP entries that are UTF-8-encoded.
>> This approach does not conform to the specifications I've
>> read, and I have not seen a single zip implementation that is
>> disturbed by the EFS flag.
> 
> But if there should be one - say zlib on z/OS or some other strange
> thing, it will be good to have that option available,

OK, agreed, let's keep this flag ;-)

>> My opinion would be to simply drop the possibility to inhibit the
>> EFS flag in utf-8 encoded files and to introduce a new flag allowing
>> to switch to utf-8 fallbacks (7-zip mode...).
> 
> I'm fine with an additional flag that would encode not-encodable file
> names as UTF-8 (not sure about the name of the flag and I have a long
> standing history for choosing bad names), but prefer to keep the
> existing option for the completely orthogonal case of whether we set
> the EFS at all.

OK, I will introduce an additional flag, let's call it
'setFallbackToUtf8(boolean)'. I will prepare a patch right after you've reviewed
and (possibly) committed my latest encoding refactoring patch.

  Best regards,

Wolfgang





Re: [compress] State of encoding support in ZIP package

2009-03-01 Thread Wolfgang Glas
Stefan Bodewig schrieb:
> On 2009-02-27, Wolfgang Glas  wrote:
> 
>> Additionally, my experience with WinZip shows, that WinZip writes weird
>> filenames to the single-byte version of the filename when a unicode field is
>> present.
> 
> Hmm, native encoding I'd guess.

Something like this, looks like they are writing the LSB of a 2-byte value...

> Wolfgang, could you do me a favor and please review what I've written
> for the Ant zip task manual page in svn revision 748593
> <http://svn.apache.org/viewvc?view=rev&revision=748593>, in particular
> <http://svn.apache.org/viewvc/ant/core/trunk/docs/manual/CoreTasks/zip.html?r1=748593&r2=748592&pathrev=748593>?

Seems quite OK ;-)

The one thing, I'd like to discuss is the semantics of the useEFS flag in
ZipArchiveOutputStream:

My understanding from the previous discussion was that we need a mode where file
names not encodable by the chosen encoding are encoded in UTF-8, which is in
turn indicated by setting the EFS flag on the corresponding ZIP entry. (That's the
way 7-zip handles unicode filenames...)

The current implementation of the useEFS flag simply allows disabling the
EFS flag in ZIP entries that are UTF-8-encoded. This approach does not
conform to the specifications I've read, and I have not seen a single zip
implementation that is disturbed by the EFS flag.

My opinion would be to simply drop the possibility to inhibit the EFS flag in
utf-8 encoded files and to introduce a new flag allowing to switch to utf-8
fallbacks (7-zip mode...).

What other opinions are out there?

  Wolfgang




[compress] [PATCH] Refactoring of zip encoding support.

2009-03-01 Thread Wolfgang Glas
Hello all,

  Well, the latest discussions with Stefan showed two shortcomings of our current
ZIP unicode support:

1) Unicode extra fields are written for all ZIP entries and not only for
entries that are not encodable by the encoding set on ZipArchiveOutputStream.

2) In order to implement selective writing of unicode specials, one needs a
robust way to determine whether a name can be encoded or not. This is
especially hampered by the fact that Cp437 has been omitted from java-1.4's
java.nio.Charset.

To overcome these shortcomings, I had to introduce a ZipEncoding interface plus a
java.nio implementation and a handcrafted implementation for Cp437 (and cp850)
and refactor all the encoding stuff.
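
To make the approach concrete, here is a simplified sketch of such an encoding layer with a table-driven single-byte implementation. The interface and class names are hypothetical and the signatures are only an approximation of what the patch contains. The "%Uxxxx" escape for unencodable characters is an assumption based on the "%U surrogates" mentioned in this thread. A real Cp437 implementation would be constructed with the 128 characters of the upper half of that code page.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified encoding interface for this sketch. */
interface NameEncoding {
    boolean canEncode(String name);
    ByteBuffer encode(String name);
    String decode(byte[] data);
}

/** Table-driven single-byte encoding: 0x00-0x7F is ASCII, 0x80-0xFF comes from a table. */
final class Simple8BitEncodingSketch implements NameEncoding {
    private final char[] highChars;  // index i holds the character for byte 0x80 + i
    private final Map<Character, Byte> reverse = new HashMap<>();

    Simple8BitEncodingSketch(char[] highChars) {
        if (highChars.length != 128) {
            throw new IllegalArgumentException("need exactly 128 table entries");
        }
        this.highChars = highChars.clone();
        for (int i = 0; i < 128; i++) {
            reverse.put(this.highChars[i], (byte) (0x80 + i));
        }
    }

    @Override
    public boolean canEncode(String name) {
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c >= 0x80 && !reverse.containsKey(c)) {
                return false;
            }
        }
        return true;
    }

    @Override
    public ByteBuffer encode(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            Byte mapped = reverse.get(c);
            if (c < 0x80) {
                out.write(c);
            } else if (mapped != null) {
                out.write(mapped & 0xFF);
            } else {
                // Assumed escape format for unencodable characters: %U plus four hex digits.
                byte[] esc = String.format("%%U%04X", (int) c).getBytes(StandardCharsets.US_ASCII);
                out.write(esc, 0, esc.length);
            }
        }
        return ByteBuffer.wrap(out.toByteArray());
    }

    @Override
    public String decode(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            int v = b & 0xFF;
            sb.append(v < 0x80 ? (char) v : highChars[v - 0x80]);
        }
        return sb.toString();
    }
}
```

Because canEncode only consults the table, the check works without any java.nio charset support for the code page in question.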

The patch is attached. The new code is IMHO really easier to read and makes all
cp437-related stuff accessible on java-1.4 as well. The benefit of all these
efforts in the end is that the cp437 test case now runs flawlessly under
java-1.4.

  Stefan, could you please review the patch and possibly apply it?

   TIA,

Wolfgang


commons-compress-encoding-refactoring-svn749109.patch.gz
Description: GNU Zip compressed data

Re: [compress] State of encoding support in ZIP package

2009-02-27 Thread Wolfgang Glas
Stefan Bodewig schrieb:
> On 2009-02-26, Wolfgang Glas  wrote:
> 
>> Stefan Bodewig schrieb:
> 
>>> The question on defaults: should ZipFile look for UnicodeExtraFields
>>> by default or ignore them (as it does right now)?
> 
>> I'd do this by default, because IMHO we should have a 'smart' unzipper in
>> commons-compress ;-)
> 
> Convinced.  svn revision 748556.

That's nice ;-)

Additionally, my experience with WinZip shows that WinZip writes weird
filenames to the single-byte version of the filename when a unicode field is
present.

  Wolfgang




Re: [compress] State of encoding support in ZIP package

2009-02-26 Thread Wolfgang Glas
Hi Stefan,

  Thanks for your tremendous work on finishing ZIP encoding support ;-)

Stefan Bodewig schrieb:
> Hi all,
> 
> a quick update and a question on defaults:

[snip]

> * documentation (will tackle that next)

May I help you with this?

> * ZipArchiveInputStream - SANDBOX-293

Well, ZipArchiveInputStream remains a problematic issue. WinZip
interoperability may not be achievable using the local file header (see
SANDBOX-292); however, java.util's implementation has no encoding setting and no
EFS interpretation, so supporting those is a large improvement on its own.

> * WinZIP interop - SANDBOX-292

Well, I think we should tackle this one for ZipFile. Since ZipFile has a random
access file at its disposal, interpretation of central directory information is a
clear advantage of ZipFile over ZipInputStream.

> The question on defaults: should ZipFile look for UnicodeExtraFields
> by default or ignore them (as it does right now)?

I'd do this by default, because IMHO we should have a 'smart' unzipper in
commons-compress ;-)

If you need help with further work on the above-mentioned issues, please write to
me.

  Regards,

 Wolfgang





Re: [compress] ZIP - encoding of file names - again

2009-02-18 Thread Wolfgang Glas
Stefan Bodewig schrieb:
> I started to take some baby steps implementing it, in particular
> 
> On 2009-02-13, Stefan Bodewig  wrote:
> 
>> Currently I think the best default approach would be to use UTF-8 as
>> the default encoding and set the EFS bit since this will create
>> archives compatible with java.util.zip but has the additional benefit
>> of clearly stating it is using UTF-8.
> 
> UTF-8 is now the default for ZipArchiveOutputStream and ZipFile, EFS
> support is not yet in.
> 
> The InfoZIP extra fields are supported, but one has to write them
> manually right now.  They should be read transparently by ZipFile but
> don't affect the file name or comment ATM.
> 
> Wolfgang, you may notice a few minor tweaks to your original code.  Do
> you happen to have stand-alone tests for the Unicode extra fields
> anywhere?

A rudimentary test is in my original patch as attached to SANDBOX-176.
I have refactored this test to the current SVN revision and attached it to this
mail. The test either needs to be refactored to use ZipFile, or
ZipArchiveInputStream has to be implemented ;-)

  I also had to expose the ZipEncodingHelper functionality to the public in
order to compile the new test.

  You also need the two zip files attached to SANDBOX-176 in src/test/resources
in order to run the interoperability test.

> I took the liberty to apply the same patches to Ant trunk as well.

Nice to see ;-)

  Best regards,

 Wolfgang
/*
 *  Licensed to the Apache Software Foundation (ASF) under one or more
 *  contributor license agreements.  See the NOTICE file distributed with
 *  this work for additional information regarding copyright ownership.
 *  The ASF licenses this file to You under the Apache License, Version 2.0
 *  (the "License"); you may not use this file except in compliance with
 *  the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 *
 */

package org.apache.commons.compress.archivers;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;

import org.apache.commons.compress.AbstractTestCase;
import org.apache.commons.compress.archivers.zip.UnicodePathExtraField;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;
import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;
import org.apache.commons.compress.archivers.zip.ZipEncodingHelper;
import org.apache.commons.compress.archivers.zip.ZipExtraField;

public class TestUtf8ZipFiles extends AbstractTestCase
{

    private static final String UTF_8 = "utf-8";
    private static final String CP437 = "cp437";
    private static final String US_ASCII = "US-ASCII";
    private static final String ASCII_TXT = "ascii.txt";
    private static final String EURO_FOR_DOLLAR_TXT = "\u20AC_for_Dollar.txt";
    private static final String OIL_BARREL_TXT = "\u00D6lf\u00E4sser.txt";

    private void createTestFile(File file, String encoding) throws UnsupportedEncodingException, IOException {

        ZipArchiveOutputStream zos = new ZipArchiveOutputStream(file);

        zos.setEncoding(encoding);

        ZipArchiveEntry ze = new ZipArchiveEntry(OIL_BARREL_TXT);

        if (!ZipEncodingHelper.canEncodeName(ze.getName(),zos.getEncoding()))
            ze.addExtraField(new UnicodePathExtraField(ze.getName(),zos.getEncoding()));

        zos.putNextEntry(ze);
        zos.write("Hello, world!".getBytes("US-ASCII"));
        zos.closeEntry();

        ze = new ZipArchiveEntry(EURO_FOR_DOLLAR_TXT);
        if (!ZipEncodingHelper.canEncodeName(ze.getName(),zos.getEncoding()))
            ze.addExtraField(new UnicodePathExtraField(ze.getName(),zos.getEncoding()));

        zos.putNextEntry(ze);
        zos.write("Give me your money!".getBytes("US-ASCII"));
        zos.closeEntry();

        ze = new ZipArchiveEntry(ASCII_TXT);

        if (!ZipEncodingHelper.canEncodeName(ze.getName(),zos.getEncoding()))
            ze.addExtraField(new UnicodePathExtraField(ze.getName(),zos.getEncoding()));

        zos.putNextEntry(ze);
        zos.write("ascii".getBytes("US-ASCII"));
        zos.closeEntry();

        zos.close();
    }

    private UnicodePathExtraField findUniCodePath(ZipArchiveEntry ze) {

        ZipExtraField[] efs = ze.getExtraFields();

        for (int i=0;i

Re: [compress] ZIP - encoding of file names - again

2009-02-13 Thread Wolfgang Glas
Hi Stefan,

  My comments follow.

Stefan Bodewig schrieb:
> Let me try to capture the various threads in SANDBOX-176 and from this
> list into something we can draw conclusions from.
> 
> First some background:
> ==

[snip]

> Reading
> ===
> 
> Let's keep ZipArchiveInputStream out of the discussion for now 8-)

Yes, we should do so. I analysed my WinZip example and noticed that unicode
extra fields are written to the central directory records and not to the local
file headers. This makes it impossible to get the real Unicode filename when
parsing a ZIP file the way all ZipInputStream implementations I've seen
do. (They sequentially parse the local file headers and ignore the central
directory records...)

Furthermore, relicensing any GPL version of java.util.zip.ZipInputStream
seems to be impossible because of the large number of contributors to
the code out there. (I've tried to find the contributors to GNU Classpath's
version; there's nearly no possibility of finding them all...)

> I propose to change ZipFile to support both the EFS flag as well as
> the InfoZIP extra fields when reading archives.

That's a good choice. I've already provided the parsing code for unicode extra
fields, so the implementation should be quite easy ;-)
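
For readers unfamiliar with the InfoZIP Unicode path extra field (header id 0x7075), here is a rough sketch of parsing its data part: one version byte (1), the CRC-32 of the name bytes stored in the regular header, and the UTF-8 name. This is an illustration written for this thread with hypothetical class and method names; it is not the parsing code from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Sketch of reading and writing the data part of a Unicode path extra field. */
final class UnicodePathFieldSketch {
    static final int HEADER_ID = 0x7075;

    private UnicodePathFieldSketch() {}

    /**
     * Returns the UTF-8 name, or null if the field is malformed or its CRC
     * does not match the name bytes of the regular header (stale field).
     */
    static String parse(byte[] fieldData, byte[] rawHeaderName) {
        if (fieldData.length < 5 || fieldData[0] != 1) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(fieldData).order(ByteOrder.LITTLE_ENDIAN);
        long storedCrc = buf.getInt(1) & 0xFFFFFFFFL;
        CRC32 crc = new CRC32();
        crc.update(rawHeaderName);
        if (crc.getValue() != storedCrc) {
            return null;
        }
        return new String(fieldData, 5, fieldData.length - 5, StandardCharsets.UTF_8);
    }

    /** Builds the data part for a given unicode name and the header's raw name bytes. */
    static byte[] build(String unicodeName, byte[] rawHeaderName) {
        byte[] utf8 = unicodeName.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(rawHeaderName);
        ByteBuffer buf = ByteBuffer.allocate(5 + utf8.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 1);
        buf.putInt((int) crc.getValue());
        buf.put(utf8);
        return buf.array();
    }
}
```

The CRC check lets a reader ignore the field when another tool has renamed the entry without updating the extra field.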

> I'm not sure what ZipFile should do if it encounters both the EFS flag
> and the extra fields.  Likely it is best to assume both hold the same
> information and simply use the EFS encoded name.

Agreed.

> The question is what ZipFile should assume as its default if neither
> the EFS nor extra fields are present.  This can be controlled by
> "setEncoding" right now and defaults to the platform's default
> encoding but a default of UTF-8 (compatible with java.util.zip) or
> CodePage 437 (compatible with formal ZIP spec) are valid choices as
> well.

AFAICS, ant API users are used to the 'setEncoding(String encoding)' approach,
although it would be better to rename the method to 'setDefaultEncoding(String
encoding)'.

> Writing
> ===
> 
> I propose new flags get/setLanguageEncodingFlag for EFS and
> get/setAddUnicodeExtraFields on ZipArchiveOutputStream that control
> whether either approach is used.  I.e. I propose to optionally support
> either approach (and both at the same time).

The question at this point is whether to use the EFS flag for *all* records or
only for records not encodable by the encoding set by 'setEncoding(String)'.

IMHO we should adopt the 7-zip approach and set the EFS flag only for
non-encodable records, since this approach is minimally invasive.

Of course, the EFS flag should be set for all records if the encoding is set to
utf-8.
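
A small sketch of this rule, using plain java.nio charsets. The EFS flag is bit 11 of the general purpose bit flag in the ZIP specification; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: when to set the EFS (language encoding) flag for one record. */
final class EfsFlagSketch {
    /** Bit 11 of the general purpose bit flag: name and comment are UTF-8. */
    static final int EFS_FLAG = 1 << 11;

    private EfsFlagSketch() {}

    /**
     * Returns EFS_FLAG if the record is written as UTF-8, otherwise 0:
     * always for a configured UTF-8 encoding, and only for non-encodable
     * names when the UTF-8 fallback (the 7-zip approach) is enabled.
     */
    static int efsBits(Charset configured, String name, boolean fallbackToUtf8) {
        boolean utf8Configured = StandardCharsets.UTF_8.equals(configured);
        boolean encodable = configured.newEncoder().canEncode(name);
        boolean writtenAsUtf8 = utf8Configured || (!encodable && fallbackToUtf8);
        return writtenAsUtf8 ? EFS_FLAG : 0;
    }
}
```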

> IMHO the main question is what the code should do by default.
> 
> Currently I think the best default approach would be to use UTF-8 as
> the default encoding and set the EFS bit since this will create
> archives compatible with java.util.zip but has the additional benefit
> of clearly stating it is using UTF-8.

Yes, this seems reasonable, because users will expect Java compatibility
first and foremost.

> Note that using the EFS bit may make the archive unreadable for old
> archivers, that's why we need the option to turn it off.

I've not seen an old archiver that refused to unpack such a file. The only
problem is that the file names of the unpacked files are wrong (utf-8
interpreted as CP437; the good news is that all code points from 0x80-0xff in CP437
are allocated). However, that's the same problem that arises when unpacking a file
created by java.util.zip.ZipOutputStream.

> I wouldn't add the InfoZIP extra fields by default since they increase
> the archive size.

Yes, that's fine.

How about my suggestion for a 'tuning' method that sets up the ZipOutputStream in a
way that's suitable for most unzip tools out in the wild?

Or should we gather all the knowledge we collected in SANDBOX-176 and in this
thread into the JavaDoc of the class?

  Regards,

Wolfgang





Re: [compress] JarMarker

2009-02-13 Thread Wolfgang Glas
Torsten Curdt schrieb:
>> Solaris contains some special code which allows people to mark jar
>> files executable and run them as if they were native commands.  It
>> will only work for jars that contain the sequence 0xCAFE (in
>> big-endian order) somewhere at the beginning, which is achieved by
>> adding an extra field with that header id.
>>
>> See 
>>
>> This is the already existing JarMarker extra field in compress.
>>
>> Ant's  task adds this extra field to the META-INF directory
>> because it knows this is always going to be the very first entry for
>> Ant created jars.
>>
>> I propose to modify JarArchiveOutputStream to add a JarMarker extra
>> field to the very first entry written to the stream.
> 
> Makes sense +1

I second this opinion: +1

sum: +2

  Wolfgang
