Re: [PATCH] '8859_1' is not a valid charset alias

2001-05-19 Thread Vincent Schonau

On Sat, May 19, 2001 at 03:19:09PM -0700, [EMAIL PROTECTED] wrote:
> Vincent, Forrest,
> 
> Thanks for the patch & review. 
> 
> Could you summarize and/or expand a bit :-) ? 

The changes I made affect two uses of character encodings:

  1 what is sent to the browser (i.e. in JspParseEventListener) as literal
    HTTP headers
  2 what is used to set the character encoding of input and output
    streams

The reason I made the patch is that an (older) version of Lynx that I use to
test apps barfed on the "text/html; charset=8859_1" header. I noticed this
was non-standard, and that it's all over the tree; hence the patch.
It's just a standards thing: "iso-8859-1" is the 'preferred MIME name' for
this charset (see the IANA charset list that I pointed to). That's category
1.
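A minimal sketch, assuming a modern JDK (java.nio.charset.Charset arrived in JDK 1.4, after this thread): Charset.forName accepts Java-only aliases such as "8859_1" case-insensitively, and Charset.name() reports the canonical name, which for ISO-8859-1 is also the IANA preferred MIME name. The class and method names are hypothetical; the point is only that a Java alias never has to reach the browser.

```java
import java.nio.charset.Charset;

// Hypothetical helper: build a Content-Type header value from whatever
// encoding name the application happens to use internally.
final class ContentTypeHeader {
    static String forHtml(String javaEncodingName) {
        // Charset.forName resolves aliases ("8859_1", "ISO8859_1",
        // "iso-8859-1"); name() returns the canonical charset name.
        Charset cs = Charset.forName(javaEncodingName);
        return "text/html; charset=" + cs.name();
    }
}
```

With this, all three spellings above produce the same header, "text/html; charset=ISO-8859-1".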

Forrest then pointed out that the code I touched affects the selection of
encodings in Java, and that there is a performance gain to be had.

I did a little investigation into Forrest's remarks, and it turns out that
_consistently_ passing something other than the name Java itself uses for an
encoding can have an enormous impact (on my benchmark): using the canonical
Java name ("ISO8859_1") instead of an alias ("ISO-8859-1" or "8859_1") can be
up to 20x (!) faster. If one looks up the canonical name of the charset before
accessing a String with a non-default encoding, the total cost is only 1.5x
the cost of accessing it with the encoding's canonical name directly.
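The cache effect behind these numbers can be modelled with a toy per-thread cache that, like the getBTCConverter() check Forrest quotes, only hits when the requested name equals the converter's canonical name. This is a hypothetical model, not the JDK source, and it does not reproduce the 20x/1.5x benchmark figures; it only counts how often the synchronized slow path runs.

```java
// Hypothetical model of a per-thread converter cache keyed by canonical
// name. NOT the JDK source; it only shows why an alias always misses.
final class ConverterCache {
    // Stand-in for a ByteToCharConverter: it only knows its canonical name.
    static final class Converter {
        final String canonicalName;
        Converter(String canonicalName) { this.canonicalName = canonicalName; }
    }

    private static final ThreadLocal<Converter> cached = new ThreadLocal<>();
    static int slowLookups = 0; // trips to the synchronized slow path

    static Converter get(String encoding) {
        Converter c = cached.get();
        // Same shape as: !encoding.equals(btc.getCharacterEncoding())
        if (c == null || !encoding.equals(c.canonicalName)) {
            c = slowLookup(encoding);
            cached.set(c);
        }
        return c;
    }

    // Stand-in for the synchronized ByteToCharConverter.getConverter().
    // This toy only knows ISO-8859-1, so every alias maps to "ISO8859_1".
    private static synchronized Converter slowLookup(String encoding) {
        slowLookups++;
        return new Converter("ISO8859_1");
    }
}
```

Asking for "8859_1" ten times takes the slow path ten times, because the cached converter's name never equals the alias; asking for "ISO8859_1" hits the cache after at most one lookup.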

I've looked at the 3.x tree, and from superficial tests it looks like this
specific code is hardly ever reached by tomcat, so optimising it may not, in
fact, help anyone who uses iso-8859-1 for most content and user-agents. Most
of the work is already done, so I'll do it anyway.

That's category 2. (patch coming up).

> Also, has anyone played with the various browsers? Is any browser
> sending the charset encoding? What format?

I've been playing with this, but I don't have any definite results. As part
of the work for issue 2 above, I'll be testing this.

There isn't actually any reference to the charset used in the request in
Servlet 2.2, but there is in 2.3 (SRV.4.9 Request data encoding). It says
there that, currently, many browsers do not send a charset qualifier with
the Content-Type header of the request.
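When a client does send one, the charset travels as a parameter of the request's Content-Type header. A minimal sketch of extracting it; the helper name is hypothetical, and it ignores corner cases such as a ';' inside a quoted value.

```java
import java.util.Locale;

// Hypothetical helper: pull the charset parameter out of a Content-Type
// header value, or return null when the client did not send one.
final class ContentTypeCharset {
    static String charsetOf(String contentType) {
        if (contentType == null) return null;
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) continue;
            // Parameter names are case-insensitive.
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional quotes: charset="iso-8859-1"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }
}
```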

> I know that some browsers are encoding the URL with the same charset that
> is used in the page, while some are using UTF ( there was discussion about
> that somewhere). 

If you have a reference to this, I'll be happy to look into it.

> Is it true that browsers that are using UTF ( like IE on NT ? ) do send
> the body as UTF ? Do they set the Charset-Encoding header ?
> 
> I would really appreciate some info (I don't use Windows, and I heard
> there are differences between IE/Win9x and IE/NT)

I have no data on this yet, but I will soon.


Hope this helps,


Vince.



upload data corruption report

2001-05-19 Thread DAK

I've been asked to provide more information, so here is a combination of
the two messages I posted, with some more commentary and attachments.

It pertains to Tomcat-3.2.1 and looks to be the same in 3.2.2.b4. I'm
running Apache 1.3.17 on Win 2K Professional. I'm also using mod_jk.

I have some client code that sends a jar file to the servlet. The jar 
file was getting corrupted. After much digging, I found a CVS commit to 
Ajp13ConnectorRequest.java that mentioned a problem like this with the 
doRead() method. It turns out the same applies to the doRead(byte[], 
int, int) method. The same problem exists in the Ajp12ConnectionHandler 
for that byte array read. Single byte reads for both protocols work just 
fine. I'm including the diffs for these classes to show what I'm talking 
about.


I finally got out from under some work and was able to make some test 
code. I'm attaching the client and servlet code.
The code transfers a couple parameters, then a binary file (I was using 
a .jar). If you call the client with
"BinTestClient localhost something.jar b", it uses byte-by-byte read on 
the server to spool the file to a temp file. If you call the client 
without the 'b', it uses the byte-array read that I was complaining 
about.  Transfer a file, then try "jar tvf test.jar" to see if it 
works. I used a jar that contains .jpg images, and when using the byte
array read method, it creates a corrupt jar file. If I apply my fix to 
the Ajp13ConnectorRequest class, it works fine.
(I tried a jar that contained class files and it worked anyway...)
I'd like for someone else to try this out to make sure I didn't screw 
something up. The code seems pretty simple.
I discovered this when using JarIn/OutputStream to transfer data from
client to servlet. I've seen this type of thing in Java before when
writing code that talks to hardware (such as touchscreen drivers and
scanner drivers).
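Besides `jar tvf`, comparing the original and the spooled temp file byte for byte shows exactly where the data starts to differ. A minimal sketch; the class and method names are hypothetical and the Path overload assumes Java 7+ (java.nio.file).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: compare the client's original file with the temp
// file the servlet wrote and report the first offset where they differ.
final class TransferCheck {
    /** Returns -1 if identical, otherwise the first differing byte offset. */
    static long firstDifference(byte[] sent, byte[] received) {
        int common = Math.min(sent.length, received.length);
        for (int i = 0; i < common; i++) {
            if (sent[i] != received[i]) return i;
        }
        // Equal prefixes but different lengths: one side was truncated.
        return sent.length == received.length ? -1 : common;
    }

    static long firstDifference(Path sent, Path received) {
        try {
            return firstDifference(Files.readAllBytes(sent), Files.readAllBytes(received));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```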

   David



Index: Ajp13ConnectorRequest.java
===
RCS file: 
/home/cvspublic/jakarta-tomcat/src/share/org/apache/tomcat/service/connector/Attic/Ajp13ConnectorRequest.java,v
retrieving revision 1.5.2.7
diff -r1.5.2.7 Ajp13ConnectorRequest.java
274c274,277
<   System.arraycopy(bodyBuff, pos, b, off, c);
---
>   //System.arraycopy(bodyBuff, pos, b, off, c);
>   for (int i=pos, j=off, d=c; d > 0; i++, j++, d--) {
>   b[j] = (byte)(((char)bodyBuff[i])&0xff);
>   }

What I've done here is to replace the array copy with a loop that does the appropriate 
data conversion.
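For comparison, the single-byte doRead() problem mentioned in the earlier CVS commit is, assuming it is the usual one, the signed-byte pitfall: InputStream.read() must return 0-255, or -1 at end of stream, so returning a raw Java byte turns 0xFF into -1 and ends the read early. A minimal, container-free sketch; the class is hypothetical and only stands in for a connector's body buffer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical buffer-backed stream standing in for a connector body buffer.
final class BodyStream extends InputStream {
    private final byte[] bodyBuff;
    private final boolean maskBytes;
    private int pos = 0;

    BodyStream(byte[] data, boolean maskBytes) {
        this.bodyBuff = data;
        this.maskBytes = maskBytes;
    }

    @Override
    public int read() {
        if (pos >= bodyBuff.length) return -1; // real end of stream
        byte b = bodyBuff[pos++];
        // Unmasked: byte 0xFF sign-extends to int -1, which callers read as EOF.
        // Masked: the value stays in the required 0..255 range.
        return maskBytes ? (b & 0xff) : b;
    }

    // Count bytes until read() reports end of stream.
    static int countBytes(InputStream in) {
        int n = 0;
        try {
            while (in.read() != -1) n++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return n;
    }
}
```

With the data {1, 0xFF, 2, 3}, the unmasked stream stops after one byte, while the masked one delivers all four.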


Index: Ajp12ConnectionHandler.java
===
RCS file: 
/home/cvspublic/jakarta-tomcat/src/share/org/apache/tomcat/service/connector/Attic/Ajp12ConnectionHandler.java,v
retrieving revision 1.28.2.4
diff -r1.28.2.4 Ajp12ConnectionHandler.java
542a543,549
> public int read(byte b[], int off, int len) throws IOException {
>   int ret = super.read(b, off, len);
>   for (int i=0, j=off; i < ret; i++, j++) {
>     b[j] = (byte)(((char)b[j])&0xff);
>   }
>   return ret;
> }

In this case, I overrode the read method to convert the data after calling
super.read().




import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

//  args[0] = hostname
//  args[1] = jarfile
//  args[2] = 'b' for single byte read.
public class BinTestClient {

    public static void main(String [] args) {
        try {
            URL url = new URL("http://"+args[0]+"/examples/BinTest");
            URLConnection connection = (URLConnection)url.openConnection();
            connection.setDoOutput(true);
            connection.setUseCaches(false);
            DataOutputStream output = new DataOutputStream(connection.getOutputStream());
            File jarFile = new File(args[1]);
            if (jarFile.exists()) {
                output.writeUTF(""+jarFile.length());
            }
            if (args.length > 2 && args[2] != null &&
                    args[2].trim().equals("b"))
                output.writeChar('b');
            else
                output.writeChar(' ');

            InputStream istr = new FileInputStream(jarFile);
            byte [] buf = new byte[8192];
            int count = istr.read(buf);
            while (count != -1) {
                if (count > 0)
                    output.write(buf, 0, count);
                count = istr.read(buf);
            }
            istr.close();
            output.flush();
            output.close();

   

Re: [PATCH] '8859_1' is not a valid charset alias

2001-05-19 Thread cmanolache

Vincent, Forrest,

Thanks for the patch & review. 

Could you summarize and/or expand a bit :-) ? 

Also, has anyone played with the various browsers? Is any browser
sending the charset encoding? What format?

I know that some browsers are encoding the URL with the same charset that
is used in the page, while some are using UTF ( there was discussion about
that somewhere). 

Is it true that browsers that are using UTF ( like IE on NT ? ) do send
the body as UTF ? Do they set the Charset-Encoding header ?

I would really appreciate some info (I don't use Windows, and I heard
there are differences between IE/Win9x and IE/NT)

Costin


On Sat, 19 May 2001, Vincent Schonau wrote:

> On Fri, May 18, 2001 at 12:40:04PM -0700, Forrest R. Girouard wrote:
> > 
> > It is my understanding that '8859_1' is an alias for a Java encoding 
> > which maps to the 'ISO-8859-1' character set.  The Java encoding and
> > the character set name are not always the same.
> > 
> > Furthermore, while it's not readily apparent, using 'ISO8859_1' for
> > the Java encoding is far preferable to using '8859_1' (or anything
> > else) under Java 2.
> > 
> > Look at the private getBTCConverter() method in the String.java source
> > and note the use of the following:
> > 
> > !encoding.equals(btc.getCharacterEncoding())
> > 
> > The ByteToCharConverter instance for ISO-8859-1 always returns 'ISO8859_1'
> > for the getCharacterEncoding() method, and this means that while other
> > names may work, the ThreadLocal caching will be subverted. Since the
> > ByteToCharConverter.getConverter() method involves synchronization, it
> > is not a good thing to subvert the ThreadLocal cache.
> 
> Thanks for pointing this out. AFAICS, the use of 'iso-8859-1' instead of
> '8859_1' (my patch) does not make this situation any better or worse in the
> tomcat code. 
> 
> The tomcat 3.x code doesn't look like it takes this into account at all. I
> wonder if looking up the Java Encoding name associated with the encoding
> name supplied by user-agents etc. is an optimisation worth making. I'll look
> into that.
> 
> 
> 
> Vince.
> 




Re: Does the beta Tomcat 4 support multiple TLD files in a jar?

2001-05-19 Thread Jayson Falkner

> Well, first it shouldn't be just plain "uri":
>
> <taglib>
>   <taglib-uri>/myPRlibrary</taglib-uri>
>   <taglib-location>/WEB-INF/tlds/PRlibrary_1_4.tld</taglib-location>
> </taglib>

Are you referring to an entry in the web.xml file?

I was asking about having multiple Tag Library Descriptors in a JAR.
According to the JSP 1.2 PFD, anything in the META-INF directory with the
".tld" extension should get mapped according to its "uri" element.

I was taking the uri element from the JSP 1.2 TLD DTD in the specs. The
idea was to deploy the entire set of tags through a JAR, without editing
web.xml at all. As I understand it, this is possible. With JSP 1.1 you can
do the same, but you may only have one TLD file in the JAR.
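The JSP 1.2 rule described here (each `.tld` under the JAR's META-INF is a candidate) can be illustrated by listing those entries with java.util.jar. A hedged sketch: the class name is hypothetical, it is not how Jasper implements the scan, and it also accepts TLDs in subdirectories of META-INF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical scanner: list the TLD files packaged under META-INF/.
final class TldScanner {
    static List<String> tldEntries(InputStream jarBytes) {
        List<String> found = new ArrayList<>();
        try (JarInputStream jar = new JarInputStream(jarBytes)) {
            JarEntry e;
            while ((e = jar.getNextJarEntry()) != null) {
                String name = e.getName();
                if (!e.isDirectory() && name.startsWith("META-INF/") && name.endsWith(".tld")) {
                    found.add(name);
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return found;
    }

    // Demo helper: build an in-memory JAR containing empty entries.
    static byte[] buildJar(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream out = new JarOutputStream(bytes)) {
            for (String n : entryNames) {
                out.putNextEntry(new JarEntry(n));
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }
}
```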

Were you addressing this?

Jayson Falkner
V.P./CTO, Amberjack Software LLC
[EMAIL PROTECTED]
www.jspinsider.com






Re: [VOTE] Final release of Tomcat 3.2.2

2001-05-19 Thread Dan Milstein

+0 (I don't think I'll have time to do any support), but way to go Marc!!!

-Dan

Marc Saegesser wrote:
> 
> The latest beta cycle for Tomcat 3.2.2 has completed with no new bugs
> identified.  As the release manager I propose that we release the tomcat_32
> branch as Tomcat 3.2.2.  Please indicate your vote for the release using the
> ballot below.
> 
> I will tabulate and post the results of this vote on Friday, May 25.  At
> that time, if the vote has passed, I will tag, build and distribute the
> release.  The vote must pass by majority approval which means the proposal
> must receive at least three +1 votes and more +1 votes than -1 votes.
> 
> Marc Saegesser
> 
> -
> 
> Vote to release the tomcat_32 branch as Tomcat 3.2.2.
> 
> [ ] +1.  I agree with the proposal and I will help support
>  the release.
> [ ] +0.  I agree with the proposal but I will not be able
>  to help support the release.
> [ ] -0.  I don't agree with the proposal but I won't stop
>  the release.
> [ ] -1.  I disagree with the proposal and will explain my
>  reasons.

-- 

Dan Milstein // [EMAIL PROTECTED]



Re: servlet upload data corruption (more)

2001-05-19 Thread Dan Milstein

David,

A detailed bug report w/ test case is *great*, but it would also be very,
very helpful if you could specify:

1) What version of Tomcat you are running (precisely)
2) What web server you are running, and its version
3) Your OS

-Dan

DAK wrote:
> 
> I finally got out from under some work and was able to make some test
> code. I'm attaching the client and servlet code.
> The code transfers a couple parameters, then a binary file (I was using
> a .jar). If you call the client with
> "BinTestClient localhost something.jar b", it uses byte-by-byte read on
> the server to spool the file to a temp file. If you call the client
> without the 'b', it uses the byte-array read that I was complaining
> about.  Transfer a file, then try "jar tvf test.jar" to see if it
> works. I used a jar that contains .jpg images, and when using the byte
> array read method, it creates a corrupt jar file. If I apply my fix to
> the Ajp13ConnectorRequest class, it works fine.
> (I tried a jar that contained class files and it worked anyway...)
> I'd like for someone else to try this out to make sure I didn't screw
> something up. The code seems pretty simple.
> I discovered this when using JarIn/OutputStream to transfer data from
> client to servlet.
> 
>David
> 
>   
> 
> import java.io.DataOutputStream;
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.net.URL;
> import java.net.URLConnection;
> 
> //  args[0] = hostname
> //  args[1] = jarfile
> //  args[2] = 'b' for single byte read.
> public class BinTestClient {
> 
>     public static void main(String [] args) {
>         try {
>             URL url = new URL("http://"+args[0]+"/examples/BinTest");
>             URLConnection connection = (URLConnection)url.openConnection();
>             connection.setDoOutput(true);
>             connection.setUseCaches(false);
>             DataOutputStream output = new DataOutputStream(connection.getOutputStream());
>             File jarFile = new File(args[1]);
>             if (jarFile.exists()) {
>                 output.writeUTF(""+jarFile.length());
>             }
>             if (args.length > 2 && args[2] != null &&
>                     args[2].trim().equals("b"))
>                 output.writeChar('b');
>             else
>                 output.writeChar(' ');
> 
>             InputStream istr = new FileInputStream(jarFile);
>             byte [] buf = new byte[8192];
>             int count = istr.read(buf);
>             while (count != -1) {
>                 if (count > 0)
>                     output.write(buf, 0, count);
>                 count = istr.read(buf);
>             }
>             istr.close();
>             output.flush();
>             output.close();
> 
>             istr = connection.getInputStream();
>             istr.read();
>         } catch (Exception ex) {
>             ex.printStackTrace();
>         }
>     }
> }
> 
>   
> 
> import java.io.DataInputStream;
> import java.io.File;
> import java.io.FileNotFoundException;
> import java.io.FileOutputStream;
> import java.io.InputStream;
> import java.io.IOException;
> import java.io.OutputStream;
> 
> import javax.servlet.http.HttpServlet;
> import javax.servlet.http.HttpServletRequest;
> import javax.servlet.http.HttpServletResponse;
> 
> public class BinTestServlet extends HttpServlet {
> 
>     public void doPost (HttpServletRequest request, HttpServletResponse response) {
>         try {
>             DataInputStream istr = new DataInputStream(request.getInputStream());
>             long fileLen = Long.parseLong(istr.readUTF());
>             char mode = istr.readChar();
> 
>             File tmp = File.createTempFile("test", ".jar");
>             OutputStream fstr = new FileOutputStream(tmp);
>             if (mode == 'b') {
>                 System.out.println("Using byte-by-byte read");
>                 for (int i=0; i < fileLen; i++)
>                     fstr.write(istr.read());
>             }
>             else {
>                 System.out.println("Using byte-array read");
>                 byte [] buf = new byte[8192];
>                 int count = istr.read(buf);
>                 while (count != -1) {
> 

The wonderful world of encodings...

2001-05-19 Thread cmanolache

Hi,

I've got a terrible headache... It happens every time I try to touch the
bugs related to encodings, any of them...

I'm sure you already know (but I just found out) what "surrogate"
characters are. I knew that Unicode is _not_ 16 bits, but I had no idea it
is 21 bits (as opposed to UCS, which is 31 bits).
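For a concrete case: a code point above U+FFFF, such as U+1D11E (MUSICAL SYMBOL G CLEF), does not fit in one 16-bit Java char and is stored as a surrogate pair. A small sketch; the class name is hypothetical and Character.toChars needs JDK 5+.

```java
// Hypothetical demo: one Unicode character, two Java chars.
final class SurrogateDemo {
    // U+1D11E MUSICAL SYMBOL G CLEF lies above U+FFFF.
    static final int G_CLEF = 0x1D11E;

    static String gClef() {
        // Character.toChars splits the code point into a high/low surrogate pair.
        return new String(Character.toChars(G_CLEF));
    }
}
```

The resulting String has length() 2 but contains a single code point, so any code that walks a URL or path char by char sees two "characters" here.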

I'll try to get something working this weekend. Craig, you may want to
take a look: the code in "DefaultServlet" creates a writer for each
encoding (that's terribly expensive), and doesn't seem to deal with
surrogates (well, the second part is not a problem; I doubt anyone
would use hieroglyphs or musical signs in a URL).

Now, the biggest problem is, as usual, M$. For strange reasons, MSIE's
JavaScript encode() method generates % sequences instead of %XX%XX
(as most would expect). That means the whole decoding might have to be
rewritten in 3.3 (Apache doesn't deal with that either).
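For reference, the %XX%XX form most would expect is the UTF-8 byte sequence, one %XX per byte. A minimal sketch using java.net.URLEncoder/URLDecoder with an explicit "UTF-8" argument (those overloads date from Java 1.4, after this thread); the wrapper names are hypothetical, and note that these classes implement form encoding, so a space becomes '+'.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical wrappers: percent-encode and decode via UTF-8 bytes.
final class PercentUtf8 {
    static String encode(String s) {
        try {
            // One %XX per UTF-8 byte; form encoding, so ' ' becomes '+'.
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

The G clef (U+1D11E) becomes four bytes, "%F0%9D%84%9E", and decodes back to the same surrogate pair.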

Question: what should happen with the context path? It is supposed to be
returned in the original form (not decoded), but that can't work, as a
given path can be encoded in many ways. I'm also not sure what should
happen in web.xml and in server.xml (where the path is defined): should we
use %xx-encoded URLs? But what would that mean for characters that have
multiple encodings?
 

The solution I have in mind right now is to do all the mappings, the
web.xml processing and all other internal operations with decoded
characters, while keeping the "original" form for the facade, so servlets
get what they expect.
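The approach above can be sketched as a small holder that keeps both forms: mapping uses the decoded path, while the facade returns the raw one. Everything here is hypothetical, and decoding as UTF-8 is an assumption; which charset to decode with is exactly the open question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical request-path holder: map on the decoded form internally,
// but hand servlets the path exactly as it arrived.
final class RequestPath {
    final String raw;      // as received on the request line, still %-encoded
    final String decoded;  // used for mapping and web.xml matching

    RequestPath(String raw) {
        this.raw = raw;
        this.decoded = decodePath(raw);
    }

    // Assumes UTF-8. URLDecoder does form decoding ('+' means space), so a
    // literal '+' is protected first to keep path semantics.
    private static String decodePath(String s) {
        try {
            return URLDecoder.decode(s.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    boolean matches(String mappedPath) {
        return decoded.equals(mappedPath);
    }
}
```

Two different raw spellings of the same path ("/caf%C3%A9" and "/caf%c3%a9") then map identically, while each servlet still sees the spelling its client sent.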

Any ideas? I'm not sure I can handle this.


Costin




Re: Does the beta Tomcat 4 support multiple TLD files in a jar?

2001-05-19 Thread Aaron Mulder

On Sat, 19 May 2001, Jayson Falkner wrote:
> I assume it does. If so what is the correct way to use this functionality? I
> have been having little luck trying and can't find the answer documented.
>
> Here is a little insight on what I was attempting. The JAR has all of the
> class files in their correct directories along with a TLD in the META-INF
> directory named "exampleTags.tld".
>
> exampleTags.tld has the uri element set with the value "/exampleTags.tld".
>
> ...
> <taglib>
>   ...
>   <uri>/exampleTags.tld</uri>
> </taglib>

Well, first it shouldn't be just plain "uri":

<taglib>
  <taglib-uri>/myPRlibrary</taglib-uri>
  <taglib-location>/WEB-INF/tlds/PRlibrary_1_4.tld</taglib-location>
</taglib>

And second, what are you using for the taglib-location?  That's
how it would locate the .tld file.  If you're not using one, perhaps you
need to prefix the URI with META-INF/ since all the TLDs are required to
be in the META-INF directory of the packaged JAR.

Aaron

> I am trying the following with a JSP.
> ---
> <%@ taglib prefix="e" uri="/exampleTags.tld" %>
> Here is the example tag output: 
> ---
> But a servlet exception error message keeps popping up.
>
> org.apache.jasper.JasperException: File "/exampleTags.tld" not found
> ...
>
>
> Anyone had this before? Advice would be appreciated.
>
> Jayson Falkner
> V.P./CTO, Amberjack Software LLC
> [EMAIL PROTECTED]
> www.jspinsider.com
>
>




Does the beta Tomcat 4 support multiple TLD files in a jar?

2001-05-19 Thread Jayson Falkner

I assume it does. If so what is the correct way to use this functionality? I
have been having little luck trying and can't find the answer documented.

Here is a little insight on what I was attempting. The JAR has all of the
class files in their correct directories along with a TLD in the META-INF
directory named "exampleTags.tld".

exampleTags.tld has the uri element set with the value "/exampleTags.tld".

...
<taglib>
  ...
  <uri>/exampleTags.tld</uri>
</taglib>

I am trying the following with a JSP.
---
<%@ taglib prefix="e" uri="/exampleTags.tld" %>
Here is the example tag output: 
---
But a servlet exception error message keeps popping up.

org.apache.jasper.JasperException: File "/exampleTags.tld" not found
...


Anyone had this before? Advice would be appreciated.

Jayson Falkner
V.P./CTO, Amberjack Software LLC
[EMAIL PROTECTED]
www.jspinsider.com





servlet upload data corruption (more)

2001-05-19 Thread DAK


I finally got out from under some work and was able to make some test 
code. I'm attaching the client and servlet code.
The code transfers a couple parameters, then a binary file (I was using 
a .jar). If you call the client with
"BinTestClient localhost something.jar b", it uses byte-by-byte read on 
the server to spool the file to a temp file. If you call the client 
without the 'b', it uses the byte-array read that I was complaining 
about.  Transfer a file, then try "jar tvf test.jar" to see if it 
works. I used a jar that contains .jpg images, and when using the byte
array read method, it creates a corrupt jar file. If I apply my fix to 
the Ajp13ConnectorRequest class, it works fine.
(I tried a jar that contained class files and it worked anyway...)
I'd like for someone else to try this out to make sure I didn't screw 
something up. The code seems pretty simple.
I discovered this when using JarIn/OutputStream to transfer data from 
client to servlet.

   David



import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

//  args[0] = hostname
//  args[1] = jarfile
//  args[2] = 'b' for single byte read.
public class BinTestClient {

    public static void main(String [] args) {
        try {
            URL url = new URL("http://"+args[0]+"/examples/BinTest");
            URLConnection connection = (URLConnection)url.openConnection();
            connection.setDoOutput(true);
            connection.setUseCaches(false);
            DataOutputStream output = new DataOutputStream(connection.getOutputStream());
            File jarFile = new File(args[1]);
            if (jarFile.exists()) {
                output.writeUTF(""+jarFile.length());
            }
            if (args.length > 2 && args[2] != null &&
                    args[2].trim().equals("b"))
                output.writeChar('b');
            else
                output.writeChar(' ');

            InputStream istr = new FileInputStream(jarFile);
            byte [] buf = new byte[8192];
            int count = istr.read(buf);
            while (count != -1) {
                if (count > 0)
                    output.write(buf, 0, count);
                count = istr.read(buf);
            }
            istr.close();
            output.flush();
            output.close();

            istr = connection.getInputStream();
            istr.read();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}



import java.io.DataInputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BinTestServlet extends HttpServlet {

    public void doPost (HttpServletRequest request, HttpServletResponse response) {
        try {
            DataInputStream istr = new DataInputStream(request.getInputStream());
            long fileLen = Long.parseLong(istr.readUTF());
            char mode = istr.readChar();

            File tmp = File.createTempFile("test", ".jar");
            OutputStream fstr = new FileOutputStream(tmp);
            if (mode == 'b') {
                System.out.println("Using byte-by-byte read");
                for (int i=0; i < fileLen; i++)
                    fstr.write(istr.read());
            }
            else {
                System.out.println("Using byte-array read");
                byte [] buf = new byte[8192];
                int count = istr.read(buf);
                while (count != -1) {
                    if (count > 0)
                        fstr.write(buf, 0, count);
                    count = istr.read(buf);
                }
            }
            fstr.flush();
            fstr.close();

            OutputStream ostr = response.getOutputStream();
            ostr.write(1);  // positive response
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
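To rule out the client/servlet framing itself, the same stream layout (writeUTF of the length, writeChar of the mode, then the raw bytes) can be round-tripped in memory without Tomcat, Apache or mod_jk. A hypothetical, container-free sketch; if it passes for all 256 byte values but the upload still arrives corrupted, the connector path is the more likely suspect.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, container-free check of the BinTest framing.
final class BinTestFraming {
    // Mirrors the client: length as a UTF string, mode char, then payload.
    static byte[] frame(byte[] payload, char mode) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF("" + payload.length);
            out.writeChar(mode);
            out.write(payload);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors the servlet's byte-array branch: read until end of stream.
    static byte[] unframe(byte[] framed) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
            long fileLen = Long.parseLong(in.readUTF());
            in.readChar(); // mode, unused here
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int count;
            while ((count = in.read(buf)) != -1) {
                body.write(buf, 0, count);
            }
            if (body.size() != fileLen) {
                throw new IllegalStateException("expected " + fileLen + " bytes, got " + body.size());
            }
            return body.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```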



Re: [PATCH] '8859_1' is not a valid charset alias

2001-05-19 Thread Vincent Schonau

On Fri, May 18, 2001 at 12:40:04PM -0700, Forrest R. Girouard wrote:
> 
> It is my understanding that '8859_1' is an alias for a Java encoding 
> which maps to the 'ISO-8859-1' character set.  The Java encoding and
> the character set name are not always the same.
> 
> Furthermore, while it's not readily apparent, using 'ISO8859_1' for
> the Java encoding is far preferable to using '8859_1' (or anything
> else) under Java 2.
> 
> Look at the private getBTCConverter() method in the String.java source
> and note the use of the following:
> 
>   !encoding.equals(btc.getCharacterEncoding())
> 
> The ByteToCharConverter instance for ISO-8859-1 always returns 'ISO8859_1'
> for the getCharacterEncoding() method, and this means that while other
> names may work, the ThreadLocal caching will be subverted. Since the
> ByteToCharConverter.getConverter() method involves synchronization, it
> is not a good thing to subvert the ThreadLocal cache.

Thanks for pointing this out. AFAICS, the use of 'iso-8859-1' instead of
'8859_1' (my patch) does not make this situation any better or worse in the
tomcat code. 

The tomcat 3.x code doesn't look like it takes this into account at all. I
wonder if looking up the Java Encoding name associated with the encoding
name supplied by user-agents etc. is an optimisation worth making. I'll look
into that.
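One way to make that lookup cheap is to resolve each user-agent-supplied name once and cache the result. A minimal sketch, assuming a modern JDK: it uses java.nio.charset.Charset (JDK 1.4+), whose canonical name for ISO-8859-1 is "ISO-8859-1" rather than the older java.io name "ISO8859_1" discussed above. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lookup table: resolve each charset name once, then reuse it.
final class CharsetNames {
    private static final Map<String, Charset> RESOLVED = new ConcurrentHashMap<>();

    static Charset lookup(String name) {
        // Charset names are case-insensitive, so normalize the key to keep
        // "ISO-8859-1" and "iso-8859-1" from occupying separate entries.
        String key = name.trim().toLowerCase(Locale.ROOT);
        // Charset.forName throws for unknown names; computeIfAbsent then
        // propagates the exception and caches nothing.
        return RESOLVED.computeIfAbsent(key, Charset::forName);
    }

    static String decode(byte[] bytes, String name) {
        return new String(bytes, lookup(name));
    }
}
```

Because only names that resolve are stored, the table stays bounded by the aliases clients actually send.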



Vince.