RE: [External] Re: Zip file upload corruption on Linux

2021-05-26 Thread Scott,Tim
Hi Chris,

> Mine is coming up on 20 years old.
That's worthy of an extra slice of cake :-).

> The code you posted shows imports and then your interaction with the 
> fileupload library. Do you know what else happens before this line of code?

>  ServletRequestContext requestContext = new ServletRequestContext(/* HttpServletRequest */ request);

I had adjusted the code to avoid deprecated methods in an attempt to remedy 
this problem. I've now reverted those changes to the latest SVN check-in, and 
that line no longer exists.

> Can you reproduce this in a testing environment? I'll bet we can write a 
> Filter or Valve which can catch this bug red-handed.

I'd love to have the time to do this, but my motivation to do so has all but 
been killed by pragmatism.

I could send you the two Java classes off-list if you'd like to speculate 
further?

Thanks,
Tim


-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: [External] Re: Zip file upload corruption on Linux

2021-05-25 Thread Christopher Schultz

Tim,

On 5/25/21 11:22, Scott,Tim wrote:
> Hi Chris,
>
>> "nah, nobody still uses Struts 1.x".
>
> I wouldn't put it past this 14 year old application ...

:)

Mine is coming up on 20 years old.

>> But at this point, if you have things working, you can probably
>> stop.
>
> My OCD says No!, but my pragmatic side says "leave it until I
> have to change"
>
>> But something is *definitely* wrong if changing the default file
>> encoding causes your files to be corrupted. It is *extraordinarily*
>> unlikely that Tomcat or Struts is doing this. It is much more
>> likely to be your application somewhere writing to a Writer instead
>> of a Stream.
>
> Whilst I haven't explored every class in detail, the classes I have
> been working with are the first (of my code) to receive the requests
> and the data I'm getting is already corrupted. For example, there's
> nothing in my application code which writes to a temporary file as
> part of this process. My code writes the data to an Oracle database,
> binding as a binary (RAW) value.


The code you posted shows imports and then your interaction with the 
fileupload library. Do you know what else happens before this line of code?


ServletRequestContext requestContext = new ServletRequestContext(/* HttpServletRequest */ request);


If the application calls one of a series of methods on 
HttpServletRequest, it can cause a few things to happen:


1. A "reader" is obtained from the request, which will convert bytes -> 
chars.


2. The "reader" needs to know what character encoding to use. There are 
some rules to determine what encoding that is, but Tomcat itself will 
always fall back to ISO-8859-1 (per HTTP spec) and that is the encoding 
which does not cause corruption for you.
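
A minimal sketch of why an ISO-8859-1 reader would be harmless while a UTF-8 reader would not: ISO-8859-1 maps every byte value to exactly one char, so a bytes -> chars -> bytes round trip is lossless, whereas a UTF-8 decoder replaces byte sequences it cannot decode. The class and method names are hypothetical and the sample bytes are chosen for illustration; none of this is code from the application or from Tomcat.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReaderRoundTrip {
    // Decode the bytes through a Reader, then re-encode the chars with the same charset.
    public static byte[] roundTrip(byte[] input, Charset charset) {
        StringBuilder chars = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                chars.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chars.toString().getBytes(charset);
    }

    public static void main(String[] args) {
        // "PK" followed by two bytes that are not valid UTF-8 on their own.
        byte[] original = {0x50, 0x4B, (byte) 0x89, (byte) 0xB3};
        // prints true: every byte survives the ISO-8859-1 round trip
        System.out.println(Arrays.equals(original, roundTrip(original, StandardCharsets.ISO_8859_1)));
        // prints false: the two invalid bytes are replaced during UTF-8 decoding
        System.out.println(Arrays.equals(original, roundTrip(original, StandardCharsets.UTF_8)));
    }
}
```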


Can you reproduce this in a testing environment? I'll bet we can write a 
Filter or Valve which can catch this bug red-handed.


-chris




RE: [External] Re: Zip file upload corruption on Linux

2021-05-25 Thread Scott,Tim
Hi Chris,

> "nah, nobody still uses Struts 1.x".
I wouldn't put it past this 14 year old application ...

> But at this point, if you have things working, you can probably stop. 
My OCD says No!, but my pragmatic side says "leave it until I have to 
change"

> But something is *definitely* wrong if changing the default file encoding 
> causes your files to be corrupted. It is *extraordinarily* unlikely that 
> Tomcat or Struts is doing this. It is much more likely to be your application 
> somewhere writing to a Writer instead of a Stream.

Whilst I haven't explored every class in detail, the classes I have been 
working with are the first (of my code) to receive the requests and the data 
I'm getting is already corrupted. For example, there's nothing in my 
application code which writes to a temporary file as part of this process. My 
code writes the data to an Oracle database, binding as a binary (RAW) value.

Thanks,
Tim





Re: [External] Re: Zip file upload corruption on Linux

2021-05-25 Thread Christopher Schultz

Tim,

On 5/25/21 05:03, Scott,Tim wrote:
> Hi Mark,
>
>> No. You should be able to use HttpServletRequest.getPart()
>
> I've given up on that attempt as I keep getting:
> java.lang.AbstractMethodError: Method
> org/apache/struts/upload/MultipartRequestWrapper.getPart(Ljava/lang/String;)Ljavax/servlet/http/Part;
> is abstract
>
> I have my workaround and do not anticipate it worthwhile me spending
> any more time on the matter.

You know, it's funny. I use Struts 1.x and in order to use the 
Tomcat-provided multipart handling, you need to do extra work. I thought 
of that when Mark suggested using Tomcat's multipart parsing but then 
thought "nah, nobody still uses Struts 1.x".


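Anyway, if you want to disable Struts's multipart handling, you have to 
add this to your <servlet> definition for Struts. Note that this may 
break other parts of your application that might depend upon the Struts 
multipart handling. But at this point, it's not working anyway, so you 
are probably okay.


  <servlet>
    <servlet-name>action</servlet-name>
    <servlet-class>org.apache.struts.action.ActionServlet</servlet-class>

    <init-param>
      <description>Disable Struts Multipart Handling</description>
      <param-name>multipartClass</param-name>
      <param-value>none</param-value>
    </init-param>

    <multipart-config>
      <max-file-size>1048576</max-file-size>
      <max-request-size>1049600</max-request-size>
      <file-size-threshold>1024</file-size-threshold>
    </multipart-config>
  </servlet>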

Note that you may have to "merge" the above with what you have in your 
WEB-INF/web.xml.


But at this point, if you have things working, you can probably stop. 
But something is *definitely* wrong if changing the default file 
encoding causes your files to be corrupted. It is *extraordinarily* 
unlikely that Tomcat or Struts is doing this. It is much more likely to 
be your application somewhere writing to a Writer instead of a Stream.
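
A minimal sketch of the "Writer instead of a Stream" failure mode, assuming the charset in play is UTF-8: the same bytes are pushed once through an OutputStream and once through a Writer. The class and method names are hypothetical and this is not taken from the application under discussion.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterVersusStream {
    // Binary-safe: the bytes go straight to an OutputStream.
    public static byte[] viaStream(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(data, 0, data.length);
        return out.toByteArray();
    }

    // Not binary-safe: the bytes are decoded to chars, then re-encoded by the Writer.
    public static byte[] viaWriter(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(new String(data, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // ZIP local-file-header magic followed by two bytes that are invalid UTF-8.
        byte[] data = {0x50, 0x4B, 0x03, 0x04, (byte) 0x89, (byte) 0xB3};
        System.out.println("stream length: " + viaStream(data).length);
        System.out.println("writer length: " + viaWriter(data).length);
    }
}
```

The Writer path returns more bytes than it was given, because each undecodable byte comes back as a three-byte sequence.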


-chris




RE: [External] Re: Zip file upload corruption on Linux

2021-05-25 Thread Scott,Tim
Hi Mark,

> No. You should be able to use HttpServletRequest.getPart()
I've given up on that attempt as I keep getting:
java.lang.AbstractMethodError: Method 
org/apache/struts/upload/MultipartRequestWrapper.getPart(Ljava/lang/String;)Ljavax/servlet/http/Part;
 is abstract

I have my workaround and do not anticipate it worthwhile me spending any more 
time on the matter.

Thanks,
Tim





Re: [External] Re: Zip file upload corruption on Linux

2021-05-24 Thread Mark Thomas

On 24/05/2021 14:22, Scott,Tim wrote:
> Hi Mark,
>
> From: Mark Thomas wrote:
>
>> import org.apache.commons.fileupload.disk.DiskFileItemFactory;
>> import org.apache.commons.fileupload.servlet.ServletFileUpload;
>> import org.apache.commons.fileupload.servlet.ServletRequestContext;
>>
>> You are using Commons FileUpload so this issue needs to be raised with
>> the Apache Commons project.
>>
>> Alternatively, in Tomcat 9 file upload support is available via the
>> Servlet API. You could try switching to that (and any bugs would then be
>> a Tomcat issue).
>
> I replaced "org.apache.commons.fileupload." with
> "org.apache.tomcat.util.http.fileupload." and tried again.
>
> I found no change in behaviour: Leaving file.encoding to default to UTF-8 still
> corrupted the content. Setting it to ISO-8859-1 again resolved it.
>
> Was that the Servlet API you were meaning?

No. You should be able to use HttpServletRequest.getPart()

Mark





RE: [External] Re: Zip file upload corruption on Linux

2021-05-24 Thread Scott,Tim
Hi Mark,

From: Mark Thomas wrote:

> import org.apache.commons.fileupload.disk.DiskFileItemFactory;
> import org.apache.commons.fileupload.servlet.ServletFileUpload;
> import org.apache.commons.fileupload.servlet.ServletRequestContext;

> You are using Commons FileUpload so this issue needs to be raised with 
> the Apache Commons project.

> Alternatively, in Tomcat 9 file upload support is available via the 
> Servlet API. You could try switching to that (and any bugs would then be 
> a Tomcat issue).

I replaced "org.apache.commons.fileupload." with 
"org.apache.tomcat.util.http.fileupload." and tried again.

I found no change in behaviour: Leaving file.encoding to default to UTF-8 still 
corrupted the content. Setting it to ISO-8859-1 again resolved it.

Was that the Servlet API you were meaning?

Thanks,
Tim





Re: [External] Re: Zip file upload corruption on Linux

2021-05-24 Thread Mark Thomas

On 24/05/2021 12:08, Scott,Tim wrote:
> Hi Mark,
>
> Thanks for the prompt response.
>
>> On 24/05/2021 10:58, Scott,Tim wrote:
>>> Hi experts,
>>>
>>> First time poster, here, so I know I'm risking not providing nearly
>>> enough of the right information. Please let me know what I can send to
>>> help you help me further through this.
>>
>> How are you reading the uploaded file? Please provide the code that does this.
>
> I am reading the InputStream as below:
> (merged from two classes, untested, incomplete)
>
> import org.apache.commons.fileupload.disk.DiskFileItemFactory;
> import org.apache.commons.fileupload.servlet.ServletFileUpload;
> import org.apache.commons.fileupload.servlet.ServletRequestContext;

You are using Commons FileUpload so this issue needs to be raised with 
the Apache Commons project.


Alternatively, in Tomcat 9 file upload support is available via the 
Servlet API. You could try switching to that (and any bugs would then be 
a Tomcat issue).


Mark




> import javax.servlet.http.HttpServletRequest;
> import java.io.InputStream;
> import org.apache.commons.fileupload.FileItem;
>
> ServletRequestContext requestContext = new ServletRequestContext(/* HttpServletRequest */ request);
> FileItemFactory factory = new DiskFileItemFactory();
> FileUpload fileUpload = new ServletFileUpload(factory);
> List<FileItem> entries = fileUpload.parseRequest(requestContext); // <<< this call generates the temp file
> InputStream inputStream = null;
> for (FileItem item : entries)
> {
>     if (!item.isFormField())
>     {
>         inputStream = item.getInputStream();
>     }
> }
> ...
> byte[] buffer = new byte[BINARY_BUFFER_SIZE];
> boolean eof = false;
> while (!eof)
> {
>     int count = inputStream.read(buffer);
>     if (count == -1)
>     {
>         eof = true;
>         ...
>     }
>     ...
> }
>
> Similarly, I am not writing the temp file. I understand that this is done by
> DeferredFileOutputStream as part of the call to ServletFileUpload's
> parseRequest(). The temp file (if created) and the input stream already contain
> corrupted data.
>
>> The only way the default encoding should impact things is if the file bytes are
>> being converted to String at some point.
>
> Not by me, they're not!
>
>> That shouldn't normally happen for an uploaded file.
>
> I agree, it shouldn't. That does not match, however, my finding that:
>    Using -Dfile.encoding=utf-8 on Windows corrupts the file.
>    Using -Dfile.encoding=ISO-8859-1 on Linux stops the file corruption.
>
> Thanks,
> Tim








RE: [External] Re: Zip file upload corruption on Linux

2021-05-24 Thread Scott,Tim
Hi Mark,

Thanks for the prompt response.

>On 24/05/2021 10:58, Scott,Tim wrote:
>> Hi experts,
>>
>> First time poster, here, so I know I'm risking not providing nearly
>> enough of the right information. Please let me know what I can send to
>> help you help me further through this.

>How are you reading the uploaded file? Please provide the code that does this.
I am reading the InputStream as below:
   (merged from two classes, untested, incomplete)

import java.io.InputStream;
import java.util.List;
import javax.servlet.http.HttpServletRequest;
import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.FileItemFactory;
import org.apache.commons.fileupload.FileUpload;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import org.apache.commons.fileupload.servlet.ServletRequestContext;

ServletRequestContext requestContext = new ServletRequestContext(/* HttpServletRequest */ request);
FileItemFactory factory = new DiskFileItemFactory();
FileUpload fileUpload = new ServletFileUpload(factory);
// <<< this call generates the temp file
List<FileItem> entries = fileUpload.parseRequest(requestContext);
InputStream inputStream = null;
for (FileItem item : entries)
{
    if (!item.isFormField())
    {
        // Binary content of the uploaded file; no Reader is involved here.
        inputStream = item.getInputStream();
    }
}
...
byte[] buffer = new byte[BINARY_BUFFER_SIZE];
boolean eof = false;
while (!eof)
{
    int count = inputStream.read(buffer);
    if (count == -1)
    {
        eof = true;
        ...
    }
    ...
}

Similarly, I am not writing the temp file. I understand that this is done by 
DeferredFileOutputStream as part of the call to ServletFileUpload's 
parseRequest(). The temp file (if created) and the input stream already contain 
corrupted data.
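
The read loop in the posted code elides its body. A minimal sketch of one binary-safe way to drain such a stream, assuming the whole upload fits in memory; the class name and the buffer size are hypothetical, and no byte-to-char conversion takes place anywhere in it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class StreamCopy {
    private static final int BINARY_BUFFER_SIZE = 8192; // hypothetical value

    // Read the stream to the end, keeping only the bytes each read() actually returned.
    public static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[BINARY_BUFFER_SIZE];
        try {
            int count;
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] source = {0x50, 0x4B, 0x03, 0x04, (byte) 0x89, (byte) 0xB3};
        byte[] copy = readFully(new ByteArrayInputStream(source));
        System.out.println("copied " + copy.length + " bytes");
    }
}
```

If a loop of this shape yields corrupted bytes, the damage happened before the stream was handed over, which matches the observation that the temp file is already corrupt.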

> The only way the default encoding should impact things is if the file bytes 
> are being converted to String at some point.
Not by me, they're not!

> That shouldn't normally happen for an uploaded file.
I agree, it shouldn't. That does not match, however, my finding that:
   Using -Dfile.encoding=utf-8 on Windows corrupts the file.
   Using -Dfile.encoding=ISO-8859-1 on Linux stops the file 
corruption.

Thanks,
Tim



Re: Zip file upload corruption on Linux

2021-05-24 Thread Mark Thomas

On 24/05/2021 10:58, Scott,Tim wrote:
> Hi experts,
>
> First time poster, here, so I know I’m risking not providing nearly
> enough of the right information. Please let me know what I can send to
> help you help me further through this.


How are you reading the uploaded file? Please provide the code that does 
this.


The only way the default encoding should impact things is if the file 
bytes are being converted to String at some point. That shouldn't 
normally happen for an uploaded file.
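
A minimal sketch of how one might check whether a payload would survive such a bytes-to-String conversion under UTF-8: a strict decoder reports invalid input instead of silently replacing it. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true only if the bytes form valid UTF-8; nothing is replaced silently.
    public static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = "PK".getBytes(StandardCharsets.US_ASCII);
        byte[] binary = {0x50, 0x4B, (byte) 0x89, (byte) 0xB3};
        System.out.println(isValidUtf8(text));   // prints true
        System.out.println(isValidUtf8(binary)); // prints false
    }
}
```

Compressed data such as a ZIP body will almost always fail this check, which is why any String conversion along the way is destructive under UTF-8.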


Mark


> I’m using separate deployments of Tomcat 9 on Linux (RedHat 7) and
> Windows for the same mature .war application.
>
> Around Jan 2020 I found that uploads of ZIP files to the Linux Tomcat
> were getting corrupted. The Windows upload worked fine. After much
> digging I found this appears to relate to the file.encoding property.
>
> Launching the Tomcat 9 service on Windows with “-Dfile.encoding=UTF-8”
> (overriding the default of Cp1252) causes the Windows upload to corrupt
> the data.
>
> It would appear, therefore, that file.encoding is affecting binary file
> uploads and I do not think it should. With this set to utf-8, I am
> observing that invalid utf-8 characters are being replaced with “ef bf
> bd” (the “unknown character”, U+FFFD, for UTF-8).
>
> Is there a way to address this?
>
> I believe source .jsp files are utf-8 encoded and I deal with utf-8 in
> many parts of the application. I would rather add this encoding to the
> Windows deployments than use, e.g., -Dfile.encoding=ISO-8859-1 on Linux.
>
> Note also “If the draft JEP discussed in this post is implemented, the
> default charset for file contents will be changed to UTF-8 even for
> Windows.”
>
>     Ref: https://dzone.com/articles/java-may-use-utf-8-as-its-default-charset
>     (March 1st, 2018)
>
> I’ve put some details / “evidence” below should you wish to read further.
>
> Thank you,
> Tim
>
> This morning, with Tomcat 9.0.45, I again captured a tcpdump to show
> that the browser is sending the correct data. The temp file which Tomcat
> created prior to passing the stream to my application is corrupted.
>
> Part of the tcpdump submission is:
>
> --WebKitFormBoundary37kBaouQxD4aoug5
> Content-Disposition: form-data; name="file.ob_filename"; filename="MEP.zip"
> Content-Type: application/x-zip-compressed
>
> PK.`.Rtbl_Evidence.csv.Zks.H..[.=y.Do/..a.`..
> .T..i..{..$c..3X.Q..
> ;..Q,.q..e.&..P$.X..0*.3
>
> The “89” and “b3” (no doubt invalid utf-8 characters) have been
> replaced with “ef bf bd”. This is repeated later for each subsequent
> invalid utf-8 character.
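
A minimal sketch of a diagnostic for this symptom: count how often the three-byte sequence ef bf bd occurs in a stored file. The class and method names are hypothetical. A legitimate binary file can contain the sequence by chance, so a non-zero count is a hint to compare against the original rather than proof of corruption.

```java
class ReplacementScanner {
    // Count non-overlapping occurrences of EF BF BD (U+FFFD encoded as UTF-8).
    public static int countReplacementSequences(byte[] data) {
        int count = 0;
        for (int i = 0; i + 2 < data.length; i++) {
            if ((data[i] & 0xFF) == 0xEF
                    && (data[i + 1] & 0xFF) == 0xBF
                    && (data[i + 2] & 0xFF) == 0xBD) {
                count++;
                i += 2; // skip the rest of this match
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] corrupted = {0x50, 0x4B, (byte) 0xEF, (byte) 0xBF, (byte) 0xBD,
                            (byte) 0xEF, (byte) 0xBF, (byte) 0xBD};
        System.out.println(countReplacementSequences(corrupted)); // prints 2
    }
}
```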


> In case this is relevant, I’m using Amazon’s Corretto JDK 11.0.4
> (64-bit) on Linux (11.0.7 now on Windows) but I’ve observed this problem
> with JDK8 and I can’t say when it started. I know it worked a few years
> ago on Linux and Windows, but can’t dig out the version information for
> then.
>
>     NB: Just updated to JDK 11.0.11 and it made no difference.
>
> My extensive, repeated and varied searches merely confirm that my HTML
> is OK, the form submission is as intended. Maybe the process for reading
> the data is out of date but it works fine on Windows (Java is meant to
> be a WORM language) and all the debugging I do shows that the data is
> corrupt before my application sees it.
>
> My JVM property file.encoding = UTF-8 on Linux and was Cp1252 on Windows.
>
> --
> Tim Scott
> *OCLC* · Senior Software Engineer / Technical Product Manager
> CityGate, 8 St. Mary’s Gate, Sheffield S1 4LW, UK
> cc: IT file




Zip file upload corruption on Linux

2021-05-24 Thread Scott,Tim
Hi experts,

First time poster, here, so I know I'm risking not providing nearly enough of 
the right information. Please let me know what I can send to help you help me 
further through this.

I'm using separate deployments of Tomcat 9 on Linux (RedHat 7) and Windows for 
the same mature .war application.

Around Jan 2020 I found that uploads of ZIP files to the Linux Tomcat were 
getting corrupted. The Windows upload worked fine. After much digging I found 
this appears to relate to the file.encoding property.

Launching the Tomcat 9 service on Windows with "-Dfile.encoding=UTF-8" 
(overriding the default of Cp1252) causes the Windows upload to corrupt the 
data.
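
A minimal sketch for confirming what a given JVM actually uses as its default charset, and for showing that the charset-less String constructor silently depends on it. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Report both the system property and the charset the JVM resolved from it.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    // new String(byte[]) with no charset argument decodes with the default charset.
    public static boolean constructorUsesDefault(byte[] bytes) {
        return new String(bytes).equals(new String(bytes, Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        System.out.println(describe());
        System.out.println(constructorUsesDefault(new byte[] {0x50, 0x4B, (byte) 0x89}));
    }
}
```

Running it once with and once without -Dfile.encoding=UTF-8 shows which setting a deployment is really picking up.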

It would appear, therefore, that file.encoding is affecting binary file uploads 
and I do not think it should. With this set to utf-8, I am observing that 
invalid utf-8 characters are being replaced with "ef bf bd" (the "unknown 
character", U+FFFD, for UTF-8).
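
A minimal sketch reproducing that exact byte pattern outside Tomcat: decoding a single invalid byte as UTF-8 and encoding the result again. The class and method names are hypothetical; the byte 0x89 is one of the values reported as replaced in the upload.

```java
import java.nio.charset.StandardCharsets;

class ReplacementBytes {
    // Format bytes as space-separated lowercase hex, e.g. "ef bf bd".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Decode as UTF-8 and encode again; invalid input becomes U+FFFD on the way.
    public static byte[] throughUtf8String(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(hex(throughUtf8String(new byte[] {(byte) 0x89}))); // prints: ef bf bd
    }
}
```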

Is there a way to address this?

I believe source .jsp files are utf-8 encoded and I deal with utf-8 in many 
parts of the application. I would rather add this encoding to the Windows 
deployments than use, e.g., -Dfile.encoding=ISO-8859-1 on Linux.

Note also "If the draft JEP discussed in this post is implemented, the default 
charset for file contents will be changed to UTF-8 even for Windows."
   Ref: https://dzone.com/articles/java-may-use-utf-8-as-its-default-charset
   (March 1st, 2018)

I've put some details / "evidence" below should you wish to read further.

Thank you,
Tim


This morning, with Tomcat 9.0.45, I again captured a tcpdump to show that the 
browser is sending the correct data. The temp file which Tomcat created prior 
to passing the stream to my application is corrupted.

Part of the tcpdump submission is:

--WebKitFormBoundary37kBaouQxD4aoug5
Content-Disposition: form-data; name="file.ob_filename"; filename="MEP.zip"
Content-Type: application/x-zip-compressed

PK.`.Rtbl_Evidence.csv.Zks.H..[.=y.Do/..a.`..
 .T..i..{..$c..3X.Q..