[https://issues.apache.org/jira/browse/WSCOMMONS-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660198#action_12660198]
Andreas Veithen commented on WSCOMMONS-424:
-------------------------------------------
Additional issue:
If the DataHandler was constructed from an Object rather than a DataSource, a
call to DataSource#getInputStream() (on the DataSource returned by
DataHandler#getDataSource()) will start a new thread and return a
PipedInputStream. This is the case for both Geronimo's and Sun's JAF
implementations. The reason is that DataContentHandler only has a writeTo
method and no getInputStream method, so the only way to expose the data as a
stream is to run writeTo in a separate thread that feeds the pipe. Starting a
new thread just to check the size of the data is an overhead that should be
avoided.
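One way to size an object-backed DataHandler without the pipe and the extra thread could be to call DataHandler#writeTo(OutputStream), which invokes the DataContentHandler in the calling thread, against a counting sink that aborts as soon as the limit is crossed. The sketch below is a self-contained illustration under that assumption and is not AXIOM code: ContentWriter is a hypothetical interface with the same shape as DataHandler#writeTo (so that dataHandler::writeTo could be passed in), and WriteToSizeCheck, CountingSink and the abort-by-exception approach are likewise hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;

public final class WriteToSizeCheck {

    private WriteToSizeCheck() {}

    /** Hypothetical stand-in with the same shape as DataHandler#writeTo(OutputStream). */
    @FunctionalInterface
    public interface ContentWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Discards bytes while counting them; fails as soon as the limit is crossed. */
    private static final class CountingSink extends OutputStream {
        private final long limit;
        private long count;

        CountingSink(long limit) {
            this.limit = limit;
        }

        boolean exceeded() {
            return count > limit;
        }

        private void add(long n) throws IOException {
            count += n;
            if (exceeded()) {
                // Abort the writer early instead of serializing the whole object.
                throw new IOException("size limit of " + limit + " bytes exceeded");
            }
        }

        @Override
        public void write(int b) throws IOException {
            add(1);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            add(len);
        }
    }

    /**
     * Returns true if the writer produces more than {@code limit} bytes.
     * Everything runs in the calling thread: no pipe, no extra thread.
     */
    public static boolean exceedsLimit(ContentWriter writer, long limit) throws IOException {
        CountingSink sink = new CountingSink(limit);
        try {
            writer.writeTo(sink);
        } catch (IOException e) {
            // The writer may propagate our abort, possibly wrapped in another
            // IOException; the counter tells us whether the limit was hit or
            // this is a genuine I/O error that the caller must see.
            if (!sink.exceeded()) {
                throw e;
            }
        }
        return sink.exceeded();
    }
}
```

With JAF on the classpath, a call might look like WriteToSizeCheck.exceedsLimit(dataHandler::writeTo, threshold). Note that the DataContentHandler still serializes up to limit + 1 bytes; the gain is only that no thread or pipe is created and serialization stops early.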
> BufferUtils#doesDataHandlerExceedLimit needs review
> ---------------------------------------------------
>
> Key: WSCOMMONS-424
> URL: https://issues.apache.org/jira/browse/WSCOMMONS-424
> Project: WS-Commons
> Issue Type: Bug
> Components: AXIOM
> Reporter: Andreas Veithen
>
> The code in BufferUtils#doesDataHandlerExceedLimit has several issues and
> should be reviewed:
> 1) The code never closes the InputStream requested from the DataSource. This
> might have unexpected consequences if the DataSource is a FileDataSource, for
> example a file handle that remains open.
> 2) The code assumes that there are DataSources that can only be read once.
> Indeed, the code in BufferUtils#getInputStream throws an exception if the
> input stream returned from the DataSource doesn't support mark ("Stream does
> not support mark, Cannot read the stream as DataSource will be consumed.").
> This is plain wrong, because by definition a DataSource can be read several
> times (this is the very reason for the existence of this interface). If there
> are DataSource implementations that can be "consumed", i.e. read only once,
> they need to be fixed.
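> 3) The code assumes that the end of stream is reached when
> InputStream#available() returns 0. This is wrong: available() only estimates
> how many bytes can be read without blocking, and the end of stream is
> signalled only by read() returning -1.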
> 4) doesDataHandlerExceedLimit tries to establish a lower bound on the
> DataSource size by reading data from it. This is suboptimal, because in most
> cases this can be achieved without actually reading a single byte from the
> data source:
> * If the DataSource is a FileDataSource, it is possible to get the File
> object and the size of the DataSource can be determined from the file size.
> This is much cheaper than opening the file and reading data from it.
> * InputStream#available() can always be used to get a lower bound on the
> stream size. For a ByteArrayDataSource this actually returns the size
> directly.
> * InputStream#skip can be used to advance in the stream without reading from
> it.
> Only if the InputStream implementation returned by the DataSource implements
> neither available nor skip (which is possible) is it necessary to actually
> read the data.
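The checks proposed in points 1, 3 and 4 could be combined roughly as in the sketch below. It is a hypothetical illustration, not the actual BufferUtils code: the class SizeLimitCheck and its methods are made up, and with JAF the File overload would be fed from FileDataSource#getFile(). It assumes that available() never overstates the number of bytes present; skipping at most that many bytes also guards against FileInputStream#skip, which is allowed to skip past the end of the file. Only when available() and skip() make no progress does it fall back to reading, and it treats only -1 from read() as end of stream.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

public final class SizeLimitCheck {

    private SizeLimitCheck() {}

    /** File-backed data: ask the file system, no bytes are read (point 4). */
    public static boolean exceedsLimit(File file, long limit) {
        return file.length() > limit;
    }

    /**
     * Returns true if the stream holds more than {@code limit} bytes.
     * Takes ownership of the stream and always closes it (point 1).
     */
    public static boolean exceedsLimit(InputStream in, long limit) throws IOException {
        try (InputStream stream = in) {
            if (limit < 0 || limit == Long.MAX_VALUE) {
                throw new IllegalArgumentException("limit out of range: " + limit);
            }
            // Seeing limit + 1 bytes proves the size is strictly greater than limit.
            long remaining = limit + 1;
            byte[] buffer = new byte[8192];
            while (remaining > 0) {
                // Assumption: available() does not overstate. Skipping at most
                // that many bytes keeps skip() from running past the end.
                long toSkip = Math.min(remaining, stream.available());
                long skipped = toSkip > 0 ? stream.skip(toSkip) : 0;
                if (skipped > 0) {
                    remaining -= skipped;
                    continue;
                }
                // No progress from available()/skip(): read instead. Only -1
                // means end of stream; available() == 0 does not (point 3).
                int n = stream.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (n == -1) {
                    return false;
                }
                remaining -= n;
            }
            return true;
        }
    }
}
```

For a ByteArrayInputStream this never copies a byte, because available() reports the full size and skip() jumps over it; a stream that implements neither method degrades to plain reading, which is the case the last paragraph of point 4 anticipates.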
--
This message is automatically generated by JIRA.