I agree with your conclusion: if someone wants to fix the network
server case in the future, it should not be difficult to make the
embedded case match.  Going forward, I think our best shot at having
reasonable stream implementations is to assume that streams can only
be read once, both on the way into the server and on the way out
(at least the normal optimized code path should do so).

1 meg will trigger the large blob/clob code path.  As a rule of thumb,
stream based tests probably should have 3 instances:
1) blob/clob less than 1k
2) blob/clob more than 250k
3) blob/clob more than the allowed heap size in the jvm you are testing;
   these tests can follow the existing ones in the large test area.
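
A rough sketch of how test data for these three sizes could be produced
without ever holding the whole value in memory.  PatternInputStream and
LobTestSizes are hypothetical helper names, not existing Derby test
classes, and the concrete sizes are only examples that satisfy the rule
of thumb above.

```java
import java.io.InputStream;
import java.util.Objects;

/**
 * Hypothetical test helper (not part of Derby): produces exactly
 * 'length' bytes of a repeating pattern, generated on the fly.
 */
class PatternInputStream extends InputStream {
    private final long length;
    private long position;

    PatternInputStream(long length) {
        this.length = length;
    }

    /** Byte expected at a given offset, so a reader can verify content. */
    static int expectedByteAt(long offset) {
        return (int) (offset % 251);
    }

    @Override
    public int read() {
        if (position >= length) {
            return -1;
        }
        return expectedByteAt(position++);
    }

    @Override
    public int read(byte[] buf, int off, int len) {
        Objects.checkFromIndexSize(off, len, buf.length);
        if (len == 0) {
            return 0;
        }
        if (position >= length) {
            return -1;
        }
        int n = (int) Math.min(len, length - position);
        for (int i = 0; i < n; i++) {
            buf[off + i] = (byte) expectedByteAt(position + i);
        }
        position += n;
        return n;
    }
}

/** Example sizes for the three test instances (assumed values). */
class LobTestSizes {
    static final long SMALL = 512;            // less than 1k
    static final long MEDIUM = 300L * 1024;   // more than 250k

    /** A size just above the max heap of the jvm running the test. */
    static long largerThanHeap() {
        long oneMeg = 1024L * 1024;
        long max = Runtime.getRuntime().maxMemory();
        // maxMemory() may report Long.MAX_VALUE when there is no limit.
        return max > Long.MAX_VALUE - oneMeg ? Long.MAX_VALUE : max + oneMeg;
    }
}
```

A test could pass such a stream to PreparedStatement.setBinaryStream and
compare what comes back against expectedByteAt.  With the third size, code
that materializes the whole value should run out of memory instead of
passing silently.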

The difference between 1 meg and 1 gig is that with 1 meg it is hard to
tell whether the underlying code is materializing the entire stream into
memory.  There is some size between the two (with an appropriate setting
of the max jvm heap) that will show it.

TomohitoNakayama wrote:

Mike Matrigali wrote:

For embedded I was worried about your description of changes, that made
it sound like you somehow were going to buffer the blob in memory.  I
see from your changes you basically added reset calls if the underlying
stream was resettable.  What I don't know is what happens in the 2 gig
blob/clob cases, either you will have to investigate or maybe someone
on the list knows?

As you saw, the patch does not add a new cache; it just resets the stream.
I tested only the case of a 1 MB lob and confirmed that the lob was streamed from the beginning through the 2nd InputStream. I don't think there is a qualitative difference in behavior between 1M and 1G, though I have not fully confirmed it.

However, I understand your concern that this patch would implicitly constrain the implementation of the network client: all of the data streamed from the server would have to be stored, because streaming between server and client is performed only once.

Now I think it is preferable to throw an Exception
when a 2nd Reader/InputStream is retrieved for the same value in a result, or
when Reader/InputStreams are retrieved in a different order than in the sql.
// I would like to hear others' opinions on the restriction of not allowing the user to retrieve Reader/InputStream for result columns in a different order than in the sql.

At first I thought the restriction might be too hard on users;
however, I conclude that the restriction is reasonable, because a ResultSet is not a cache for the set of results and
has the characteristics of a stream (especially when a lob is used).
If the user needs a cache, it should be developed separately from ResultSet.
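
To make the proposed restriction concrete, here is a minimal sketch of the
check.  RowStreamGuard is a hypothetical name and this is not the code in
the patch; it only shows the two conditions under which an exception would
be thrown, assuming 1-based column indexes as in JDBC.

```java
import java.sql.SQLException;

/**
 * Hypothetical guard (not Derby's actual code): tracks which stream
 * column was last handed out for the current row.
 */
class RowStreamGuard {
    // 0 means no stream has been retrieved yet for the current row.
    private int lastStreamColumn = 0;

    /** Call before handing out a Reader/InputStream for columnIndex. */
    void checkStreamAccess(int columnIndex) throws SQLException {
        if (columnIndex == lastStreamColumn) {
            throw new SQLException("Stream for column " + columnIndex
                    + " was already retrieved for this row");
        }
        if (columnIndex < lastStreamColumn) {
            throw new SQLException("Stream for column " + columnIndex
                    + " requested after column " + lastStreamColumn);
        }
        lastStreamColumn = columnIndex;
    }

    /** Call when the ResultSet moves to another row. */
    void nextRow() {
        lastStreamColumn = 0;
    }
}
```

A ResultSet implementation would call checkStreamAccess from its
getXXXStream and getCharacterStream methods and nextRow from next().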

Thank you for your suggestion.
I had not realized this stream-like characteristic of ResultSet.

Best regards.

Mike Matrigali wrote:

I don't have enough information to completely answer, but will
try to state my opinion on the issue.

I think the goals should be:
1) provide same behavior in embedded and network server mode.
2) provide same behavior whether the blob is "small" or "large".
3) optimize the standard case of getting the column once in jdbc,
  as the spec allows.
4) If at all possible when selecting a blob/clob as a stream it should
  not be necessary to materialize the entire stream in memory.

For embedded I was worried about your description of changes, that made
it sound like you somehow were going to buffer the blob in memory.  I
see from your changes you basically added reset calls if the underlying
stream was resettable.  What I don't know is what happens in the 2 gig
blob/clob cases, either you will have to investigate or maybe someone
on the list knows?

For embedded it is theoretically possible for the reset of the stream
to go all the way back to store and read it again from the beginning.
For network client this seems even more complicated to do in an
optimized way (I believe you are looking at improving the streaming
behavior of large objects to network client so I defer to you in
how hard this may be).
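
As an illustration only of what "reset goes all the way back to store"
could look like, here is a sketch.  StreamSource and ReopeningInputStream
are hypothetical names standing in for the store and for a resettable
stream; they are not Derby's interfaces.  The point is that going back to
the start rereads from the source rather than replaying buffered bytes.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for a store that can reopen a value at byte 0. */
interface StreamSource {
    InputStream open() throws IOException;
}

/**
 * Sketch only: resetToStart() discards the current stream and reopens
 * the source, so nothing that was already read is kept in memory.
 */
class ReopeningInputStream extends InputStream {
    private final StreamSource source;
    private InputStream current;

    ReopeningInputStream(StreamSource source) throws IOException {
        this.source = source;
        this.current = source.open();
    }

    @Override
    public int read() throws IOException {
        return current.read();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        return current.read(buf, off, len);
    }

    /** Go back to the first byte by rereading from the source. */
    public void resetToStart() throws IOException {
        current.close();
        current = source.open();
    }

    @Override
    public void close() throws IOException {
        current.close();
    }
}
```

The cost is a second read of the value from the source; for the network
client that would mean a second round of streaming from the server, which
is why it is harder to do in an optimized way there.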

My opinion would be to make the second reference throw an error, to
make that behavior consistent in network server, embedded, long and
short blob/clob streams.  And to document that behavior.

Having said that, I am not against the code working as you are moving
toward as long as it does not cause a memory/runtime performance issue
for the normal single get stream case.

TomohitoNakayama wrote:

Hello Daniel and Mike.

Do you think it is preferable not to allow the user to call getXXXXStream
twice on one row,
in order to make room for releasing the memory used for the cache in
ResultSet as soon as possible?

Best regards.

Daniel John Debrunner wrote:

Mike Matrigali wrote:

Is there anything in the standard that says what the second call to
get the stream has to do?  Imagine the case where the first
stream reads 1 gig of a 2 gig blob, does the second call to
getBinaryStream() have to return the 1st gig again?

Yes & no.

Nothing in the JDBC spec doc, but the javadoc for java.sql.ResultSet has
always had:

" For maximum portability, result set columns within each row should be
read in left-to-right order, and each column should be read only once."

Thus, Derby could throw an exception if there was a second getXXXStream
call on the same column.

Any change that tries to cache the bytes returned by the first
getBinaryStream either in local client or network client code is
going to be a performance/memory drain.

Agreed, we need to be careful here, we need to optimise the frequent
case, getting the column's value once as-per JDBC.
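
For reference, a sketch of that frequent case from the application side:
the column's stream is obtained once and copied in fixed-size chunks, so
memory use does not depend on the size of the blob.  SinglePassLobReader
and its method names are made up for the example; ResultSet.getBinaryStream
is the standard JDBC call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.ResultSet;
import java.sql.SQLException;

class SinglePassLobReader {

    /** Copies in 8k chunks; returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    /** Reads one stream column exactly once for the current row. */
    static long copyColumn(ResultSet rs, int columnIndex, OutputStream out)
            throws SQLException, IOException {
        // getBinaryStream returns null for SQL NULL; try-with-resources
        // skips close() for a null resource.
        try (InputStream in = rs.getBinaryStream(columnIndex)) {
            if (in == null) {
                return 0;
            }
            return copy(in, out);
        }
    }
}
```

An application that processes rows this way, reading columns left to right
and each one once, never needs a second stream for the same column.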

