Re: DERBY-688: Some review comments and feedback

Army Mon, 07 Aug 2006 14:30:49 -0700

Hi Bryan,

Thanks so much for reviewing the changes and for verifying the patches. I appreciate your time with this.

3) Who will run these tests, and when? If all the execution code is optional, how do we ensure that it doesn't get broken?

I still have a couple more patches to post to complete the XML work (phases 4, 5, ...). One of those patches (probably the final one) will enable the xmlSuite to be run as part of derbyall. Due to the dependencies on Xalan and on a JAXP implementation, the suite will not run against all JVMs: to start, I'll just have it running against JVMs that have the required classes included in them. This was originally going to be Sun and IBM 1.4, but as you discovered this past weekend, the Xalan that's embedded with Sun jdk 1.4.2 is not a recent enough version to pass. The tests cannot run against Sun 1.5 because the Xalan classes have (I believe) been renamed in Sun jdk 1.5 and thus will not be available to Derby (so far as I understand it, the user would have to download an external version of Xalan, as with Sun jdk 1.4.2).

So at this point, I think the xmlSuite will only be run for IBM 1.4 and IBM 1.5, since those jvms include Xalan 2.5 or greater and also include a JAXP parser. So anyone running derbyall with either of those jvms would run the XML tests automatically. That said, the nightly tests that are reported here:


http://people.apache.org/~fuzzylogic/derby_test_results/

show results against ibm1.4.2. So while XML failures will not show up in Ole's report, we should at least be able to see them at the above-indicated URL.

4) Can you further explain the BY VALUE vs. BY REF behaviors? What do these
   clauses mean, why is BY REF better, at what point would we want to
   re-introduce BY VALUE, how does this manifest itself in the code?

The main way in which BY VALUE vs BY REF manifests itself in the code is when dealing with variable bindings. SQL/XML[2006] defines a syntax by which a value can be bound into a query expression. For example:


select
  xmlserialize(
    xmlquery('$ci/my/stuff' passing by ref xcol as "ci" empty on empty)
  as clob)
from xt_1

In this query "xcol" is bound into the variable "$ci" and then the query is executed. A key way in which BY REF and BY VALUE come into play, then, is when comparison operations between more than one XML value are part of the query. Take the following query:


select
  xmlserialize(
    xmlquery('$ci/[EMAIL PROTECTED] = $c2/[EMAIL PROTECTED]' passing by ref
      xcol as "ci", xcol as "c2" empty on empty
    )
  as clob)
from xt_1

If "ci" and "c2" are passed BY REF then the result of this query would be "true"; if either was passed BY VALUE, the result would be "false". This is what I tried to capture in the comments in sqlgrammar.jj, where I have:


 * [I]f the same XML value is passed BY REF into two different XML arguments
 * for a single operator, then every node in the first XML argument must have an
 * identical node in the second XML argument, and the ids for both nodes must be
 * the same.

I admit that the comment there could use some more explanation--but hopefully I can do that as a follow-up patch, instead of re-generating the patches from square one...?
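
As a rough analogy for the node-identity rule quoted above (this is not Derby code, and Derby's XML type is not assumed to hold DOM nodes), the standard Java DOM API distinguishes "the same node" from "an equal copy of the node". In the sketch below, the class ByRefSketch and the helpers byRef and byValue are hypothetical names: byRef hands back the very same node, while byValue hands back a deep copy with a new identity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ByRefSketch {
    /** Parses an XML string into a DOM document using the default JAXP parser. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Analogy for BY REF: the binding is the original node itself. */
    static Node byRef(Node n) {
        return n;
    }

    /** Analogy for BY VALUE: the binding is a deep copy with its own identity. */
    static Node byValue(Node n) {
        return n.cloneNode(true);
    }

    public static void main(String[] args) throws Exception {
        Node ci = parse("<my><stuff/></my>").getDocumentElement();

        // Same node bound twice: identity comparison succeeds.
        System.out.println("by ref, same node:    " + ci.isSameNode(byRef(ci)));

        // A copy has equal content but a different identity.
        System.out.println("by value, same node:  " + ci.isSameNode(byValue(ci)));
        System.out.println("by value, equal node: " + ci.isEqualNode(byValue(ci)));
    }
}
```

The point of the analogy is only that an identity-based comparison can tell a reference from a copy even when the content is identical, which is the distinction the sqlgrammar.jj comment describes.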

A while back I was prototyping some code to support XML binding and I found that it was both easier and more efficient to support BY REF, so that's the experience on which I've based the decision to use BY REF.

That said, though, it turns out that Xalan does not support variable binding (or if it does, I haven't figured it out yet), so the difference between BY REF and BY VALUE is just syntactic right now. I've chosen "BY REF" because that was easiest for me to implement when I did the prototyping for variable binding, and I think that's the way to go in future. If at any point someone wants to fry the "BY VALUE" fish, then s/he should certainly feel free to do so :)

5) If/when you re-generate the patches, please use relative path names for
   the files in the patches so that we don't get strings like
   c:/private/derby_src/java in the file names.


Yes, will do.  Sorry.

most of your examples and tests show the use of extremely tiny XML documents; they can fit into literal strings and are at most a few hundred bytes long. But in practice, XML documents are often ridiculously gigantic things which are hundreds of thousands of bytes long, and people try not to manipulate them in memory, but rather read them from files and write them to files, streaming
them through parsers and into in-memory DOM trees only as needed.
How does this work in Derby?

For 10.2 I am only working to add XML support to Derby in the SQL layer. I do not plan to address XML-specific JDBC processing. I'm planning to include in the documentation something to this effect:


<begin doc>

There is no JDBC-side support for the XML datatype in Derby. This means it is not possible to bind directly into an XML value nor to retrieve an XML value directly from a result set using JDBC. Instead, users must bind and retrieve the XML data as Java strings or character streams by explicitly specifying the appropriate XML operators (XMLPARSE and XMLSERIALIZE) as part of their SQL queries. This also means that there is no JDBC metadata support for the XML datatype.


<end_doc>

Note that with respect to keeping large XML documents in memory, this is undoubtedly a place for improvement in the future. For 10.2, though, I'm just taking some "baby steps" to introduce XML as a Derby SQL type and to enable its usage. Thus far I've been working with a target XML size of up to about 40k-ish, as a sort of "starter" for XML support in Derby. When Derby becomes serious about larger XML documents, more work is definitely going to be required.

Some questions that occur:
   a) If I have a large XML document in a file, how do I get that into my
      XML column in my database? Is it like a CLOB/BLOB where I work with
      some sort of a special stream class?

In order to store an XML value into a Derby database using JDBC, one must explicitly use the XMLPARSE operator in the SQL statement and then use any of the setXXX methods that are compatible with String types. For example:


  insert into myXmlTable(xcol) values XMLPARSE(DOCUMENT ? PRESERVE WHITESPACE)

And then use setString/setCharacterStream to bind the parameter.
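
A minimal JDBC sketch of those two steps, assuming an open java.sql.Connection to a database that already contains myXmlTable. The class name XmlInsertSketch and the method insertXml are hypothetical, and the SQL text is copied from the statement above. If the server rejects the untyped parameter marker, wrapping it in an explicit CAST (? AS CLOB) is the likely adjustment.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class XmlInsertSketch {
    // Statement text as given in the message; table and column names come from there.
    static final String INSERT_SQL =
        "insert into myXmlTable(xcol) values "
        + "XMLPARSE(DOCUMENT ? PRESERVE WHITESPACE)";

    /**
     * Streams the contents of xmlFile to the server as character data.
     * XMLPARSE on the server turns that character data into an XML value.
     * Returns the number of rows inserted.
     */
    static int insertXml(Connection conn, Path xmlFile)
            throws SQLException, IOException {
        try (Reader in = Files.newBufferedReader(xmlFile, StandardCharsets.UTF_8);
             PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            // Bind the single '?' as a character stream rather than an XML object.
            ps.setCharacterStream(1, in);
            return ps.executeUpdate();
        }
    }
}
```

For small documents, ps.setString(1, xmlText) would work the same way; the stream form just avoids building the whole document as one Java String on the client.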

   b) The mirror-image question is how do I fetch a large XML document from
      my table and stream it to my file on my client efficiently?

In order to retrieve XML values from a Derby database using JDBC, one must explicitly use the XMLSERIALIZE operator in the SQL query and then use the getXXX method that corresponds to the target serialization type. For example:

Query:

        select XMLSERIALIZE(xcol as CLOB) from myXmlTable

Then to retrieve the XML value, one would use any of the getXXX methods that work for CLOB.
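
A minimal sketch of the retrieval side, again assuming an open java.sql.Connection and the myXmlTable from the query above. The names XmlFetchSketch, copy, and fetchFirst are hypothetical. It reads the serialized value with getCharacterStream and copies it in chunks to any Writer (a FileWriter, for instance).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class XmlFetchSketch {
    // Query text as given in the message.
    static final String SELECT_SQL =
        "select XMLSERIALIZE(xcol as CLOB) from myXmlTable";

    /** Copies all characters from in to out in fixed-size chunks; returns the count. */
    static long copy(Reader in, Writer out) throws IOException {
        char[] buf = new char[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    /**
     * Writes the serialized XML of the first row to out.
     * Returns false if the table has no rows.
     */
    static boolean fetchFirst(Connection conn, Writer out)
            throws SQLException, IOException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(SELECT_SQL)) {
            if (!rs.next()) {
                return false;
            }
            // The column is a CLOB as far as JDBC is concerned.
            try (Reader in = rs.getCharacterStream(1)) {
                if (in != null) {   // null corresponds to SQL NULL
                    copy(in, out);
                }
            }
            return true;
        }
    }
}
```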

   c) Internally, does the store use CLOB/BLOB techniques for XML storage?
      does it store them in separate files?

The XML datatype internally wraps a SQLChar value, so any writing/reading from store occurs via calls to the corresponding SQLChar methods. For example, see XML.writeExternal(), XML.setStream(), etc. There is currently no special XML logic with respect to store.

   d) how does DRDA transmit XML over the net? Is it externalized data?

The answer to this one is "I don't know". And the reason I think we can get away with this answer is that, as mentioned above, I've only implemented XML at the SQL side. Thus XML data that flows from client to server can only flow as part of the SQL/XML operators, namely XMLPARSE and XMLSERIALIZE. Parameters to these operators are always string values of an existing SQL type (CHAR, VARCHAR, CLOB) and thus those are the types that are sent by DRDA. We never transmit "XML" per se across the network.

Note that this is one of the big reasons why I decided to put off any JDBC side support for now: the issue of sending "XML" across the wire without a DRDA XML type was not one I felt like addressing in the short term.

   Obviously, these questions are motivated by some of the work that
   Tomohito Nakayama and others have been doing recently with BLOB/CLOB
   efficiency, for example DERBY-326 and DERBY-550.

Understood--thanks for bringing them up. Please let me know if you have any other questions about how this is going to work.

7) Another user-level question: in your test programs, your XML documents
   tend to be quite simple.


<snip different kinds of XML items>

   Presumably, since all of this is handled by the parser, "it just works".

Right :) If the JAXP parser that is in the user's classpath can handle it, then the XMLPARSE operand should be able to handle it, as well. If the XML document relies on external files, such as external schema documents, then those files must be accessible to the JAXP parser that Derby is using. Which brings us to the next question...

   However, I'm a little confused about how the parsing happens in a
   client-server scenario: is the XMLPARSE processing performed on the
   client side? Or on the server side? I think this only becomes relevant
   when the user must do something to ensure that the XML parser and the
   XPATH/XQUERY engines are configured properly; they need to know which
   "side" (client/server) of their environment needs to be so configured.

Since all XML operations occur through SQL, all XML processing occurs where the SQL processing occurs--i.e. on the server. The client is just sending a SQL statement (as a string) to the server and then sending the XML content as a string (or stream) as well. The actual execution of XMLPARSE is performed on the server, and thus it is the server that must have the XML parser (JAXP) and the XML query engine (Xalan) in its classpath.

8) We need to make sure that the documentation clearly specifies which versions of the add-on XML software (parsers, XPATH, etc.) are specified, and we need to do our best to make the error messages when a bad version is used clear and specific.

Agreed. Phase 4 of the DERBY-688 changes is almost complete, and that's the phase in which I check for the required XML classes and throw a user-friendly error if the classes are not found.
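
One common way to do such a check, sketched here under assumptions: probe the classpath with Class.forName and report what is missing. This is not the Phase 4 patch itself. The class XmlClassCheckSketch, its methods, the error text, and the choice of the two probed class names (a JAXP factory and Xalan's XPathAPI) are all assumptions made for illustration; the classes Derby actually probes may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class XmlClassCheckSketch {
    // Assumed representative classes; the real check may look for others.
    static final String[] REQUIRED = {
        "javax.xml.parsers.DocumentBuilderFactory", // JAXP parser API
        "org.apache.xpath.XPathAPI"                 // Xalan XPath entry point
    };

    /** Returns true if the named class can be located without initializing it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false,
                    XmlClassCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the subset of classNames that could not be found, in order. */
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isPresent(name)) {
                result.add(name);
            }
        }
        return result;
    }

    /** Fails with a readable message if any assumed-required class is absent. */
    static void requireXmlClasses() {
        List<String> absent = missing(REQUIRED);
        if (!absent.isEmpty()) {
            throw new IllegalStateException(
                "XML operators need classes that are not in the classpath: "
                + absent);
        }
    }
}
```

A check like this only establishes presence; it says nothing about which version of the library was found, which is the harder question.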

I have not yet looked at checking for specific versions of the software, and I'm not exactly sure how that should work. If, for example, we say that we require Xalan 2.5 because of the bug described in XALANJ-1643, then what happens if/when some other bug in Xalan 2.5.xxx (or whatever) pops up? Do we now require that Xalan 2.5.xxx be installed? Do we want to code the required version into the error message? Do we want to disallow XML operations if the required version isn't present? Or just a warning? What happens if some app is using 10.2.2 which requires version 2.5 and then upgrades to version 10.2.3 which requires version 2.6--are we going to force the app to upgrade its Xalan release, as well? I don't know what the policies around this kind of thing are, but that seems a bit unfriendly...

9) When I run lang/xmlBinding.java, I see the following diff. This diff occurs in all three configurations I tried (embedded, DerbyNet, and DerbyNetClient)

Thanks for pointing this out; my guess is that it's a test issue, but I will look into it. This seems like something that could be addressed in a follow-up patch--perhaps the one in which I enable the tests to run as part of derbyall?

Many thanks for taking the time to review the changes, Bryan, and for the excellent questions. If you have any more, please do ask!

Otherwise, if any of my answers above would make you uncomfortable with committing the patches (or with approving their commit), please let me know and I will try to address your concerns.


Thanks again (and again!) for the review,
Army
