ERROR XSDG0: Page Page(1325564,Container(0, 30832)) could not be read from disk. Caused by: java.io.EOFException: Reached end of file while attempting to read a whole page.
Does derby.log have any more detail about this specific exception? Note that you can use the system tables (SYSCONGLOMERATES, I believe) to figure out which table corresponds to conglomerate 30832. You can also multiply 1325564 by your table's page size to figure out how large Derby expected the file to be at the instant this happened. Assuming your page size was 4096, 1325564 * 4096 is 5,429,510,144, so that conglomerate's file should be roughly 5.4 GB or larger.
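A minimal sketch of both checks, written in Java with plain JDBC. Several things here are assumptions rather than details from this thread: the hypothetical class name ConglomerateLookup, the 4096-byte page size (check derby.storage.pageSize; Derby may use larger pages, such as 32 KB, for tables with long columns), the JDBC URL passed on the command line, and Derby's usual on-disk naming of container files as seg<segment>/c<container id in hex>.dat. The query joins SYS.SYSCONGLOMERATES to SYS.SYSTABLES and SYS.SYSSCHEMAS on CONGLOMERATENUMBER.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ConglomerateLookup {

    // Maps a conglomerate number (the container id in the error message)
    // back to the schema, table, and conglomerate that own it.
    public static final String LOOKUP_SQL =
        "SELECT s.SCHEMANAME, t.TABLENAME, c.CONGLOMERATENAME, c.ISINDEX "
        + "FROM SYS.SYSCONGLOMERATES c "
        + "JOIN SYS.SYSTABLES t ON c.TABLEID = t.TABLEID "
        + "JOIN SYS.SYSSCHEMAS s ON t.SCHEMAID = s.SCHEMAID "
        + "WHERE c.CONGLOMERATENUMBER = ?";

    // Byte offset where a page starts, assuming pages are stored back to back.
    public static long pageOffset(long pageNumber, int pageSize) {
        return pageNumber * pageSize;
    }

    // Smallest file length that lets the whole page be read without hitting EOF.
    public static long minFileLength(long pageNumber, int pageSize) {
        return (pageNumber + 1) * pageSize;
    }

    // Assumed Derby naming: seg<segmentId>/c<containerId in hex>.dat
    public static String containerFile(long segmentId, long containerId) {
        return "seg" + segmentId + "/c" + Long.toHexString(containerId) + ".dat";
    }

    // Runs the catalog query and prints every conglomerate with that number.
    static void printOwner(String url, long conglomNumber) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(LOOKUP_SQL)) {
            ps.setLong(1, conglomNumber);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s.%s conglomerate=%s index=%s%n",
                        rs.getString(1), rs.getString(2),
                        rs.getString(3), rs.getBoolean(4));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        long page = 1325564L;     // from Page(1325564, ...)
        long container = 30832L;  // from Container(0, 30832)
        int pageSize = 4096;      // assumption, see lead-in

        System.out.println("page offset     : " + pageOffset(page, pageSize));
        System.out.println("min file length : " + minFileLength(page, pageSize));
        System.out.println("container file  : " + containerFile(0, container));

        // Only touch the database if a JDBC URL was supplied,
        // e.g. jdbc:derby://localhost:1527/yourdb (hypothetical).
        if (args.length > 0) {
            printOwner(args[0], container);
        }
    }
}
```

Run it with no arguments to print only the arithmetic, or pass a network server URL to run the catalog query as well. Comparing the minimum length against the actual size of the container file on disk shows whether the file really was shorter than Derby expected.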
Derby then reported errors like: org.apache.derby.iapi.error.ShutdownException:
This is normal, I believe.
java.lang.NullPointerException
    at org.apache.derby.impl.drda.DRDAConnThread.writePBSD(Unknown Source)
    at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
    at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
This is scary, but it appears to have happened AFTER the shutdown, so it may be a secondary, unrelated bug in the network server code, one that does not handle a shutdown correctly. It seems worth investigating separately.
The system is an Oracle M5000 Enterprise server with what I believe is a 15TB Sun ZFS Storage 7320 external storage array connected by Fibre Channel. This is the first time in over 8 years that we have seen any I/O error like this. What I am trying to confirm is that this error comes from low-level Derby code, so that when it reports a java.io.EOFException like this one, an I/O error really did occur somewhere while reading the page from the container file. Things like performance problems or Java heap exhaustion can pretty much be ruled out as causes of such an error. My gut feeling is that something in the connection to this storage array had a hiccup. This setup is at the customer site; I cannot access the system logs directly, nor do I know how this storage array works or how to inspect it, so just having confirmation that an I/O error really did occur would help.
This is good information to have. My feeling is that you should do a more thorough investigation of the specific conglomerate in question, to check for errors that might not show up under your regular application access patterns. Also, if you can find any more information in derby.log, it would be nice to know.

Thanks for sharing the information you do have; it is quite interesting to hear about your experience!

bryan

P.S. I believe this is the code that threw the java.io.EOFException:

    /**
     * Attempts to fill buf completely from start until it's full.
     * <p/>
     * FileChannel has no readFull() method, so we roll our own.
     * <p/>
     * @param dstBuffer buffer to read into
     * @param srcChannel channel to read from
     * @param position file position from where to read
     *
     * @throws IOException if an I/O error occurs while reading
     * @throws StandardException If thread is interrupted.
     */
    private void readFull(ByteBuffer dstBuffer,
                          FileChannel srcChannel,
                          long position)
            throws IOException, StandardException {

        while(dstBuffer.remaining() > 0) {
            if (srcChannel.read(dstBuffer,
                                position + dstBuffer.position()) == -1) {
                throw new EOFException(
                    "Reached end of file while attempting to read a " +
                    "whole page.");
            }

            // (**) Sun Java NIO is weird: it can close the channel due to an
            // interrupt without throwing if bytes got transferred. Compensate,
            // so we can clean up. Bug 6979009,
            // http://bugs.sun.com/view_bug.do?bug_id=6979009
            if (Thread.currentThread().isInterrupted() &&
                    !srcChannel.isOpen()) {
                throw new ClosedByInterruptException();
            }
        }
    }
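To see what kind of failure this exception represents, here is a small self-contained Java sketch that mirrors the readFull() loop quoted above, minus Derby's StandardException and the interrupt check. It writes a hypothetical three-page file to a temporary location and then reads page 3. FileChannel.read(ByteBuffer, long) returns -1 when the requested position is at or past the end of the file, and the loop turns that into EOFException. In other words, the exception suggests the container file, as reported by the OS at that moment, was shorter than the page Derby asked for; a read call that itself failed would surface as some other IOException instead. The class name, page size, and file contents are assumptions made for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadFullDemo {
    static final int PAGE_SIZE = 4096; // assumed page size for the demo

    // Same loop shape as Derby's readFull(), without the interrupt handling.
    static void readFull(ByteBuffer dst, FileChannel ch, long position) throws IOException {
        while (dst.remaining() > 0) {
            // read() returns -1 only when position >= current file size.
            if (ch.read(dst, position + dst.position()) == -1) {
                throw new EOFException(
                    "Reached end of file while attempting to read a whole page.");
            }
        }
    }

    // Returns true if the page could be read in full, false if EOF was hit.
    public static boolean canReadPage(Path file, long pageNumber) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer page = ByteBuffer.allocate(PAGE_SIZE);
            readFull(page, ch, pageNumber * PAGE_SIZE);
            return true;
        } catch (EOFException e) {
            return false;
        }
    }

    // Creates a temporary file holding exactly three zero-filled pages (0, 1, 2).
    public static Path writeThreePageFile() throws IOException {
        Path file = Files.createTempFile("container", ".dat");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[3 * PAGE_SIZE]);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeThreePageFile();
        System.out.println("page 2 readable: " + canReadPage(file, 2)); // true
        System.out.println("page 3 readable: " + canReadPage(file, 3)); // false (EOF)
    }
}
```

The same reasoning applies to the reported error: if the container file was smaller than 1325565 pages when Derby read it, page 1325564 lies past its end and this exact exception results.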