Jean,

Attached is an updated logformats.xml.

Thanks
<?xml version="1.0"?>
  <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
  <document> 
    <header> 
      <title>Derby Write Ahead Log Format</title>
      <abstract>This document describes the storage format of the Derby Write Ahead 
        Log. This is a work-in-progress derived from Javadoc comments and from 
        explanations Mike Matrigali and others posted to the Derby lists. Please 
        post questions, comments, and corrections to [EMAIL PROTECTED] 
      </abstract>
    </header>
    <body> 
      <section id="introduction"> 
        <title> Introduction </title>
        <p> Derby uses a Write Ahead Log to record all changes to the database. 
          The Write Ahead Log (WAL) protocol requires the following rules to be 
          followed: </p>
        <ol> 
          <li>A page must be latched exclusively before it can be updated.</li>
          <li>While the latch is held, the update must be logged, and the page 
            must be tagged with the identity of the log record (often known as 
            the Log Sequence Number or LSN).</li>
          <li>When the page is about to be written to persistent storage, all 
            log records up to and including the page's LSN must be forced to 
            disk.</li>
          <li>Once the log records have been forced to disk, the cached page may 
            be written to persistent storage, overwriting the previous version 
            of the page.</li>
        </ol>
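        <p>The four rules above can be illustrated with a minimal in-memory 
          sketch. This is not Derby code: the class 
          <code>WalRulesSketch</code>, its <code>Page</code> type, and the use 
          of a list index as the LSN are all assumptions made for 
          illustration.</p>
        <source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of the four WAL rules; not Derby code. */
final class WalRulesSketch {
    /** A cached page tagged with the LSN of the last log record that changed it. */
    static final class Page {
        long pageLsn = -1;
        String contents = "";
        boolean latched = false;
    }

    private final List<String> logBuffer = new ArrayList<>(); // list index = LSN (assumption)
    private long flushedLsn = -1;                              // highest LSN forced to "disk"
    final List<String> events = new ArrayList<>();             // trace of I/O, for the demo

    /** Rules 1 and 2: latch the page, log the change, tag the page with the LSN. */
    void update(Page page, String newContents) {
        page.latched = true;                     // rule 1: exclusive latch
        try {
            logBuffer.add(newContents);          // rule 2: log the update
            page.pageLsn = logBuffer.size() - 1; // rule 2: tag the page with its LSN
            page.contents = newContents;
        } finally {
            page.latched = false;
        }
    }

    /** Rules 3 and 4: force the log up to the page's LSN, then write the page. */
    void writePage(Page page) {
        if (page.pageLsn > flushedLsn) {
            flushedLsn = page.pageLsn;           // rule 3: force log records to disk
            events.add("flush log to LSN " + flushedLsn);
        }
        events.add("write page at LSN " + page.pageLsn); // rule 4: page may now be written
    }

    long flushedLsn() {
        return flushedLsn;
    }

    public static void main(String[] args) {
        WalRulesSketch wal = new WalRulesSketch();
        Page page = new Page();
        wal.update(page, "v1");
        wal.update(page, "v2");
        wal.writePage(page);
        wal.events.forEach(System.out::println);
    }
}
```
]]></source>
        <p>In this sketch the log flush is always recorded before the page 
          write, which is the ordering that rules 3 and 4 require.</p>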
        <p>The WAL protocol ensures that in the event of a system crash, database 
          pages can be restored to a consistent state using the information contained 
          in the log records. How this is done will be the subject of another 
          paper.</p>
      </section>
      <section> 
        <title> References </title>
        <p> A good description of Write Ahead Logging, and how a log is typically 
          implemented, can be found in 
          <em> 
            <a href="http://portal.acm.org/citation.cfm?id=573304">Transaction 
              Processing: Concepts and Techniques</a>
            , by Jim Gray and Andreas Reuter, 1993, Morgan Kaufmann Publishers</em>
          .</p>
      </section>
      <section> 
        <title>Derby implementation of the Write Ahead Log</title>
        <p> Derby implements the Write Ahead Log using a non-circular file system 
          file. Here are some comments about the current implementation of recovery:</p>
        <p class="quote">
          <em>Suresh Thalamati</em><br/>
          Derby supports simple media recovery. It has support for full backup/restore 
          and very basic form of rollforward recovery (replay of logs using backup 
          and archived log files). </p>
        <p class="quote">
          <em>Mike Matrigali</em><br/>
            1. Derby fully supports crash recovery, it uses java to correctly 
              sync the log file to support this.<br/>
            2. I would say derby supports media recovery. One can make a backup 
              of the data and store it off line. Logs can be stored on a separate 
              disk from the data, and if you lose your data disk then you can 
              use rollforward recovery on the existing logs and the copy of the 
              backup to bring your database up to the current point in time.<br/>
            3. Derby does not support "point in time recovery". Someone may want 
              to look at this in the future. Technically I don't think it would 
              be very hard as the logging system has the stuff to solve the hard 
              problems. It does not have an idea about "time" - it just knows 
              log sequence numbers, so need to figure out what kind of interface 
              a user really wants. A very user unfriendly interface would not 
              be very hard to implement which would be recover to a specific log 
              sequence number. Anyone interested in this feature should add it 
              to jira - I'll be happy to add technical comments on what needs 
              to be done.<br/>
            4. A reasonable next step in derby recovery progress would be to 
              add a way to automatically move/copy log files offline as they are 
              not needed by crash recovery and only needed for media recovery. 
              Some sort of java stored procedure callout would seem most appropriate.
        </p>
        <p> The 'log' is a stream of log records. The 'log' is implemented as 
          a series of numbered log files. These numbered log files are logically 
          continuous so a transaction can have log records that span multiple 
          log files. A single log record cannot span more than one log file. The 
          log file number is monotonically increasing. </p>
        <p> The log belongs to a log factory of a RawStore. In the current implementation, 
          each RawStore only has one log factory, so each RawStore only has one 
          log (which is composed of multiple log files). At any given time, a 
          log factory only writes new log records to one log file; this log file 
          is called the 'current log file'. </p>
        <p> A log file is named log 
          <em>logNumber</em>
          .dat </p>
        <!--
        <p> Everytime a checkpoint is taken, a new log file is created and all 
          subsequent log records will go to the new log file. After a checkpoint 
          is taken, old and useless log files will be deleted. </p>
-->
        <p>With the default values, a new log file is created (this is known as 
          a log switch) when a log file grows beyond 1MB, and a checkpoint happens 
          when 10MB or more of log has been written since the last checkpoint.</p>
        <p> RawStore exposes a checkpoint method which clients can call, or a 
          checkpoint is taken automatically by the RawStore when: </p>
        <ol> 
          <li> The log file grows beyond a certain size (configurable, default 
            1MB)</li>
          <li> RawStore is shut down and a checkpoint hasn't been done "for a while"</li>
          <li> RawStore is recovered and a checkpoint hasn't been done "for a 
            while"</li>
        </ol>
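        <p>As a rough illustration of the file naming and the default 
          thresholds quoted above, the following sketch counts log switches and 
          checkpoints as records are appended. The class 
          <code>LogSwitchSketch</code>, its fields, and the choice of 1 as the 
          first log file number are assumptions; the real bookkeeping lives 
          inside Derby's log factory and is more involved.</p>
        <source><![CDATA[
```java
/** Hypothetical bookkeeping for log switches and checkpoints; not Derby code. */
final class LogSwitchSketch {
    static final long LOG_SWITCH_BYTES = 1L << 20;         // 1MB default quoted in the text
    static final long CHECKPOINT_BYTES = 10L * (1L << 20); // 10MB default quoted in the text

    long currentFileNumber = 1;    // assumed starting number
    long bytesInCurrentFile = 0;
    long bytesSinceCheckpoint = 0;
    int checkpoints = 0;

    /** A log file is named "log" + logNumber + ".dat". */
    static String fileName(long logNumber) {
        return "log" + logNumber + ".dat";
    }

    /** Appends one whole record, then applies the two thresholds. */
    void append(int recordBytes) {
        // The record is written entirely to the current file, so it never
        // spans two log files.
        bytesInCurrentFile += recordBytes;
        bytesSinceCheckpoint += recordBytes;
        if (bytesInCurrentFile > LOG_SWITCH_BYTES) {
            currentFileNumber++;           // log switch: later records go to a new file
            bytesInCurrentFile = 0;
        }
        if (bytesSinceCheckpoint >= CHECKPOINT_BYTES) {
            checkpoints++;                 // a checkpoint would be taken here
            bytesSinceCheckpoint = 0;
        }
    }

    public static void main(String[] args) {
        LogSwitchSketch log = new LogSwitchSketch();
        for (int i = 0; i < 176; i++) {
            log.append(64 * 1024);         // 176 records of 64KB = 11MB of log
        }
        System.out.println(fileName(log.currentFileNumber)
                + ", checkpoints: " + log.checkpoints);
    }
}
```
]]></source>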
        <section> 
          <title>LogCounter</title>
          <p>Log records are identified using LogCounter, which is an implementation 
            of LogInstant, a Derby term for LSN. The LogCounter is made up of 
            the log file number, and the byte offset of the log record within 
            the log file. Within the stored log record a log counter is represented 
            as a long. Outside the LogFactory the instant is passed around as 
            a LogCounter (through its LogInstant interface).</p>
          <p> The long is encoded so that the comparison operators &lt;, == and 
            &gt; correctly tell whether one log instant is less than, equal to, 
            or greater than another.</p>
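          <p>One way to obtain this ordering property is to place the log file 
            number in the high-order bits of the long and the byte offset in 
            the low-order bits. The sketch below assumes a 32/32 bit split; the 
            class name, constants, and exact bit layout are illustrative 
            assumptions and may differ from what LogCounter actually does.</p>
          <source><![CDATA[
```java
/** Hypothetical packing of (log file number, byte offset) into one long. */
final class LogInstantSketch {
    // Assumed layout: file number in the high 32 bits, offset in the low 32 bits.
    private static final int FILE_NUMBER_SHIFT = 32;
    private static final long OFFSET_MASK = 0xFFFFFFFFL;

    static long encode(long fileNumber, long offset) {
        if (fileNumber < 0 || fileNumber > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("file number out of range");
        }
        if (offset < 0 || offset > OFFSET_MASK) {
            throw new IllegalArgumentException("offset out of range");
        }
        // The result is non-negative, so plain long comparison orders instants
        // first by file number and then by offset within the file.
        return (fileNumber << FILE_NUMBER_SHIFT) | offset;
    }

    static long fileNumber(long instant) {
        return instant >>> FILE_NUMBER_SHIFT;
    }

    static long offset(long instant) {
        return instant & OFFSET_MASK;
    }

    public static void main(String[] args) {
        long a = encode(3, 4096);
        long b = encode(4, 16);
        System.out.println("a < b: " + (a < b));
        System.out.println("file " + fileNumber(a) + ", offset " + offset(a));
    }
}
```
]]></source>
          <p>Because the file number occupies the more significant bits, a 
            record in a later log file always compares greater than any record 
            in an earlier file, regardless of offset.</p>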
        </section>
      </section>
      <section> 
        <title> Format of Write Ahead Log </title>
        <p> An implementation of a file-based log is in 
          <code>org.apache.derby.impl.store.raw.log.LogToFile</code>. 
          This LogFactory is responsible for the formats of two kinds of file: 
          the log file and the log control file. It is also responsible for the 
          format of the log record wrapper. </p>
        <section> 
          <title>Format of Log Control File</title>
          <p>The log control file contains information about which log files are 
            present and where the last checkpoint log record is located.</p>
          <table> 
            <tr> 
              <th>Type</th>
              <th>Description</th>
            </tr>
            <tr> 
              <td>int</td>
              <td>format id set to FILE_STREAM_LOG_FILE</td>
            </tr>
            <tr> 
              <td>int</td>
              <td>obsolete log file version</td>
            </tr>
            <tr> 
              <td>long</td>
              <td>the log instant (LogCounter) of the last completed checkpoint</td>
            </tr>
            <tr> 
              <td>int</td>
              <td>JBMS (older name for Cloudscape/Derby) version</td>
            </tr>
            <tr> 
              <td>int</td>
              <td>checkpoint interval</td>
            </tr>
            <tr> 
              <td>long</td>
              <td>spare (value set to 0)</td>
            </tr>
            <tr> 
              <td>long</td>
              <td>spare (value set to 0)</td>
            </tr>
            <tr> 
              <td>long</td>
              <td>spare (value set to 0)</td>
            </tr>
          </table>
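          <p>Read in table order, the fields above could be decoded as in the 
            following sketch. It assumes the values are written with Java's 
            standard big-endian <code>DataOutput</code> encoding; the class 
            <code>LogControlSketch</code>, the sample writer, and the 
            placeholder format id and version values are hypothetical and only 
            serve the demonstration.</p>
          <source><![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical reader for the control file layout in the table above. */
final class LogControlSketch {
    final int formatId;
    final int obsoleteVersion;
    final long lastCheckpointInstant;
    final int jbmsVersion;
    final int checkpointInterval;

    private LogControlSketch(int formatId, int obsoleteVersion,
            long lastCheckpointInstant, int jbmsVersion, int checkpointInterval) {
        this.formatId = formatId;
        this.obsoleteVersion = obsoleteVersion;
        this.lastCheckpointInstant = lastCheckpointInstant;
        this.jbmsVersion = jbmsVersion;
        this.checkpointInterval = checkpointInterval;
    }

    /** Reads the fields in table order; assumes big-endian DataOutput encoding. */
    static LogControlSketch parse(byte[] bytes) {
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes))) {
            int formatId = in.readInt();
            int obsoleteVersion = in.readInt();
            long checkpoint = in.readLong();
            int jbmsVersion = in.readInt();
            int interval = in.readInt();
            in.readLong(); // spare
            in.readLong(); // spare
            in.readLong(); // spare
            return new LogControlSketch(
                    formatId, obsoleteVersion, checkpoint, jbmsVersion, interval);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a sample image with the same layout, for the demo below. */
    static byte[] sample(int formatId, long checkpointInstant,
            int jbmsVersion, int interval) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(formatId);
            out.writeInt(0);                 // obsolete log file version
            out.writeLong(checkpointInstant);
            out.writeInt(jbmsVersion);
            out.writeInt(interval);
            out.writeLong(0L);               // three spare longs, set to 0
            out.writeLong(0L);
            out.writeLong(0L);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 1234 and 10 are placeholder values, not real Derby identifiers.
        byte[] image = sample(1234, (2L << 32) | 512, 10, 10 * 1024 * 1024);
        LogControlSketch control = parse(image);
        System.out.println(image.length + " bytes, last checkpoint instant "
                + control.lastCheckpointInstant);
    }
}
```
]]></source>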
        </section>
        <section> 
          <title>Format of the log file</title>
          <p>The log file contains log records which record all the changes to 
            the database. The complete transaction log is composed of a series 
            of log files.</p>
          <table> 
            <tr> 
              <th>Type</th>
              <th>Description</th>
            </tr>
            <tr> 
              <td>int</td>
              <td>Format id of this log file, set to FILE_STREAM_LOG_FILE.</td>
            </tr>
            <tr> 
              <td>int</td>
              <td>Obsolete log file version - not used</td>
            </tr>
            <tr> 
              <td>long</td>
              <td>Log file number - this number orders the log files in a series 
                to form the complete transaction log </td>
            </tr>
            <tr> 
              <td>long</td>
              <td>PrevLogRecord - log instant of the previous log record, in the 
                previous log file.</td>
            </tr>
            <tr> 
              <td>[log record wrapper]*</td>
              <td>one or more log records with wrapper</td>
            </tr>
            <tr> 
              <td>int</td>
              <td>EndMarker - value of zero. The beginning of a log record wrapper 
                is the length of the log record, therefore it is never zero </td>
            </tr>
            <tr> 
              <td>[int fuzzy end]*</td>
              <td>zero or more ints of value 0, present when this log file has been 
                recovered and any incomplete log record was set to zero. </td>
            </tr>
          </table>
        </section>
        <section> 
          <title>Format of the log record wrapper</title>
          <p>The log record wrapper provides information for the log scan.</p>
          <table> 
            <tr> 
              <th>Type</th>
              <th>Description</th>
            </tr>
            <tr> 
              <td>int</td>
              <td>length - length of the log record (for forward scan)</td>
            </tr>
            <tr> 
              <td>long</td>
              <td>instant - LogInstant of the log record</td>
            </tr>
            <tr> 
              <td>byte[length]</td>
              <td>logRecord - byte array that is written by the FileLogger</td>
            </tr>
            <tr> 
              <td>int</td>
              <td>length - length of the log record (for backward scan)</td>
            </tr>
          </table>
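          <p>Taken together with the log file layout described earlier, the 
            wrapper makes a forward scan straightforward: skip the file header, 
            then read wrapped records until a length of zero (the end marker) 
            is found. The sketch below builds a small sample image and scans 
            it. The class <code>LogScanSketch</code>, the placeholder format 
            id, the big-endian encoding, and the choice of making each instant 
            point at the start of its wrapper are assumptions for illustration 
            only.</p>
          <source><![CDATA[
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical forward scan over wrapped log records; not Derby code. */
final class LogScanSketch {
    /** One wrapped record: its instant plus the opaque log record bytes. */
    static final class Wrapped {
        final long instant;
        final byte[] payload;

        Wrapped(long instant, byte[] payload) {
            this.instant = instant;
            this.payload = payload;
        }
    }

    /** Builds a sample log file image: header, wrapped records, end marker. */
    static byte[] buildSample(int formatId, long fileNumber, List<byte[]> payloads) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(formatId);    // format id
            out.writeInt(0);           // obsolete log file version
            out.writeLong(fileNumber); // log file number
            out.writeLong(0L);         // instant of the previous log record
            for (byte[] payload : payloads) {
                // Assumed instant: file number in high bits, wrapper offset in low bits.
                long instant = (fileNumber << 32) | out.size();
                out.writeInt(payload.length); // length, for forward scans
                out.writeLong(instant);
                out.write(payload);
                out.writeInt(payload.length); // length again, for backward scans
            }
            out.writeInt(0);           // end marker
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Skips the file header, then reads wrappers until the zero end marker. */
    static List<Wrapped> scanForward(byte[] file) {
        List<Wrapped> records = new ArrayList<>();
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(file))) {
            in.readInt();
            in.readInt();
            in.readLong();
            in.readLong();
            while (true) {
                int length = in.readInt();
                if (length == 0) {
                    break;             // a real record length is never zero
                }
                long instant = in.readLong();
                byte[] payload = new byte[length];
                in.readFully(payload);
                if (in.readInt() != length) {
                    throw new IllegalStateException("leading/trailing length mismatch");
                }
                records.add(new Wrapped(instant, payload));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] file = buildSample(1234, 5L, Arrays.asList(
                "abc".getBytes(StandardCharsets.US_ASCII),
                "hello".getBytes(StandardCharsets.US_ASCII)));
        for (Wrapped record : scanForward(file)) {
            System.out.println((record.instant & 0xFFFFFFFFL) + ": "
                    + new String(record.payload, StandardCharsets.US_ASCII));
        }
    }
}
```
]]></source>
          <p>A backward scan would work the same way in reverse, using the 
            trailing length to find the start of the preceding wrapper.</p>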
        </section>
        <section> 
          <title>The format of a log record</title>
          <p>The log record describes every change to the persistent store.</p>
          <table> 
            <tr> 
              <th>Type</th>
              <th>Description</th>
            </tr>
            <tr> 
              <td>int</td>
              <td>format_id, set to LOG_RECORD. The formatId is written by FormatIdOutputStream 
                when this object is written out by writeObject </td>
            </tr>
            <tr> 
              <td>CompressedInt</td>
              <td> <p>loggable group - the loggable's group value.</p> <p> Each 
                  loggable belongs to one or more groups of similar functionality. 
                </p> <p> Grouping is a way to quickly sort out log records that 
                  are interesting to different modules or different implementations. 
                </p> <p> When a module creates a loggable and sends it to the log file, 
                  it must mark this loggable with one or more of the following 
                  groups. If none fit, or if the loggable encompasses functionality 
                  that is not described in existing groups, then a new group should 
                  be introduced. </p> <p> Grouping has no effect on how the record 
                  is logged or how it is treated in rollback or recovery. </p> 
                <p> The following groups are defined. This list serves as the 
                  registry of all loggable groups. </p> <table> 
                  <caption>Loggable Groups</caption>
                  <tr> 
                    <th>Name</th>
                    <th>Value</th>
                    <th>Description</th>
                  </tr>
                  <tr> 
                    <td>FIRST</td>
                    <td>0x1</td>
                    <td>The first operation of a transaction.</td>
                  </tr>
                  <tr> 
                    <td>LAST</td>
                    <td>0x2</td>
                    <td>The last operation of a transaction.</td>
                  </tr>
                  <tr> 
                    <td>COMPENSATION</td>
                    <td>0x4</td>
                    <td>A compensation log record.</td>
                  </tr>
                  <tr> 
                    <td>BI_LOG</td>
                    <td>0x8</td>
                    <td>A BeforeImage log record.</td>
                  </tr>
                  <tr> 
                    <td>COMMIT</td>
                    <td>0x10</td>
                    <td>The transaction committed.</td>
                  </tr>
                  <tr> 
                    <td>ABORT</td>
                    <td>0x20</td>
                    <td>The transaction aborted.</td>
                  </tr>
                  <tr> 
                    <td>PREPARE</td>
                    <td>0x40</td>
                    <td>The transaction prepared.</td>
                  </tr>
                  <tr> 
                    <td>XA_NEEDLOCK</td>
                    <td>0x80</td>
                    <td>Need to reclaim locks associated with this log record 
                      during XA prepared xact recovery.</td>
                  </tr>
                  <tr> 
                    <td>RAWSTORE</td>
                    <td>0x100</td>
                    <td>A log record generated by the raw store.</td>
                  </tr>
                  <tr> 
                    <td>FILE_RESOURCE</td>
                    <td>0x400</td>
                    <td>Related to "non-transactional" files.</td>
                  </tr>
                </table> </td>
            </tr>
            <tr> 
              <td>TransactionId</td>
              <td>xactId - The Transaction this log belongs to.</td>
            </tr>
            <tr> 
              <td>Loggable</td>
              <td>op - the log operation</td>
            </tr>
          </table>
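          <p>Since each group value in the Loggable Groups table is a distinct 
            bit, a loggable that belongs to several groups can carry them in a 
            single integer mask. The sketch below uses the names and values 
            from the table; the class <code>LoggableGroupsSketch</code> and its 
            helper methods are hypothetical.</p>
          <source><![CDATA[
```java
/** Group bits from the Loggable Groups table, with hypothetical helpers. */
final class LoggableGroupsSketch {
    static final int FIRST = 0x1;
    static final int LAST = 0x2;
    static final int COMPENSATION = 0x4;
    static final int BI_LOG = 0x8;
    static final int COMMIT = 0x10;
    static final int ABORT = 0x20;
    static final int PREPARE = 0x40;
    static final int XA_NEEDLOCK = 0x80;
    static final int RAWSTORE = 0x100;
    static final int FILE_RESOURCE = 0x400;

    /** True if the mask has at least one of the bits in {@code group} set. */
    static boolean inGroup(int groupMask, int group) {
        return (groupMask & group) != 0;
    }

    /** Example filter: does this record end a transaction either way? */
    static boolean endsTransaction(int groupMask) {
        return inGroup(groupMask, COMMIT | ABORT);
    }

    public static void main(String[] args) {
        int commitRecord = LAST | COMMIT | RAWSTORE;
        System.out.println(inGroup(commitRecord, COMMIT));
        System.out.println(inGroup(commitRecord, ABORT));
        System.out.println(endsTransaction(commitRecord));
    }
}
```
]]></source>
          <p>A scan that is only interested in, say, commit and abort records 
            can test the mask and skip everything else without decoding the 
            operation itself.</p>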
        </section>
      </section>
      <section>
         <title>Pointers to relevant classes</title>
         <fixme author="DM">This section should link to appropriate Javadoc documentation</fixme>
         <table>
         <tr>
             <th>Package</th>
             <th>Class</th>
             <th>Description</th>
         </tr>
         <tr>
             <td>org.apache.derby.iapi.store.raw.log</td>
             <td>LogFactory.java</td>
             <td>The Java interface for the logging system module.</td>
         </tr>
         <tr>
             <td>org.apache.derby.impl.store.raw.log</td>
             <td>LogToFile.java</td>
             <td>The implementation of LogFactory, which also implements Module;
                 this is the class that contains the recovery code.</td>
         </tr>
         <tr>
             <td></td>
             <td>CheckpointOperation.java</td>
             <td>A Log Operation that represents a checkpoint.</td>
         </tr>
         <tr>
             <td></td>
             <td>FileLogger.java</td>
             <td>Deals with putting log records to disk. Writes log records to a log file as a stream
                (i.e., log records are added to the end of the file; there is no concept of pages).</td>
         </tr>
         <tr>
             <td></td>
             <td>FlushedScan.java</td>
             <td>Deals with scanning the log file. Scans the log, which is implemented as a series of log files.
                 This log scan knows how to move across log files when it is positioned at
                 the boundary of a log file and needs to getNextRecord.</td>
         </tr>
         <tr>
             <td></td>
             <td>FlushedScanHandle.java</td>
             <td>More stuff dealing with scanning the log file.</td>
         </tr>
         <tr>
              <td></td>
              <td>Scan.java</td>
              <td>More scan log file stuff. Scans the log, which is implemented as a series of log files.
                This log scan knows how to move across log files when it is positioned at
                the boundary of a log file and needs to getNextRecord.</td>
         </tr>
         <tr>
              <td></td>
              <td>StreamLogScan.java</td>
              <td>More scan log file stuff. LogScan provides methods to read a log record and get its LogInstant
                  in an already defined scan.</td>
         </tr>
         <tr>
             <td></td>
             <td>LogAccessFile.java</td>
             <td>Lowest level of putting log records to disk. Wraps a RandomAccessFile to provide buffering
                 on log writes.</td>
         </tr>
         <tr>
             <td></td>
             <td>LogAccessFileBuffer.java</td>
             <td>Utility for LogAccessFile. A single buffer of data.</td>
         </tr>
         <tr>
              <td></td>
              <td>LogCounter.java</td>
              <td>Log sequence number (LSN) implementation.</td>
         </tr>
         <tr>
              <td></td>
              <td>LogRecord.java</td>
              <td>The log record written out to disk.</td>
         </tr>
         <tr>
              <td></td>
              <td>ReadOnly.java</td>
              <td>An alternate read-only implementation of LogFactory.</td>
         </tr>
         </table>
      </section>
    </body>
    <footer> 
      <legal></legal>
    </footer>
  </document>
