Author: cutting
Date: Fri Sep  4 16:39:51 2009
New Revision: 811481

URL: http://svn.apache.org/viewvc?rev=811481&view=rev
Log:
AVRO-111.  Document sort ordering in specification.

Modified:
    hadoop/avro/trunk/CHANGES.txt
    hadoop/avro/trunk/src/doc/content/xdocs/spec.xml

Modified: hadoop/avro/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/hadoop/avro/trunk/CHANGES.txt?rev=811481&r1=811480&r2=811481&view=diff
==============================================================================
--- hadoop/avro/trunk/CHANGES.txt (original)
+++ hadoop/avro/trunk/CHANGES.txt Fri Sep  4 16:39:51 2009
@@ -47,6 +47,8 @@
     possible values are "increasing" (the default), "decreasing", and
     "ignore".  (cutting)
 
+    AVRO-111.  Document sort ordering in the specification. (cutting)
+
   IMPROVEMENTS
 
     AVRO-71.  C++: make deserializer more generic.  (Scott Banachowski

Modified: hadoop/avro/trunk/src/doc/content/xdocs/spec.xml
URL: 
http://svn.apache.org/viewvc/hadoop/avro/trunk/src/doc/content/xdocs/spec.xml?rev=811481&r1=811480&r2=811481&view=diff
==============================================================================
--- hadoop/avro/trunk/src/doc/content/xdocs/spec.xml (original)
+++ hadoop/avro/trunk/src/doc/content/xdocs/spec.xml Fri Sep  4 16:39:51 2009
@@ -116,6 +116,12 @@
                    <tr><td>fixed</td><td>string</td><td>"\u00ff"</td></tr>
                  </table>
                </li>
+               <li><code>order:</code> specifies how this field
+                 impacts sort ordering of this record (optional).
+                 Valid values are "ascending" (the default),
+                 "descending", or "ignore".  For more details on how
+                 this is used, see the <a href="#order">sort
+                 order</a> section below.</li>
              </ul>
            </li>
          </ul>
@@ -474,6 +480,65 @@
 
     </section>
 
+    <section id="order">
+      <title>Sort Order</title>
+
+      <p>Avro defines a standard sort order for data.  This permits
+       data written by one system to be efficiently sorted by another
+       system.  This can be an important optimization, as sort order
+       comparisons are sometimes the most frequent per-object
+       operation.  Note also that Avro binary-encoded data can be
+       efficiently ordered without deserializing it to objects.</p>
+
+      <p>Data items may only be compared if they have identical
+       schemas.  Pairwise comparisons are implemented recursively
+       with a depth-first, left-to-right traversal of the schema.
+       The first mismatch encountered determines the order of the
+       items.</p>
+
+      <p>Two items with the same schema are compared according to the
+       following rules.</p>
+      <ul>
+       <li><code>int</code>, <code>long</code>, <code>float</code>
+         and <code>double</code> data is ordered by ascending numeric
+         value.</li>
+       <li><code>boolean</code> data is ordered with false before true.</li>
+       <li><code>null</code> data is always equal.</li>
+       <li><code>string</code> data is compared lexicographically
+         by Unicode code point.  Since UTF-8 is the binary encoding
+         of strings, sorting by bytes or by code point is equivalent.</li>
+       <li><code>bytes</code> and <code>fixed</code> data is
+         compared lexicographically by byte.</li>
+       <li><code>array</code> data is compared lexicographically by
+         element.</li>
+       <li><code>enum</code> data is ordered by the symbol's position
+         in the enum schema.  For example, an enum whose symbols are
+         <code>["z", "a"]</code> would sort <code>"z"</code> values
+         before <code>"a"</code> values.</li>
+       <li><code>union</code> data is first ordered by the branch
+         within the union, and, within that, by the type of the
+         branch.  For example, an <code>["int", "string"]</code>
+         union would order all int values before all string values,
+         with the ints and strings themselves ordered as defined
+         above.</li>
+       <li><code>record</code> data is ordered lexicographically by
+         field.  If a field specifies that its order is:
+         <ul>
+           <li><code>"ascending"</code>, then the order of its values
+             is unaltered.</li>
+           <li><code>"descending"</code>, then the order of its values
+             is reversed.</li>
+           <li><code>"ignore"</code>, then its values are ignored
+             when sorting.</li>
+         </ul>
+       </li>
+       <li><code>map</code> data may not be compared.  It is an error
+         to attempt to compare data containing maps unless those maps
+         are in an <code>"order":"ignore"</code> record field.
+       </li>
+      </ul>
+    </section>
+
     <section>
       <title>Object Container Files</title>
       <p>Avro includes a simple object container file format.  A file
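
For readers of this commit, the comparison rules added to the "Sort Order"
section can be sketched as a small Python function. This is only an
illustration under stated assumptions, not Avro's implementation: the names
compare and _cmp are hypothetical, schemas are plain parsed-JSON dicts, lists
and strings, union values are represented as (branch_index, value) pairs, a
shorter array that is a prefix of a longer one is assumed to sort first, and
references to named types are not resolved.

```python
from functools import cmp_to_key


def _cmp(a, b):
    """Three-way comparison returning -1, 0, or 1."""
    return (a > b) - (a < b)


def compare(schema, a, b):
    """Compare two values that share `schema`, following the rules in the diff.

    Hypothetical helper: union values are (branch_index, value) pairs.
    """
    if isinstance(schema, list):  # union: branch position first, then value
        (ia, va), (ib, vb) = a, b
        if ia != ib:
            return _cmp(ia, ib)
        return compare(schema[ia], va, vb)
    if isinstance(schema, str):  # primitive type given by name
        schema = {"type": schema}
    kind = schema["type"]
    if kind == "null":
        return 0  # null data is always equal
    if kind in ("int", "long", "float", "double", "boolean", "bytes", "fixed"):
        # Numeric order, False before True, bytes lexicographically by byte.
        return _cmp(a, b)
    if kind == "string":
        # Comparing the UTF-8 bytes gives the same order as code points.
        return _cmp(a.encode("utf-8"), b.encode("utf-8"))
    if kind == "enum":
        symbols = schema["symbols"]  # order by position in the schema
        return _cmp(symbols.index(a), symbols.index(b))
    if kind == "array":
        for x, y in zip(a, b):
            c = compare(schema["items"], x, y)
            if c != 0:
                return c
        return _cmp(len(a), len(b))  # assumption: a prefix sorts first
    if kind == "record":
        for field in schema["fields"]:
            order = field.get("order", "ascending")
            if order == "ignore":
                continue  # ignored fields never affect the result
            c = compare(field["type"], a[field["name"]], b[field["name"]])
            if c != 0:
                return -c if order == "descending" else c
        return 0
    if kind == "map":
        raise TypeError("map data may not be compared")
    raise ValueError("unsupported schema type: %r" % (kind,))


# Usage: a record with an enum field, a descending field and an ignored map.
USER = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "level",
         "type": {"type": "enum", "name": "Level", "symbols": ["z", "a"]}},
        {"name": "score", "type": "int", "order": "descending"},
        {"name": "tags", "type": {"type": "map", "values": "string"},
         "order": "ignore"},
    ],
}
rows = [
    {"level": "a", "score": 1, "tags": {}},
    {"level": "z", "score": 1, "tags": {}},
    {"level": "z", "score": 5, "tags": {"k": "v"}},
]
ordered = sorted(rows, key=cmp_to_key(lambda x, y: compare(USER, x, y)))
```

The first field that differs decides the result, which matches the
depth-first, left-to-right traversal described in the new section. The map
field is never reached because its order is "ignore"; comparing a map
directly raises an error in this sketch.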

