On Thu, Jun 11, 2015 at 1:23 PM, Peter Geoghegan <p...@heroku.com> wrote:
> I'm not sure if it's worth mentioning the "cheap equality for text"
> commit (e246b3d6eac09). I guess that it probably is, because it will
> help with things like index scans, too. Arguably that isn't a sorting
> thing (it's certainly not *just* a sorting thing).

I think we should really address this. Attached patch adds a new
release note item for it. It also adds to the documentation that
explains why users should prefer varchar(n)/text to character(n); the
lack of abbreviated key support now becomes a huge disadvantage for
character(n), whereas in previous versions the disadvantages were
fairly minor.

In passing, I updated the existing sort item to reflect that only
varchar(n), text, and numeric benefit from the abbreviation
optimization (not character types more generally + numeric), and added
a note on the effectiveness of the abbreviation optimization alone.

-- 
Peter Geoghegan
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 9d5ce95..bef9c6c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1106,7 +1106,7 @@ SELECT '52093.89'::money::numeric::float8;
 
    <tip>
     <para>
-     There is no performance difference among these three types,
+     There is generally no performance difference among these three types,
      apart from increased storage space when using the blank-padded
      type, and a few extra CPU cycles to check the length when storing into
      a length-constrained column.  While
@@ -1114,9 +1114,11 @@ SELECT '52093.89'::money::numeric::float8;
      advantages in some other database systems, there is no such advantage in
      <productname>PostgreSQL</productname>; in fact
      <type>character(<replaceable>n</>)</type> is usually the slowest of
-     the three because of its additional storage costs.  In most situations
-     <type>text</type> or <type>character varying</type> should be used
-     instead.
+     the three because of its additional storage costs, and
+     particularly because presently it does not take advantage of an
+     important optimization used when sorting.  In most situations
+     <type>text</type> or <type>character varying</type> should be
+     used instead.
     </para>
    </tip>
 
diff --git a/doc/src/sgml/release-9.5.sgml b/doc/src/sgml/release-9.5.sgml
index 279fb11..9c97b9b 100644
--- a/doc/src/sgml/release-9.5.sgml
+++ b/doc/src/sgml/release-9.5.sgml
@@ -214,8 +214,32 @@
 
       <listitem>
        <para>
-        Improve the speed of sorting character and numeric fields (Peter
-        Geoghegan, Andrew Gierth, Robert Haas)
+        Improve the speed of sorting
+        <type>varchar(<replaceable>n</>)</type>, <type>text</type>,
+        and <type>numeric</type> fields (Peter Geoghegan, Andrew
+        Gierth, Robert Haas)
+       </para>
+
+       <para>
+        When there are multiple fields, the optimization is most
+        effective when most comparisons are resolved by comparing only
+        the first field.
+       </para>
+      </listitem>
+
+      <listitem>
+       <para>
+        Perform inexpensive binary equality pre-check for
+        <type>varchar(<replaceable>n</>)</type> and <type>text</type>
+        comparisons (Peter Geoghegan)
+       </para>
+
+       <para>
+        This improves the speed of operations where many
+        comparisons can be resolved with the pre-check, which may
+        include sort operations, merge joins, and index scans.
+        Multi-field sort operations with leading fields that have
+        relatively few distinct values will particularly benefit.
        </para>
       </listitem>
 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to