I have run into this type of query problem as well.  I solved it in my
application with the following type of query:

SELECT tlid
FROM completechain AS o
WHERE not exists ( 
        SELECT 1
        FROM completechain
        WHERE tlid=o.tlid and ogc_fid!=o.ogc_fid
);
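
If the goal is to keep exactly one row per tlid (like the min(ogc_fid) in
your GROUP BY) rather than only the tlids that occur once, a variant of the
same anti-join idea should work.  This is an untested sketch that keeps the
row with the smallest ogc_fid for each tlid:

SELECT tlid, ogc_fid
FROM completechain AS o
WHERE not exists ( 
        SELECT 1
        FROM completechain
        WHERE tlid=o.tlid and ogc_fid<o.ogc_fid
);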

This assumes, of course, that you have an index on tlid.
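
For reference, the index might be created with something like this (the
index name here is just an example):

CREATE INDEX completechain_tlid_idx ON completechain (tlid);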

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Charlie Savage
> Sent: Tuesday, November 08, 2005 2:05 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Sort performance on large tables
> 
> Hi everyone,
> 
> I have a question about the performance of sort.
> 
> Setup: Dell Dimension 3000, Suse 10, 1GB RAM, PostgreSQL 8.1 
> RC1 with PostGIS, 1 built-in 80 GB IDE drive, 1 SATA Seagate 
> 400GB drive.  The IDE drive has the OS and the WAL files, and 
> the SATA drive has the database. 
> From hdparm, the max IO for the IDE drive is about 50MB/s and 
> the SATA drive is about 65MB/s.  Thus it is a very low-end 
> machine, but it is used just for development (i.e., it is not 
> a production machine) and the only thing it does is run a 
> PostgreSQL database.
> 
> I have a staging table called completechain that holds US 
> TIGER data (i.e., streets and addresses for the US).  The 
> table is approximately 18GB.  It's big because there is a lot 
> of data, but also because the table is not normalized (it 
> comes that way).
> 
> I want to extract data out of the table, with the most 
> important values being stored in a column called tlid.  The 
> tlid field is an integer, and the values are 98% unique.  
> There is a second column called ogc_fid, which is unique (it 
> is a serial field).  I need to extract the unique TLIDs 
> (it doesn't matter which duplicate I get rid of).  To do this 
> I am running this query:
> 
> SELECT tlid, min(ogc_fid)
> FROM completechain
> GROUP BY tlid;
> 
> The results from explain analyze are:
> 
> "GroupAggregate  (cost=10400373.80..11361807.88 rows=48071704 
> width=8) (actual time=7311682.715..8315746.835 rows=47599910 loops=1)"
> "  ->  Sort  (cost=10400373.80..10520553.06 rows=48071704 
> width=8) (actual time=7311682.682..7972304.777 rows=48199165 loops=1)"
> "        Sort Key: tlid"
> "        ->  Seq Scan on completechain  (cost=0.00..2228584.04 
> rows=48071704 width=8) (actual time=27.514..773245.046 
> rows=48199165 loops=1)"
> "Total runtime: 8486057.185 ms"
>       
> Doing a similar query produces the same results:
> 
> SELECT DISTINCT ON (tlid) tlid, ogc_fid FROM completechain;
> 
> Note it takes over 10 times longer to do the sort than the 
> full sequential scan.
> 
> Should I expect results like this?  I realize that the 
> computer is quite low-end and is very IO bound for this 
> query, but I'm still surprised that the sort operation takes so long.
> 
> Out of curiosity, I set up an Oracle database on the same 
> machine with the same data and ran the same query.  Oracle 
> was over an order of magnitude faster.  Its query plan shows 
> that it avoided the sort by using "HASH GROUP BY."  Does 
> such a construct exist in PostgreSQL (I see only hash joins)?
> 
> Also, as an experiment, I forced Oracle to do a sort by 
> running this query:
> 
> SELECT tlid, min(ogc_fid)
> FROM completechain
> GROUP BY tlid
> ORDER BY tlid;
> 
> Even with this, it was more than an order of magnitude faster than 
> PostgreSQL, which makes me think I have somehow misconfigured 
> PostgreSQL (see the relevant parts of postgresql.conf below).
> 
> Any idea/help appreciated.
> 
> Thanks,
> 
> Charlie
> 
> 
> -------------------------------
> 
> #---------------------------------------------------------------------------
> # RESOURCE USAGE (except WAL)
> #---------------------------------------------------------------------------
> 
> shared_buffers = 40000                  # 40000 buffers * 8192 bytes/buffer = 327,680,000 bytes
> #shared_buffers = 1000                  # min 16 or max_connections*2, 8KB each
> 
> temp_buffers = 5000
> #temp_buffers = 1000                    # min 100, 8KB each
> #max_prepared_transactions = 5          # can be 0 or more
> # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
> # per transaction slot, plus lock space (see max_locks_per_transaction).
> 
> work_mem = 16384                        # in KB
> #work_mem = 1024                        # min 64, size in KB
> 
> maintenance_work_mem = 262144           # in KB
> #maintenance_work_mem = 16384           # min 1024, size in KB
> #max_stack_depth = 2048                 # min 100, size in KB
> 
> # - Free Space Map -
> 
> max_fsm_pages = 60000
> #max_fsm_pages = 20000                  # min max_fsm_relations*16, 6 bytes each
> 
> #max_fsm_relations = 1000               # min 100, ~70 bytes each
> 
> # - Kernel Resource Usage -
> 
> #max_files_per_process = 1000         # min 25
> #preload_libraries = ''
> 
> # - Cost-Based Vacuum Delay -
> 
> #vacuum_cost_delay = 0                        # 0-1000 milliseconds
> #vacuum_cost_page_hit = 1             # 0-10000 credits
> #vacuum_cost_page_miss = 10           # 0-10000 credits
> #vacuum_cost_page_dirty = 20          # 0-10000 credits
> #vacuum_cost_limit = 200              # 0-10000 credits
> 
> # - Background writer -
> 
> #bgwriter_delay = 200                 # 10-10000 milliseconds between rounds
> #bgwriter_lru_percent = 1.0           # 0-100% of LRU buffers scanned/round
> #bgwriter_lru_maxpages = 5            # 0-1000 buffers max written/round
> #bgwriter_all_percent = 0.333         # 0-100% of all buffers scanned/round
> #bgwriter_all_maxpages = 5            # 0-1000 buffers max written/round
> 
> 
> #---------------------------------------------------------------------------
> # WRITE AHEAD LOG
> #---------------------------------------------------------------------------
> 
> # - Settings -
> 
> fsync = on                            # turns forced synchronization on or off
> #wal_sync_method = fsync              # the default is the first option
>                                       # supported by the operating system:
>                                       #   open_datasync
>                                       #   fdatasync
>                                       #   fsync
>                                       #   fsync_writethrough
>                                       #   open_sync
> #full_page_writes = on                # recover from partial page writes
> 
> wal_buffers = 128
> #wal_buffers = 8                      # min 4, 8KB each
> 
> #commit_delay = 0                     # range 0-100000, in microseconds
> #commit_siblings = 5                  # range 1-1000
> 
> # - Checkpoints -
> 
> checkpoint_segments = 256             # 256 * 16MB = 4,294,967,296 bytes
> checkpoint_timeout = 1200             # 1200 seconds (20 minutes)
> checkpoint_warning = 30               # in seconds, 0 is off
> 
> #checkpoint_segments = 3              # in logfile segments, min 1, 16MB each
> #checkpoint_timeout = 300             # range 30-3600, in seconds
> #checkpoint_warning = 30              # in seconds, 0 is off
> 
> # - Archiving -
> 
> #archive_command = ''                 # command to use to archive a logfile
>                                       # segment
> 
> 
> #---------------------------------------------------------------------------
> # QUERY TUNING
> #---------------------------------------------------------------------------
> 
> # - Planner Method Configuration -
> 
> #enable_bitmapscan = on
> #enable_hashagg = on
> #enable_hashjoin = on
> #enable_indexscan = on
> #enable_mergejoin = on
> #enable_nestloop = on
> #enable_seqscan = on
> #enable_sort = on
> #enable_tidscan = on
> 
> # - Planner Cost Constants -
> 
> effective_cache_size = 80000          # 80000 * 8192 = 655,360,000 bytes
> #effective_cache_size = 1000          # typically 8KB each
> 
> random_page_cost = 2.5                # units are one sequential page fetch cost
> #random_page_cost = 4                 # units are one sequential page fetch cost
> #cpu_tuple_cost = 0.01                        # (same)
> #cpu_index_tuple_cost = 0.001         # (same)
> #cpu_operator_cost = 0.0025           # (same)
> 
> # - Genetic Query Optimizer -
> 
> #geqo = on
> #geqo_threshold = 12
> #geqo_effort = 5                      # range 1-10
> #geqo_pool_size = 0                   # selects default based on effort
> #geqo_generations = 0                 # selects default based on effort
> #geqo_selection_bias = 2.0            # range 1.5-2.0
> 
> # - Other Planner Options -
> 
> default_statistics_target = 100               # range 1-1000
> #default_statistics_target = 10               # range 1-1000
> #constraint_exclusion = off
> #from_collapse_limit = 8
> #join_collapse_limit = 8              # 1 disables collapsing of explicit JOINs
> 
> 
> #---------------------------------------------------------------------------
> # RUNTIME STATISTICS
> #---------------------------------------------------------------------------
> 
> # - Statistics Monitoring -
> 
> #log_parser_stats = off
> #log_planner_stats = off
> #log_executor_stats = off
> #log_statement_stats = off
> 
> # - Query/Index Statistics Collector -
> 
> stats_start_collector = on
> stats_command_string = on
> stats_block_level = on
> stats_row_level = on
> 
> #stats_start_collector = on
> #stats_command_string = off
> #stats_block_level = off
> #stats_row_level = off
> #stats_reset_on_server_start = off
> 
> 
> #---------------------------------------------------------------------------
> # AUTOVACUUM PARAMETERS
> #---------------------------------------------------------------------------
> 
> autovacuum = true
> autovacuum_naptime = 600
> 
> #autovacuum = false                   # enable autovacuum subprocess?
> #autovacuum_naptime = 60              # time between autovacuum runs, in secs
> #autovacuum_vacuum_threshold = 1000   # min # of tuple updates before vacuum
> #autovacuum_analyze_threshold = 500   # min # of tuple updates before analyze
> #autovacuum_vacuum_scale_factor = 0.4 # fraction of rel size before vacuum
> #autovacuum_analyze_scale_factor = 0.2 # fraction of rel size before analyze
> #autovacuum_vacuum_cost_delay = -1    # default vacuum cost delay for autovac,
>                                       # -1 means use vacuum_cost_delay
> #autovacuum_vacuum_cost_limit = -1    # default vacuum cost limit for autovac,
>                                       # -1 means use vacuum_cost_limit
> 
> 
> ----------------------
> 
> CREATE TABLE tiger.completechain
> (
>    ogc_fid int4 NOT NULL DEFAULT nextval('completechain_ogc_fid_seq'::regclass),
>    module varchar(8) NOT NULL,
>    tlid int4 NOT NULL,
>    side1 int4,
>    source varchar(1) NOT NULL,
>    fedirp varchar(2),
>    fename varchar(30),
>    fetype varchar(4),
>    fedirs varchar(2),
>    cfcc varchar(3) NOT NULL,
>    fraddl varchar(11),
>    toaddl varchar(11),
>    fraddr varchar(11),
>    toaddr varchar(11),
>    friaddl varchar(1),
>    toiaddl varchar(1),
>    friaddr varchar(1),
>    toiaddr varchar(1),
>    zipl int4,
>    zipr int4,
>    aianhhfpl int4,
>    aianhhfpr int4,
>    aihhtlil varchar(1),
>    aihhtlir varchar(1),
>    census1 varchar(1),
>    census2 varchar(1),
>    statel int4,
>    stater int4,
>    countyl int4,
>    countyr int4,
>    cousubl int4,
>    cousubr int4,
>    submcdl int4,
>    submcdr int4,
>    placel int4,
>    placer int4,
>    tractl int4,
>    tractr int4,
>    blockl int4,
>    blockr int4,
>    wkb_geometry public.geometry NOT NULL,
>    CONSTRAINT enforce_dims_wkb_geometry CHECK 
> (ndims(wkb_geometry) = 2),
>    CONSTRAINT enforce_geotype_wkb_geometry CHECK
> (geometrytype(wkb_geometry) = 'LINESTRING'::text OR 
> wkb_geometry IS NULL),
>    CONSTRAINT enforce_srid_wkb_geometry CHECK 
> (srid(wkb_geometry) = 4269)
> )
> WITHOUT OIDS;
> ALTER TABLE tiger.completechain OWNER TO postgres;
> 