Re: [GENERAL] Does PostgreSQL ever create indexes on its own?

2015-11-13 Thread Jeremy Harris
On 13/11/15 10:49, Thomas Kellerer wrote:
>> These indexes were *not* created by PostgreSQL.
>> We are not Oracle.
> 
> Well, Oracle does not create indexes on its own either - it has the same 
> strategy as Postgres:
> Indexes are only created automatically for primary keys and unique 
> constraints.
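
For example (a minimal sketch), both of the constraints below implicitly
create a backing unique index:

  CREATE TABLE t (id integer PRIMARY KEY, tag text UNIQUE);
  -- \d t then lists "t_pkey" and "t_tag_key" under Indexes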

Given that indices are an implementation wart on the side of the
relational model, it'd be nice if RDBMSs did create them for one.
-- 
Cheers,
  Jeremy






Re: [GENERAL] Merge join vs merge semi join against primary key

2015-10-10 Thread Jeremy Harris
On 09/10/15 20:52, Sean Rhea wrote:
[...]
> ->  Index Scan using customers_pkey on customers
>       (cost=0.00..64192.97 rows=184 width=8)
>       (actual time=103.354..126.459 rows=359 loops=1)
>     Filter: (group_id = 45)
>     Rows Removed by Filter: 141684
> Total runtime: 146.659 ms
[...]
> ->  Index Scan using customers_pkey on customers o
>       (cost=0.00..63836.61 rows=836 width=8)
>       (actual time=159.840..201915.765 rows=7672 loops=1)
>     Filter: (group_id = 45)
>     Rows Removed by Filter: 212699113
> Total runtime: 201995.044 ms

Are you sure the customers table was the same?  The filter removed
141,684 rows in the first plan but 212,699,113 in the second - three
orders of magnitude more.
-- 
Jeremy






Re: [GENERAL] Some indexing advice for a Postgres newbie, please?

2015-02-25 Thread Jeremy Harris
On 25/02/15 15:42, Merlin Moncure wrote:
 On Sat, Feb 21, 2015 at 5:33 AM, Arjen Nienhuis a.g.nienh...@gmail.com 
 wrote:
 Hi,

 On 19 Feb 2015 17:12, brian br...@meadows.pair.com wrote:


 Hi folks,

 I have a single-user application which is growing beyond the
 fixed-format data files in which it currently holds its data, I need a
 proper database as the backend. The front end is written using Lazarus
 and FreePascal under Linux, should anyone feel that makes a
 difference. The database will need to grow to around 250,000 records.

 My problem is with the data field which is the (unique) key. It's
 really a single 192-bit integer (it holds various bits of bitmapped
 data) which I currently hold as six 32-bit integers, but can convert
 if needed when transferring the data.

 How would you advise that I hold this field in a Postgres database,
 given the requirement for the whole thing to be a unique key? The
 first 64 bits change relatively infrequently, the last 128 bits will
 change with virtually every record. The last 128 bits will ALMOST be
 unique in themselves, but not quite. :(

 Thanks,

 Brian.


 Postgres can use almost anything as a key so it probably depends on the
 library you use to access the database.

 If it supports composite primary keys you can use the 6 ints as a key:

 PRIMARY KEY (n1, n2, n3, n4, n5, n6)

 The numeric type can hold 192-bit numbers. I think Lazarus supports this as
 well.

 You could also use a surrogate key and define a UNIQUE constraint on the 6
 ints or the 192-bit number.
 
 You could also use three 64-bit bigints if that's easier.  The other way
 to do it is bytea.
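
For concreteness, a minimal sketch of the options mentioned (table and
column names are illustrative):

  CREATE TABLE rec_a (k1 int, k2 int, k3 int, k4 int, k5 int, k6 int,
                      PRIMARY KEY (k1, k2, k3, k4, k5, k6)); -- six ints
  CREATE TABLE rec_b (k numeric PRIMARY KEY);  -- numeric holds 192 bits
  CREATE TABLE rec_c (k bytea PRIMARY KEY);    -- 24 raw bytes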

The other way to do it is to have semantically-meaningful columns
rather than glomming them into this 192-bit integer, and a composite
key on the lot - if the set truly is unique.
-- 
Jeremy






Re: [GENERAL] problem updating from form

2011-01-03 Thread Jeremy Harris

On 2011-01-03 06:29, Karen Springer wrote:

We are running RHEL 4.1 which is why the newer version did not install with 
RHEL.


RHEL 4.1 should be offering pgsql 8.1.15 in the apps channel
(Red Hat Application Stack v1).

- Jeremy



Re: [GENERAL] problem updating from form

2011-01-02 Thread Jeremy Harris

On 2011-01-02 08:31, Karen Springer wrote:

We are using PostgreSQL 8.1.4 on Red Hat, Microsoft Access 2002 and
psqlodbc_09_00_0200.


You don't say which Red Hat.
RHN currently offers 8.1.22 for RHEL 5.  Are you not running regular updates?

- Jeremy




Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Jeremy Harris

On 2010-12-21 10:42, Massa, Harald Armin wrote:

b) creating an index requires reading the data-to-be-indexed. So, to have an
index pointing at the interesting rows for your query, the table has to be
read ... which would be the perfect time to already select the interesting
rows. And after having the interesting rows, the index is worthless


... until another similar query comes along, when suddenly it's a massive win.
Why not auto-create indices for some limited period after database load
(copy?  any large number of inserts from a single connection?), track those
that actually get re-used and remove the rest?   Would this not provide
a better out-of-the-box experience for neophytes?

[...]

Why is the query planner not allowed to create indexes, but only allowed to
use or not use what's available?



as in b): Creating an index is quite expensive


How much more so than doing that full-table-scan plus sort, which your
query is doing anyway?

Cheers,
Jeremy





Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Jeremy Harris

On 2010-12-21 14:26, t...@fuzzy.cz wrote:

Why not auto-create indices for some limited period after database load
(copy?  any large number of inserts from a single connection?), track
those
that actually get re-used and remove the rest?   Would this not provide
a better out-of-the-box experience for neophytes?


Say you have a table with several columns (A,B,C), and the query is using
some of them. What indexes would you create? One index on every column? A
multi-column index on all columns? Indexes for each combination of columns?


One single-column index, on the first index-worthy column appearing.
Keep it simple.  Maybe, while you're doing that full-table scan, gather
stats on all the indexable columns for later reference, to guide the
choice of which column to index later.


There really is no automatic way to solve this puzzle using a single
query. Indexing strategy is a very tough design discipline, and it
requires a complex knowledge of the workload. One slow query does not mean
the index should be created - what if that was just an ad-hoc query and
will not be executed ever again?


Then the index you just built gets automatically dropped, as I said above.


I really don't want to work with products that try to be smarter than me
(and I admit I'm pretty dumb from time to time) and act rather randomly
due to this 'artificial intelligence'. I've already been there and I don't
want to repeat this experience.


Then, since you're not a neophyte, leave the feature turned off.  But don't
deny actual neophytes the possibility of using it.



as in b): Creating an index is quite expensive


How much more so than doing that full-table-scan plus sort, which your
query is doing anyway?


A lot. It consumes a lot of CPU, it may consume a lot of memory (up to
maintenance_work_mem) etc. So imagine a few users building indices on a
big table simultaneously...


Pffthht.   One simple trylock, used only by the auto-indexer.



Building an index is just one side of the problem - maintenance of the
indexes is another thing. Each index has an impact on write operations
(INSERT/UPDATE) and may cause that HOT actually does not work (thus
causing unnecessary bloat of the index).


This is worth discussing.  How much help does the DBMS currently give
the DBA in evaluating these tradeoffs?  Could we do better, given an
actual installation and workload?


And the indexes may need a lot of space on disk.


By all means require limits, as well as a "don't do that" switch.



But the real show stopper is probably locking. Building an index takes a
write lock on a table, effectively blocking writes. Sure, you can use a
'CREATE INDEX ... CONCURRENTLY' introduced in 8.2, but there are some
disadvantages of that (see
http://www.postgresql.org/docs/9.0/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY).


The only disadvantages I see there are a) the leftover invalid index -
which feels like a bug; why is it not auto-dropped? - and b) the second
scan, which implies more total work and a wish to background that portion
after completing the query triggering the auto-index.
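
(For reference, a minimal sketch of the cleanup that currently falls to
the DBA; the index and table names are illustrative:)

  CREATE INDEX CONCURRENTLY t_a_idx ON t (a);
  -- if the concurrent build fails, it leaves an INVALID index
  -- behind, which must be removed by hand:
  DROP INDEX t_a_idx;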

Don't forget I suggested doing this only for a limited time after DB
creation.  Possibly another reason to turn it off should be any manual
index creation, as that implies that the DBA knows about indexing.

I see in another thread you suggest merely placing hints in the log.
That's a fine first step - but I'll then be wanting to auto-parse that
log to auto-create the indexes anyway.

Cheers,
Jeremy




Re: [GENERAL] Can the query planner create indexes?

2010-12-21 Thread Jeremy Harris

On 2010-12-21 18:50, Tomas Vondra wrote:

Then the index you just built gets automatically dropped, as I said above.


I'm a bit confused. Should the indexes be dropped automatically (as you
state here) or kept for the future. Because if they should be dropped,
then it does not make sense to do this magic just for a limited time
after the DB goes live.


Here's what I said:
track those that actually get re-used and remove the rest.

Which part is confusing?

- Jeremy



Re: [GENERAL] join two tables without a key

2010-04-03 Thread Jeremy Harris

On 04/03/2010 11:16 AM, Dino Vliet wrote:

If I have two tables with the same number of rows but different columns and I 
want to create one table out of them what would be the way to do that in 
postgresql?



Table A has N number of rows and columns X,Y,Z and Table B has N number of rows 
and P,Q,R as columns. None of the tables have a column which can be used as a 
key.

The resulting table should have N number of rows and X,Y,Z,P,Q,R as columns.


You haven't said what you want the result to mean.
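
(If the intent is "pair the Nth row of A with the Nth row of B", a
sketch using window functions follows - though note that without a key
or an ORDER BY the pairing is arbitrary, which is rather the point.
Table names are illustrative:)

  SELECT a.x, a.y, a.z, b.p, b.q, b.r
  FROM (SELECT *, row_number() OVER () AS rn FROM table_a) a
  JOIN (SELECT *, row_number() OVER () AS rn FROM table_b) b USING (rn);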

- Jeremy



Re: [GENERAL] Rapid Seek Devices (feature request)

2009-08-17 Thread Jeremy Harris

On 08/17/2009 03:24 AM, Craig Ringer wrote:

On 16/08/2009 9:06 PM, NTPT wrote:

So I suggest we should have random_page_cost and
seq_page_cost configurable on a per-tablespace basis.


That strikes me as a REALLY good idea, personally, though I don't know
enough about the planner to factor in implementation practicalities and
any cost for people _not_ using the feature.


Could not pgsql *measure* these costs (on a sampling basis, and with long
time-constants)?
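
(A sketch of what such a knob might look like - this is the syntax
later PostgreSQL releases did adopt for per-tablespace costs; the
tablespace name is illustrative:)

  ALTER TABLESPACE fast_disks
    SET (random_page_cost = 1.5, seq_page_cost = 1.0);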

- Jeremy



Re: [GENERAL] unexpected check constraint violation

2009-03-23 Thread Jeremy Harris

Jacek Becla wrote:

create table t(d real, check(d=0.00603));
insert into t values (0.00603);

ERROR:  new row for relation t violates check constraint t_d_check


Because equality is not well-defined for real values?
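
A minimal illustration (the stored float4 value, promoted for the
comparison, no longer equals the more precise literal):

  select 0.00603::real = 0.00603::float8;   -- false
  -- declaring the column as numeric keeps the comparison exact:
  create table t2(d numeric, check(d=0.00603));
  insert into t2 values (0.00603);          -- succeeds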

- Jeremy



Re: [GENERAL] I don't want to back up index files

2009-03-13 Thread Jeremy Harris

Simon Riggs wrote:

The idea of auto rebuilding indexes following recovery has already been
proposed, so is under consideration. It hasn't been proposed in relation
to the use case you mention, so that is new.

If we did as you suggest then it would speed up the base backup but
would also add index rebuild time onto the end of any recovery.


Forgive me for being stupid, but isn't a time when all the data for
a table is being streamed in, during restore, the *perfect* time
to build an index?   Why wait until after the restore?

Cheers,
   Jeremy



Re: [GENERAL] Pet Peeves?

2009-02-03 Thread Jeremy Harris

Gregory Stark wrote:

So, what do people say? Is Postgres perfect in your world or does it do some
things which rub you the wrong way?


As a further take on the auto-tuning others have mentioned,
how about some auto-indexing?

- Jeremy



Re: [GENERAL] Timestamp shift when importing data

2009-01-03 Thread Jeremy Harris

Jolles, Peter M (GE Infra, Energy) wrote:

I am trying to migrate several years of historical data with timestamps
from an MS Access database to Postgres. I am running into an issue where
specific dates/times get pushed one hour ahead, which creates duplicate
date/time stamps or fails the import if I have that defined as my
primary key. The time that gets shifted is always 2:00 AM to 2:55 AM
(data is in 5 minute blocks). What I don't understand is that it only
seems to happen on the following dates (m/d/yy format):

4/7/02
4/6/03
4/4/04
4/3/05
4/2/06


Daylight saving time shift?  Those are the first Sundays in April - the
US spring-forward dates - and 2:00-2:55 AM is exactly the skipped hour.
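
(A sketch of the effect; the timezone name is illustrative:)

  SET timezone = 'US/Eastern';
  SELECT '2006-04-02 02:30'::timestamptz;
  -- gives 2006-04-02 03:30:00-04: the nonexistent local time is
  -- pushed an hour ahead.  A timestamp WITHOUT TIME ZONE column
  -- would store 02:30 unchanged.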
- Jeremy



Re: [GENERAL] when to reindex?

2008-06-07 Thread Jeremy Harris

Gregory Stark wrote:

 REINDEX scans the table
precisely once and sorts it.


For the bloat, as opposed to corruption, case -
what information is needed from the table that
is not in the old index?  Why would a sequential
read of the old index alone (then some processing)
not suffice?

Thanks,
  Jeremy Harris



Re: [GENERAL] inheritance. more.

2008-05-01 Thread Jeremy Harris

Gurjeet Singh wrote:

 One of the advantages
of breaking up your data into partitions, as professed by Simon (I think)
(and I agree), is that you have smaller indexes, which improve performance.
And maybe having one huge index managing the uniqueness across partitioned
data just defeats the idea of data partitioning!


Isn't "large indexes are a performance problem" just saying
we don't implement indexes very well?  And why are they
a problem - surely a tree-structured index is giving you
range-partitioned subsets as you traverse it?  Why is this
different from manual partitioning into (inherited) tables?

Thanks,
Jeremy



Re: [GENERAL] inheritance. more.

2008-05-01 Thread Jeremy Harris

Nathan Boley wrote:

Because people can be smarter about the data partitioning.

Consider a table of users. Some are active, most are not. The active
users account for nearly all of the users table access, but I still
(occasionally) want to access info about the inactive users.
Partitioning users into active_users and inactive_users allows me to
tell the database (indirectly) that the active users index should stay
in memory, while the inactive users can be relegated to disk.
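
(A sketch of that layout, in the inheritance style of the day; names
are illustrative:)

  CREATE TABLE users (id int, name text, active bool NOT NULL);
  CREATE TABLE active_users   (CHECK (active))     INHERITS (users);
  CREATE TABLE inactive_users (CHECK (NOT active)) INHERITS (users);
  -- each child carries its own, smaller index:
  CREATE INDEX active_users_id_idx ON active_users (id);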

-Nathan

On Thu, May 1, 2008 at 6:02 AM, Jeremy Harris [EMAIL PROTECTED] wrote:

Gurjeet Singh wrote:


 One of the advantages
 of breaking up your data into partitions, as professed by Simon (I think)
 (and I agree), is that you have smaller indexes, which improve
 performance.
 And maybe having one huge index managing the uniqueness across partitioned
 data just defeats the idea of data partitioning!


 Isn't "large indexes are a performance problem" just saying
 we don't implement indexes very well?  And why are they
 a problem - surely a tree-structured index is giving you
 range-partitioned subsets as you traverse it?  Why is this
 different from manual partitioning into (inherited) tables?


Agreed, data placement is one reason for partitioning.  But won't
this happen automatically?  Won't, in your example, the active
part of a one-large-index stay in memory while the inactive parts
get pushed out?

Cheers,
  Jeremy



Re: [GENERAL] enabling autovacuum

2008-01-30 Thread Jeremy Harris

Chander Ganesan wrote:

Jeremy Harris wrote:

Version:
 PostgreSQL 8.2.4 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 
4.1.2 20070418 (Red Hat 4.1.2-10)


We have one problematic table, which has a steady stream of entries
and a weekly mass-delete of ancient history.  The bloat query from
Greg Sabino Mullane (thanks to Greg Smith for pointing it out) returns:
Inserts don't generate dead tuples, and AVD looks at obsolete tuples.
As such, I wouldn't expect AVD to kick off until after you did a mass
delete... assuming that delete was sizable enough to trigger a vacuum.


Ah, that would explain it - thank you.  So I need to retreat to
the question of why the weekly vacuum permits the observed bloat.
Any ideas?  More information that I could gather?

Thanks,
  Jeremy



[GENERAL] enabling autovacuum

2008-01-28 Thread Jeremy Harris

Hi,

We're starting to run autovacuum for the first time on a system
that's been running with nightly cron-driven vacuum for some time.

Version:
 PostgreSQL 8.2.4 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 
20070418 (Red Hat 4.1.2-10)

We have one problematic table, which has a steady stream of entries
and a weekly mass-delete of ancient history.  The bloat query from
Greg Sabino Mullane (thanks to Greg Smith for pointing it out) returns:

schemaname | tablename  | reltuples | relpages | otta | tbloat | wastedpages | wastedbytes | wastedsize
public     | rcpt_audit |   1300300 |   152149 | 6365 |   23.9 |      145784 |  1194262528 | 1139 MB

iname                       | ituples | ipages | iotta | ibloat | wastedipages | wastedibytes | wastedisize
rcpt_audit_msg_audit_id_idx | 1300300 |   6798 |  3819 |    1.8 |         2979 |     24403968 | 23 MB
rcpt_audit_id_idx           | 1300300 |   4727 |  3819 |    1.2 |          908 |      7438336 | 7264 kB


The table description is:
    Column    |  Type   | Modifiers
--------------+---------+------------------------------------------------------------------
 id           | integer | not null default nextval(('rcpt_audit_id_seq'::text)::regclass)
 msg_audit_id | integer |
 mailuser_id  | integer |
 username     | text    |
 domain       | text    |
 copies       | integer |
 end_msg_size | integer |
 disp_type    | integer |
 disp_id      | integer |
 disp_action  | text    |
 disposition  | text    |
 hdrs         | text    |


We have uncommented autovacuum = on in postgresql.conf and run
service postgresql reload.  pg_stat_all_tables shows 4 tables
as autoanalyzed at about that time; 3 of which were also
autovacuumed.  The problem table is not included; no other autos
are logged there in the succeeding 24 hours.
Is other action needed to enable autovacuum?



The autovacuum tuning parameters are all at default settings.
We have
max_fsm_pages = 200
max_fsm_relations = 10
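
(For reference, a sketch of the relevant settings; note that on 8.2
autovacuum also requires the row-level stats collector:)

  # postgresql.conf
  stats_start_collector = on            # needed by autovacuum on 8.2
  stats_row_level = on                  # needed by autovacuum on 8.2
  autovacuum = on
  autovacuum_vacuum_scale_factor = 0.2  # default: vacuum at 20% dead rows
  autovacuum_vacuum_threshold = 500     # default: plus this many rows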

Are there any other changes we should make to stop this table
getting so bloated?


Thanks,
  Jeremy



Re: [GENERAL] enabling autovacuum

2008-01-28 Thread Jeremy Harris

Christopher Browne wrote:

Is it possible that this table didn't see many updates, today?


Nope; about 24000 (according to the id sequence).
- Jeremy



[GENERAL] Linux v.s. Mac OS-X Performance (now Solaris)

2007-11-29 Thread Jeremy Harris

 Only under Solaris.  With Linux or BSD on it it ran pretty well.  I
 had a Sparc 20 running RH 7.2 back in the day (or whatever the last
 version of RH that ran on sparc was) that spanked an Ultra-2 running
 slowalrus with twice the memory and hard drives handily.
 
 Solaris has gotten much better since then, I'm sure.


Ubuntu is supposed to be able to spin on a T1000/T2000, and they have
come out with a magical beast called Solaris 10; in Sun's infinite
wisdom they have decided to abandon the /etc/init.d/ and friends way of
startup for some complex XML way of doing things. But otherwise it's
quite good (ZFS and Cool Thread servers being among the other good
things out of Sun's shop).


Anybody here running postgresql on a T1000?  What OS, and how is it?

Cheers,
   Jeremy




Re: [GENERAL] why postgresql over other RDBMS

2007-06-02 Thread Jeremy Harris

On 06/01/07 11:22, Bruce Momjian wrote:

 PFC wrote:

 On Thu, 31 May 2007 22:20:09 +0200, Vivek Khera [EMAIL PROTECTED] wrote:


 On May 25, 2007, at 5:28 PM, Tom Lane wrote:


 That's true at the level of DDL operations, but AFAIK we could
 parallelize table-loading and index-creation steps pretty effectively
 --- and that's where all the time goes.
 I would be happy with parallel builds of the indexes of a given table.   
 That way you have just one scan of the whole table to build all its  
 indexes.
 	Will the synchronized seq scan patch be able to do this by issuing all  
 the CREATE INDEX commands at the same time from several different database  
 connections ?
 
 No, but it could someday.


Would it be possible to track stats sufficient for a cost/benefit based
automatic recreate of all indices on a table whenever a full-table-scan
occurred, whether due to a commanded index rebuild or not?

Cheers,
  Jeremy Harris



Re: [GENERAL] Postgres friendly RegEx?

2006-12-12 Thread Jeremy Harris

You don't give a pg version.

It looks legal to me as of 8.1.

Try replacing all the {0,1} with ? - but
check the manual for regex_flavor too.
Is there any chance you're in basic mode?
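
For instance (a sketch; the pattern is illustrative, and both forms
return true under the default advanced flavor):

  select 'abc' ~ 'ab{0,1}c';
  select 'abc' ~ 'ab?c';     -- ? is the usual spelling of {0,1}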

- Jeremy
