Re: [GENERAL] Using Variables in Queries
On 19 October 2017 at 17:25, Scott Mead wrote:
> On Thu, Oct 19, 2017 at 11:21 AM, Igal @ Lucee.org wrote:
>>
>> Hello,
>>
>> In other database servers, which I'm finally dropping in favor of Postgres, I can do the following (mind you that this is for illustration only, I do not actually write queries like that):
>>
>> DECLARE @query varchar(64) = 'red widget';
>>
>> SELECT *
>> FROM products
>> WHERE col1 LIKE @query
>>    OR col2 LIKE @query
>>    OR col3 LIKE @query
>>    OR col4 LIKE @query
>>    OR col5 LIKE @query
>>
>> The point is, though, that I can change the @query variable in one place, which is very convenient.
>>
>> Is it still true (the posts I see on this subject are quite old) that I cannot do so in Postgres outside of a stored procedure/function?

You should be able to do that using the DO statement: https://www.postgresql.org/docs/9.6/static/sql-do.html

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
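A sketch of what the DO-block approach might look like, assuming a hypothetical products table like the one quoted above. Note that a DO block takes no parameters and returns no rows, so to get results out you typically insert into a (temporary) table or use RAISE NOTICE:

```sql
DO $$
DECLARE
    query text := '%red widget%';
    hits  bigint;
BEGIN
    SELECT count(*) INTO hits
    FROM products
    WHERE col1 LIKE query
       OR col2 LIKE query
       OR col3 LIKE query
       OR col4 LIKE query
       OR col5 LIKE query;
    RAISE NOTICE 'matches: %', hits;
END
$$;
```

If the queries are run through psql, its client-side variables are another option: \set q '%red widget%' once, then refer to it as :'q' (quoted-literal interpolation) in every LIKE condition.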
Re: [GENERAL] EAV Designs for Multi-Tenant Applications
> ... hear success (and failure) stories from the community on how they've dealt with these.

I'm not that familiar with JSONB, but wouldn't it be possible to create an index over an organisation-specific JSON 'blob' such that all fields in it are part of the index? I expect that index types aimed at text searches (word searches) would be useful in that respect.

Regards,

Alban Hertroys

-- 
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
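Such an index over all fields of a JSON 'blob' does exist in PostgreSQL: a GIN index on a jsonb column covers every key and value in it. A minimal sketch, assuming a hypothetical tenant_data table:

```sql
CREATE TABLE tenant_data (
    org_id integer NOT NULL,
    attrs  jsonb   NOT NULL
);

-- The default jsonb GIN opclass supports the ?, ?|, ?& and @> operators:
CREATE INDEX tenant_data_attrs_idx
    ON tenant_data USING gin (attrs);

-- The smaller jsonb_path_ops variant supports only @> (containment),
-- but is faster and more compact for that operator:
CREATE INDEX tenant_data_attrs_path_idx
    ON tenant_data USING gin (attrs jsonb_path_ops);

-- Example containment query that can use either index:
SELECT * FROM tenant_data WHERE attrs @> '{"colour": "red"}';
```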
Re: [GENERAL] a JOIN to a VIEW seems slow
> On 2 Oct 2017, at 8:32, Frank Millman wrote:
>
> From: Frank Millman
> Sent: Friday, September 22, 2017 7:34 AM
> To: pgsql-general@postgresql.org
> Subject: Re: a JOIN to a VIEW seems slow
>
> On Fri, Sep 22, 2017 at 7:34 AM, Frank Millman wrote:
> >
> > On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
> > > Something is not adding up here. Can you EXPLAIN ANALYZE the 26 minute query?
> > >
> > Here it is -
> >
> > https://explain.depesz.com/s/cwm
> >
> Just checking – is this under investigation, or is this thread considered closed?
>
> Frank

There are a few problems keeping track of this issue.

First of all, the above plan does not include the query (I don't know whether that's a thing with depesz's plan analyzer, but ISTR seeing plans _with_ their queries in other cases). That means we have to track back through the thread (quite a bit) to find a query that _may_ be the one that the plan is for. Add to that the fact that most of us are busy people, and we would have to invest too much time into your problem to be able to help - hence we don't.

The second problem is that the query plan highlights a couple of slow sequential scans on ar_tran_inv and ar_tran_rec, but those tables are not in your query from Sep 21st. That makes it impossible for anyone to pinpoint the problem for you. They're probably in your views somewhere, but we can't see where.

Looking at that query though, it seems to me that it could help to aggregate the results on cust_row_id in the inner query (aliased as q) to reduce the result set that the outer query needs to handle. It's possible that the query planner is smart enough to detect this - I can't tell from the plan - but I wouldn't bet on it. The query plan for that inner query should be interesting as well, especially if moving the aggregation inside does not help.

Another possible optimisation would be to add a lower bound for tran_date, if such is possible for your case.
Currently you only have an upper bound: tran_date <= '2015-09-30'. Even if there is no data from, say, before 2015-01-01, the query planner does not necessarily know that and may assume that most rows in the table/view will match the upper-bound condition - in which case a sequential scan is probably fastest. That may be why you don't see Postgres use the index on those columns you mentioned at some point.

Now, apparently you have an index on columns tran_type and tran_row_id, while tran_row_id is the primary key? In that case I'd suggest you drop that index: any value of tran_row_id will have a single value of tran_type, and you're joining on the PK already. Meanwhile, the condition on tran_type in the query only serves to complicate the query.

Oh, BTW, those LEFT JOINs will behave as INNER JOINs, because you use columns from the right-hand side (alloc_trans.tran_date and due_trans.tran_date respectively) in your WHERE clauses without allowing them to be NULL. If you want those parts to behave like proper LEFT JOINs, either add OR xxx.tran_date IS NULL or move those expressions into the JOIN conditions.

Alban Hertroys

-- 
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
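The LEFT JOIN remark can be illustrated with a small sketch; the customers table here is hypothetical, due_trans is one of the tables mentioned above:

```sql
-- This behaves like an INNER JOIN: the WHERE clause rejects the NULL
-- rows that the LEFT JOIN produces for customers without transactions.
SELECT c.cust_row_id, t.tran_date
FROM customers c
LEFT JOIN due_trans t ON t.cust_row_id = c.cust_row_id
WHERE t.tran_date <= '2015-09-30';

-- To keep proper LEFT JOIN semantics, move the condition into the join...
SELECT c.cust_row_id, t.tran_date
FROM customers c
LEFT JOIN due_trans t ON t.cust_row_id = c.cust_row_id
                     AND t.tran_date <= '2015-09-30';

-- ...or explicitly allow the NULLs:
SELECT c.cust_row_id, t.tran_date
FROM customers c
LEFT JOIN due_trans t ON t.cust_row_id = c.cust_row_id
WHERE t.tran_date <= '2015-09-30' OR t.tran_date IS NULL;
```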
Re: [GENERAL] Insert large number of records
On 20 September 2017 at 22:55, Job wrote:
> One further question: within a query launched on the MASTER table where I need to scan every table, for example to search rows located in multiple partitions:
> is there a way to improve "parallel scans" across multiple tables at the same time, or not?
> I noticed, with explain analyze, that the scan on the master table is always sequential, descending into the partitions.

Since nobody has replied to your latest question yet, I'll give it a try.

Which tables a query on your MASTER table needs to scan largely depends on a PG feature called "constraint exclusion". That is to say, if the query optimizer can deduce from your query that it only needs to scan certain partitions for the required results, then it will do so.

Now, whether the optimizer can do that depends on whether your query conditions contain the same (or equivalent) expressions on the same fields of the same types as your partitioning constraints. That 'same type' part is one that people easily miss: sometimes part of an expression gets auto-cast to make it compatible with the remainder of the expression, but that is sometimes not the same type as what is used in your partitioning (exclusion) constraint. In such cases the planner often doesn't see the similarity between the two expressions and ends up scanning the entire set of partitions.

See also section 5.10.4 in https://www.postgresql.org/docs/current/static/ddl-partitioning.html , although it doesn't go into details of how to construct your select statements to prevent scanning the entire partition set.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
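A minimal sketch of constraint exclusion, using a hypothetical date-partitioned table:

```sql
CREATE TABLE measurements (logdate date NOT NULL, value numeric);

CREATE TABLE measurements_2017 (
    CHECK (logdate >= DATE '2017-01-01' AND logdate < DATE '2018-01-01')
) INHERITS (measurements);

-- With constraint_exclusion = partition (the default), the planner can
-- skip every partition whose CHECK constraint contradicts the condition:
SET constraint_exclusion = partition;
SELECT * FROM measurements WHERE logdate >= DATE '2017-06-01';
```

The type pitfall mentioned above: if the condition compares logdate against, say, a timestamp value, the comparison no longer matches the CHECK constraints term for term, and the planner may fall back to scanning all partitions.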
Re: [GENERAL] Insert large number of records
On 20 September 2017 at 07:42, Job wrote:
> We use a "temporary" table, populated by pg_bulkload - this first step takes a few minutes.
> Then, from the temporary table, the data is transferred by a trigger that copies the records into the production table.
> But *this step* takes a really long time (sometimes a few hours).
> There are about 10 million records.

Perhaps the problem isn't entirely on the writing end of the process. How often does this trigger fire? Once per row inserted into the "temporary" table, once per statement, or only after the bulkload has finished? Do you have appropriate indices on the temporary table to guarantee quick lookup of the records that need to be copied to the target table(s)?

> We cannot use pg_bulkload to load data directly into the production table, since pg_bulkload would lock the whole table, and the "COPY" command is slow and would not care about table partitioning (the COPY command fires partitioned-table triggers).

As David already said, inserting directly into the appropriate partition is certainly going to be faster. It removes a check on your partitioning conditions from the query execution plan; if you have many partitions, that adds up, because the database needs to check that condition among all your partitions for every row.

Come to think of it, I was assuming that the DB would stop checking other partitions once it found a suitable candidate, but now I'm not so sure it would. There may be good reasons not to stop, for example if we can partition further into sub-partitions. Anybody?

Since you're already using a trigger, it would probably be more efficient to query your "temporary" table for batches belonging to the same partition and insert those into the partition directly, one partition at a time. Even better would be if your bulkload could already be organised such that all the data in the "temporary" table can indiscriminately be inserted into the same target partition.
That though depends a bit on your setup - at some point the time saved at one end gets consumed on the other, or it even takes longer there.

Well, I think I've thrown enough ideas around for now ;)

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
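The "one partition at a time" suggestion can be sketched like this, with hypothetical table and column names (a staging_load table feeding monthly production partitions):

```sql
-- An index on the partitioning column keeps each batch lookup cheap:
CREATE INDEX staging_load_created_on_idx ON staging_load (created_on);

-- Select each partition's batch directly, so no partition-routing
-- trigger or constraint check runs per row on the parent table:
INSERT INTO production_2017_09
SELECT * FROM staging_load
WHERE  created_on >= DATE '2017-09-01'
AND    created_on <  DATE '2017-10-01';

INSERT INTO production_2017_10
SELECT * FROM staging_load
WHERE  created_on >= DATE '2017-10-01'
AND    created_on <  DATE '2017-11-01';
```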
Re: [GENERAL] Insert large number of records
> On 19 Sep 2017, at 15:47, Job wrote:
>
> Hi guys,
>
> we need to insert from one table into another (PostgreSQL 9.6.1) a large amount of data (about 10 to 20 million rows) without locking the destination table. Pg_bulkload is the fastest way, but it locks the table.
>
> Are there other ways? Classic "COPY" from?

We do something like that using a staging table to load to initially (although not bulk; data arrives in our staging table in batches of 5k to 100k rows) and then we transfer the data using insert/select and "on conflict do". That data-transfer within PG takes a couple of minutes on our rather limited VM for a wide 37M-row table (~37GB on disk).

That only locks the staging table (during the initial bulkload) and the rows in the master table that are currently being altered (during the insert/select). If your data source is a file in a format supported by COPY, then you can use COPY to do the initial bulk load into the staging table.

Some benefits of this 2-stage approach are that it leaves room to manipulate the data (type conversions, for example) and that it can handle the scenario where a matching target record in the master table already exists. In our case, we convert character fields to varchar (which saves a lot of space(s)).

Alban Hertroys

-- 
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
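A minimal sketch of the 2-stage load described above; the table, column, and file names are hypothetical:

```sql
-- 1. Bulk-load into a staging table (only the staging table is locked):
COPY staging_orders FROM '/path/to/orders.csv' WITH (FORMAT csv);

-- 2. Transfer into the master table; only the affected rows are locked,
--    and existing rows are updated instead of raising duplicate-key errors:
INSERT INTO orders AS o (order_id, customer, amount)
SELECT order_id, customer, amount::numeric   -- room for type conversions
FROM   staging_orders
ON CONFLICT (order_id)
DO UPDATE SET customer = EXCLUDED.customer,
              amount   = EXCLUDED.amount;
```

ON CONFLICT requires PostgreSQL 9.5 or later, so it is available on the 9.6.1 mentioned in the question.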
Re: [GENERAL] Performance with high correlation in group by on PK
On 8 September 2017 at 00:23, Jeff Janes wrote:
> On Tue, Aug 29, 2017 at 1:20 AM, Alban Hertroys wrote:
>> On 28 August 2017 at 21:32, Jeff Janes wrote:
>> > On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys wrote:
>> >>
>> >> Hi all,
>> >>
>> >> It's been a while since I actually got to use PG for anything serious, but we're finally doing some experimentation @work now to see if it is suitable for our datawarehouse. So far it's been doing well, but there is a particular type of query I run into that I expect we will frequently use and that's choosing a sequential scan - and I can't fathom why.
>> >>
>> >> This is on:
>> >>
>> >> The query in question is:
>> >> select "VBAK_MANDT", max("VBAK_VBELN")
>> >> from staging.etl1_vbak
>> >> group by "VBAK_MANDT";
>> >>
>> >> This is the header-table for another detail table, and in this case we're already seeing a seqscan. The thing is, there are 15M rows in the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT", "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact, we only have 1 at the moment!).
>> >
>> > You need an "index skip-scan" or "loose index scan". PostgreSQL doesn't currently detect and implement them automatically, but you can use a recursive CTE to get it to work. There are some examples at https://wiki.postgresql.org/wiki/Loose_indexscan
>>
>> Thanks Jeff, that's an interesting approach. It looks very similar to correlated subqueries.
>>
>> Unfortunately, it doesn't seem to help with my issue. The CTE is indeed fast, but when querying the results from the 2nd level of the PK with the CTE results, I'm back at a seqscan on pdw2_vbak again.
> Something like this works:
>
> create table foo as select trunc(random()*5) as col1, random() as col2 from generate_series(1,1);
> create index on foo (col1, col2);
> vacuum analyze foo;
>
> with recursive t as (
>    select * from (select col1, col2 from foo order by col1 desc, col2 desc limit 1) asdfsaf
>    union all
>    select
>      (select col1 from foo where foo.col1 < t.col1 order by col1 desc, col2 desc limit 1) as col1,
>      (select col2 from foo where foo.col1 < t.col1 order by col1 desc, col2 desc limit 1) as col2
>    from t where t.col1 is not null
> )
> select * from t where t is not null;
>
> It is pretty ugly that you need one subquery in the select list for each column to be returned. Maybe someone can find a way to avoid that part. I tried using lateral joins to get around it, but couldn't make that work.
>
> Cheers,
>
> Jeff

Thanks Jeff. That does indeed look ugly. Since we're dealing with a 4GL language (FOCUS) that translates to SQL, I don't think we'll attempt your workaround, even though we can use SQL directly in our reports if we want to.

But, I just remembered giving someone else in a similar situation some advice on this very list; obviously, when my first primary key field is not very selective, I should change the order of the fields in the PK!

But let's first enjoy the weekend.

Alban.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: [GENERAL] looking for a globally unique row ID
On 15 September 2017 at 11:03, Rafal Pietrak wrote:
>> Isn't this typically handled with an inheritance (parent-children) setup? MasterDocument has id, subtype and any common columns (create date etc), then dependents use the same id from master to complete the data for a given type. This is really common in ORM tools. Not clear from the description if the operations could be similarly handled (operation id, operation type as master of 17 dependent operationSpecifics; there is also the "Activity Model")
>
> I do that, but maybe I do that badly.
>
> Currently I do have 6 levels of inheritance which partition my document-class space. But I cannot see any way to have a unique index (unique constraint) to cover all those partitions at once.
>
> This is actually the core of my question: how to make one?
>
> So far I only have separate unique indexes on all those 12 child-table document-class subtables. Is there a way to combine those indexes? I experimented, and an index created on a parent table does not cover the content of child/inheriting tables. If it did, that would solve the problem.
>
> Or have I just misinterpreted your MasterDocument suggestion?

With table partitioning, provided the partitions are based on the value(s) of a particular field that is part of the primary key of the master table, the combination of the child tables' primary keys and the partitions' check constraints on the partitioning field guarantees that records across the partitioned tables are unique.
For example, if we define:

create table master_table (
    year int,
    month int,
    example text,
    primary key (year, month)
);

create table child2016 () inherits (master_table);
alter table child2016 add constraint child2016_year_chk check (year = 2016);
alter table child2016 add constraint child2016_pk primary key (year, month);

create table child2017 () inherits (master_table);
alter table child2017 add constraint child2017_year_chk check (year = 2017);
alter table child2017 add constraint child2017_pk primary key (year, month);

In the above, the three separate primary keys are guaranteed to contain distinct ranges of year - provided that we forbid any records to go directly into the master table, or that those records do not have years already covered by one of the child tables.

Perhaps you can apply this concept to your problem?

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: [GENERAL] Performance with high correlation in group by on PK
On 28 August 2017 at 21:32, Jeff Janes wrote:
> On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys wrote:
>>
>> Hi all,
>>
>> It's been a while since I actually got to use PG for anything serious, but we're finally doing some experimentation @work now to see if it is suitable for our datawarehouse. So far it's been doing well, but there is a particular type of query I run into that I expect we will frequently use and that's choosing a sequential scan - and I can't fathom why.
>>
>> This is on:
>>
>> The query in question is:
>> select "VBAK_MANDT", max("VBAK_VBELN")
>> from staging.etl1_vbak
>> group by "VBAK_MANDT";
>>
>> This is the header-table for another detail table, and in this case we're already seeing a seqscan. The thing is, there are 15M rows in the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT", "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact, we only have 1 at the moment!).
>
> You need an "index skip-scan" or "loose index scan". PostgreSQL doesn't currently detect and implement them automatically, but you can use a recursive CTE to get it to work. There are some examples at https://wiki.postgresql.org/wiki/Loose_indexscan

Thanks Jeff, that's an interesting approach. It looks very similar to correlated subqueries.

Unfortunately, it doesn't seem to help with my issue. The CTE is indeed fast, but when querying the results from the 2nd level of the PK with the CTE results, I'm back at a seqscan on pdw2_vbak again. Just the CTE plan is in skipScan-top.sql. The complete plan is in skipScan-full.sql.

Note: I cloned the original etl1_vbak table into a new pdw2_vbak table that has varchars instead of chars, which reduced the table size to just over half the original's size. Hence the different table names, but the behaviour for this particular issue is the same between them.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
warehouse=# explain (analyze, buffers)
with recursive t as (
   select min("VBAK_MANDT") as "VBAK_MANDT" from staging.pdw2_vbak
   union all
   select (select min("VBAK_MANDT") as "VBAK_MANDT" from staging.pdw2_vbak where "VBAK_MANDT" > t."VBAK_MANDT")
   from t where t."VBAK_MANDT" is not null
)
select "VBAK_MANDT" from t;

                                  QUERY PLAN
--------------------------------------------------------------------------------
 CTE Scan on t  (cost=98.31..100.33 rows=101 width=32) (actual time=0.031..0.054 rows=2 loops=1)
   Buffers: shared hit=9
   CTE t
     ->  Recursive Union  (cost=0.73..98.31 rows=101 width=32) (actual time=0.029..0.052 rows=2 loops=1)
           Buffers: shared hit=9
           ->  Result  (cost=0.73..0.74 rows=1 width=32) (actual time=0.029..0.029 rows=1 loops=1)
                 Buffers: shared hit=5
                 InitPlan 3 (returns $1)
                   ->  Limit  (cost=0.56..0.73 rows=1 width=32) (actual time=0.026..0.027 rows=1 loops=1)
                         Buffers: shared hit=5
                         ->  Index Only Scan using pdw2_vbak_pkey on pdw2_vbak pdw2_vbak_1  (cost=0.56..2375293.75 rows=14214332 width=32) (actual time=0.024..0.024 rows=1 loops=1)
                               Index Cond: ("VBAK_MANDT" IS NOT NULL)
                               Heap Fetches: 1
                               Buffers: shared hit=5
           ->  WorkTable Scan on t t_1  (cost=0.00..9.56 rows=10 width=32) (actual time=0.009..0.010 rows=0 loops=2)
                 Filter: ("VBAK_MANDT" IS NOT NULL)
                 Rows Removed by Filter: 0
                 Buffers: shared hit=4
                 SubPlan 2
                   ->  Result  (cost=0.93..0.94 rows=1 width=32) (actual time=0.015..0.015 rows=1 loops=1)
                         Buffers: shared hit=4
                         InitPlan 1 (returns $3)
                           ->  Limit  (cost=0.56..0.93 rows=1 width=32) (actual time=0.013..0.013 rows=0 loops=1)
                                 Buffers: shared hit=4
                                 ->  Index Only Scan using pdw2_vbak_pkey on pdw2_vbak  (cost=0.56..1732075.91 rows=4738111 width=32) (actual time=0.012..0.012 rows=0 loops=1)
                                       Index Cond: (("VBAK_MANDT" IS NOT NUL
Re: [GENERAL] Performance with high correlation in group by on PK
On 28 August 2017 at 14:22, Alban Hertroys wrote:
> This is on:

Just noticed I forgot to paste this in:

warehouse=# select version();
                                                 version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
[GENERAL] Performance with high correlation in group by on PK
Hi all,

It's been a while since I actually got to use PG for anything serious, but we're finally doing some experimentation @work now to see if it is suitable for our datawarehouse. So far it's been doing well, but there is a particular type of query I run into that I expect we will frequently use and that's choosing a sequential scan - and I can't fathom why.

This is on:

The query in question is:

select "VBAK_MANDT", max("VBAK_VBELN")
from staging.etl1_vbak
group by "VBAK_MANDT";

This is the header-table for another detail table, and in this case we're already seeing a seqscan. The thing is, there are 15M rows in the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT", "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact, we only have 1 at the moment!).

Explain analyze says the following about this query:

warehouse=# explain (analyze, buffers) select "VBAK_MANDT", max("VBAK_VBELN") from staging.etl1_vbak group by "VBAK_MANDT";

                                  QUERY PLAN
--------------------------------------------------------------------------------
 HashAggregate  (cost=1990054.08..1990054.09 rows=1 width=36) (actual time=38723.602..38723.602 rows=1 loops=1)
   Group Key: "VBAK_MANDT"
   Buffers: shared hit=367490 read=1409344
   ->  Seq Scan on etl1_vbak  (cost=0.00..1918980.72 rows=14214672 width=15) (actual time=8.886..31317.283 rows=14214672 loops=1)
         Buffers: shared hit=367490 read=1409344
 Planning time: 0.126 ms
 Execution time: 38723.674 ms
(7 rows)

As you can see, a sequential scan. The statistics seem quite correct.
If I change the query to select a single value of "VBAK_MANDT" we get:

warehouse=# explain (analyze, buffers) select max("VBAK_VBELN") from staging.etl1_vbak where "VBAK_MANDT" = '300';

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Result  (cost=1.37..1.38 rows=1 width=32) (actual time=14.911..14.911 rows=1 loops=1)
   Buffers: shared hit=2 read=3
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..1.37 rows=1 width=11) (actual time=14.907..14.908 rows=1 loops=1)
           Buffers: shared hit=2 read=3
           ->  Index Only Scan Backward using etl1_vbak_pkey on etl1_vbak  (cost=0.56..11498362.31 rows=14214672 width=11) (actual time=14.906..14.906 rows=1 loops=1)
                 Index Cond: (("VBAK_MANDT" = '300'::bpchar) AND ("VBAK_VBELN" IS NOT NULL))
                 Heap Fetches: 1
                 Buffers: shared hit=2 read=3
 Planning time: 0.248 ms
 Execution time: 14.945 ms
(11 rows)

That is more in line with my expectations. Oddly enough, adding "VBAK_MANDT" and the group by back into that last query, the result is a seqscan again.

For "VBAK_MANDT" we see these statistics:

 Null fraction:           0
 Average width:           4
 Distinct values:         1
 Most common values:      {300}
 Most common frequencies: {1}
 Histogram bounds:
 Correlation:             1

The table definition is:

                Table "staging.etl1_vbak"
   Column    |     Type      | Modifiers
-------------+---------------+-----------
 VBAK_MANDT  | character(3)  | not null
 VBAK_VBELN  | character(10) | not null
 VBAK_ERDAT  | date          | not null
 VBAK_ERZET  | character(6)  | not null
 VBAK_ERNAM  | character(12) | not null
 VBAK_ANGDT  | date          | not null
 VBAK_BNDDT  | date          | not null
 VBAK_AUDAT  | date          | not null
 ...
 VBAK_MULTI  | character(1)  | not null
 VBAK_SPPAYM | character(2)  | not null
Indexes:
    "etl1_vbak_pkey" PRIMARY KEY, btree ("VBAK_MANDT", "VBAK_VBELN")
    "idx_etl1_vbak_erdat" btree ("VBAK_ERDAT")

A final remark: the table definition was generated by our reporting/ETL software, hence the case-sensitive column names and the use of the character type instead of varchar (or text).

What can be done to help the planner choose a smarter plan?
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: [GENERAL] cast issue in WITH RECURION
> On 3 Aug 2017, at 20:22, k b wrote:
>
> when I create a recursive query and try to add the distances I get a message:
> ERROR: recursive query "edges" column 3 has type numeric(7,3) in non-recursive term but type numeric overall.
> My exercise is almost identical to the example in the docs:
>
> WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
>     SELECT g.id, g.link, g.data, 1,
>       ARRAY[g.id],
>       false
>     FROM graph g
>   UNION ALL
>     SELECT g.id, g.link,
>       sg.data + g.data, -- altered section, data is numeric(7,3)
>       sg.depth + 1,
>       path || g.id,
>       g.id = ANY(path)
>     FROM graph g, search_graph sg
>     WHERE g.id = sg.link AND NOT cycle
> )
> SELECT * FROM search_graph;

I believe the solution is rather simple; just cast(sg.data + g.data as numeric(7,3)).

Alban Hertroys

-- 
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
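Applied to the full query, that cast keeps the column type stable across both branches of the UNION (a sketch following the documentation's search_graph example quoted in the question):

```sql
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
    SELECT g.id, g.link, g.data, 1,
           ARRAY[g.id],
           false
    FROM graph g
  UNION ALL
    SELECT g.id, g.link,
           CAST(sg.data + g.data AS numeric(7,3)),  -- matches the non-recursive term's type
           sg.depth + 1,
           path || g.id,
           g.id = ANY(path)
    FROM graph g, search_graph sg
    WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
```

The error occurs because the sum sg.data + g.data is plain numeric, while the non-recursive term's column is numeric(7,3); both terms of a recursive CTE must agree on column types.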
Re: [GENERAL] Dealing with ordered hierarchies
> On 24 Jul 2017, at 9:02, Tim Uckun wrote:
>
> I have read many articles about dealing with hierarchies in postgres including nested sets, ltree, materialized paths, using arrays as parentage, CTEs etc, but nobody talks about the following scenario.
>
> Say I have a hierarchy like this
>
> 1
> 1.1
> 1.1.1
> 1.1.2
> 1.2
> 1.3
> 2
> 2.1
>
> In this hierarchy the order is very important and I want to run frequent(ish) re-ordering of both subsets and entire trees and even more frequent inserts.

Since they're hierarchies, the order is already in the structure of the data. Do you really need to add it to the data, or would it suffice to add it to the query result? If that's the case, you only need a simple ordering number per branch, like 1, 2, 3, etc. The full path (i.e. '1.1.3') gets generated in the query.

I regularly generate structures like your above example using recursive CTE's. The "path" helps to get the results in the correct order for starters (although you're in for a surprise if any of your levels go past 9 in the above). It's great how you can "trickle" all kinds of calculations through the hierarchy using CTE's.

Something like this should help to get you started (untested, I usually do this in Oracle, which has several peculiarities):

with recursive hierarchy (parent, node, sequence_number, path) as (
    select null, node, sequence_number, sequence_number::text
    from tree
    where parent is null
  union all
    select h.node, t.node, t.sequence_number, h.path || '.' || t.sequence_number::text
    from tree t
    join hierarchy h on (t.parent = h.node)
)
select node, path from hierarchy

Where the table "tree" has fields:
 parent          -- parent node
 node            -- actual node
 sequence_number -- order of sequence of this node within its parent branch

You may need to add a surrogate key if your parent/child combinations are otherwise not unique. That would then also be the way to address a node directly (otherwise it would be (parent, node)).
For the sequence_number I'd probably just use an actual sequence generator with a large enough gap to prevent problems with reordering items later on (increment by 10, for example). You will also want to pad the sequence numbers in the "path" column with leading zeroes (otherwise 10 sorts between 1 and 2, etc.), enough that you won't run out of numbers per level.

If you require your sequence numbers to be consecutive in the result: you can add a field with such numbering based on the existing sequence_numbers, by using a window function in each branch of the union - it's down to a fairly basic row numbering problem at this point.

> Scenario 1: I want to insert a child into the 1.1 subtree. The next item should be 1.1.3 and I can't figure out any other way to do this other than to subquery the children and to figure out the max child ID, add one to it, which is a race condition waiting to happen.

You would first need to determine which node is the parent node by traversing the hierarchy up to the point of insertion, and use the (parent, node) or surrogate key fields to append under. Similar to using '1.1', really.

> Scenario 2: I now decide the recently inserted item is the second most important so I reset the ID to 1.1.2 and then increment 1.1.2 (and possibly everything below). Again this is both prone to race conditions and involves a heavy update.

No need to bother with that (much) with the above approach. And if you do run out of gaps, you can fairly simply update all the sequence numbers under the same parent without causing concurrency issues and without requiring locks/synchronisation.

> Is there a better way to deal with this or is the complexity unavoidable?

I think it's better, but I don't think it's ideal. It's fairly complicated to understand, for one thing, which can cause problems for maintenance (I have colleagues who don't dare to touch my queries, for example).
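The "run out of gaps" repair mentioned above can be sketched with a window function, assuming a hypothetical tree(parent, node, sequence_number) table as in the earlier description:

```sql
-- Re-pack sparse sequence numbers (e.g. 10, 20, 35, 36, ...) into fresh
-- gaps of 10, independently within each parent branch:
UPDATE tree
SET    sequence_number = fresh.new_seq
FROM  (SELECT node,
              10 * row_number() OVER (PARTITION BY parent
                                      ORDER BY sequence_number) AS new_seq
       FROM tree) AS fresh
WHERE tree.node = fresh.node;
```

Since only sequence_number changes and the relative order within each branch is preserved, concurrent readers of the hierarchy see a consistent ordering throughout.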
> I should state that, like most databases, reads will be much more frequent than writes, and inserts will be more frequent than updates (re-ordering).

More of the logic (and thus system load) gets moved to the read side of things; that's probably a drawback, but most of it is just keeping state and counting. I don't expect that to be all that much.

Alban Hertroys

-- 
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
Re: [GENERAL] Schedule
> On 20 Jun 2017, at 18:46, Adrian Klaver wrote:
>
> On 06/20/2017 08:12 AM, Steve Clark wrote:
>> On 06/20/2017 10:38 AM, Adrian Klaver wrote:
>>> On 06/20/2017 07:00 AM, Steve Clark wrote:
>>
>> We already have a monitoring system in place that has been in operation since circa 2003. Just recently we have added a new class of customer whose operation is not 24/7. I envision the schedule could be fairly complicated, including weekends and holidays, plus the enduser might shut down for lunch etc. I am looking for more on how to organize the schedule, e.g. a standard weekly schedule plus exceptions for holidays etc., or a separate individual schedule for each week; also need to consider how easy it is to maintain the schedule, etc.
>
> Yes, this could become complicated, if for no other reason than that it is being driven from the customer end and there will need to be a process to verify and incorporate their changes.

There you're saying something rather important: "it is being driven from the customer end".

> 2) Figure out what a day is. In other words are different timezones involved and if so what do you 'anchor' a day to?

For an example of how that might fail: at our company, they work in shifts (I don't) of 3*8 hours, which run from 23:00 to 23:00. Depending on who looks at the data, either that's a day or a normal day (00:00-00:00) is. It's a matter of perspective.

IMHO, the only safe approach is to have the customer end decide whether it's a regular outage or an irregular one. There is just no way to reliably guess that from the data. If a customer decides to turn off the system when he's going home, you can't guess when he's going to do that, and you will be raising false positives when you depend on a schedule of when he might be going home.

From a software implementation point of view, that means that your customer-side application needs to be able to signal planned shutdowns and startups.
If you detect any outages without such a signal, then you can flag it as a problem. There are still opportunities for getting those wrong of course, such as lack of connectivity between you and your customer, but those should be easy to explain once detected. And I'm sure there are plenty of other corner-cases you need to take into account. I bet it has a lot of problems in common with replication actually (how do we reliably get information from system A to system B), so it probably pays to look at what particular problems occur there and how they're solved. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Create extension C using IDE
> On 12 Jun 2017, at 20:09, Adrian Klaver wrote: > > On 06/12/2017 11:03 AM, Fabiana Zioti wrote: >> I already have postgresql 9.6 installed. I'm developing extensions for it in >> C. What would be the best IDE to debug the code? Eclipse has many errors. > > The errors are? My guess would be its user interface… I would suggest gViM or MacViM, but that really depends on what you expect from an (I)DE and on what platform you are working. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Inconsistent performance with LIKE and bind variable on long-lived connection
> On 10 Jun 2017, at 5:37, Steven Grimm wrote: […] I notice that you're declaring your ResultSet variable inside the loop, which means that you create and destroy it frequently. I've been told that this is a pattern that the GC has trouble keeping up with (although that was around the Java 5 era), so you might be seeing the effects of memory churn in your client instead of in the database. I modified your function so it no longer does that; does that make a difference? Note: the long variables are scalars instead of objects. I don't think they need the same treatment, but it can't hurt. > private static void logTime(String name, PreparedStatement stmt) throws > SQLException { > StringBuilder out = new StringBuilder(String.format("%-22s", name)); > ResultSet rs; > long startTime, endTime; > > for (int i = 0; i < 20; i++) { > startTime = System.currentTimeMillis(); > rs = stmt.executeQuery(); > while (rs.next()) { > rs.getString(1); > } > endTime = System.currentTimeMillis(); > rs.close(); > > out.append(String.format(" %3d", endTime - startTime)); > } > > stmt.close(); > > System.out.println(out); > } Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Weirdness with "not in" query
On 8 June 2017 at 17:27, greigwise wrote: > So, I'm using postgres version 9.6.3 on a mac and the results to this series > of queries seems very strange to me: > > db# select count(*) from table1 where id in > (1706302,1772130,1745499,1704077); > count > --- > 4 > (1 row) > > db# select count(*) from table2 where table1_id in > (1706302,1772130,1745499,1704077); > count > --- > 0 > (1 row) > > db# select count(*) from table1 where id not in (select table1_id from > table2); > count > --- > 0 > (1 row) > > I would expect the "not in" query to return a result of at least 4. Am I > totally misunderstanding how this should work (I really don't think so) or > is something wrong? You probably have table1_id values that are NULL in table2. Comparing any id against a list that contains a NULL makes the result of NOT IN evaluate to NULL rather than true, so no rows match. NOT EXISTS is perhaps a better candidate in this case. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
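A small self-contained illustration of the NULL trap (hypothetical two-column setup, not the poster's actual schema):

```sql
CREATE TABLE t1 (id int);
CREATE TABLE t2 (t1_id int);
INSERT INTO t1 VALUES (1), (2);
INSERT INTO t2 VALUES (1), (NULL);

-- "2 NOT IN (1, NULL)" expands to "2 <> 1 AND 2 <> NULL", which is NULL,
-- so the row for id = 2 is filtered out along with everything else:
SELECT count(*) FROM t1 WHERE id NOT IN (SELECT t1_id FROM t2);   -- 0

-- NOT EXISTS treats the NULL row as simply "no match":
SELECT count(*) FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.t1_id = t1.id);       -- 1
```

As a bonus, NOT EXISTS can usually be planned as an anti-join, which often performs better than NOT IN anyway.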
Re: [GENERAL] Coditional join of query using PostgreSQL
> On 15 May 2017, at 19:02, Nick Dro wrote: > > Hi, > I'm new to postgresql and couldn't find answer to this situation anywhere. > I asked this here: > http://stackoverflow.com/questions/43984208/coditional-join-of-query-using-postgresql > > I hope there is a better solution rather than creating two separated > functions :( You can use your boolean parameter inside the join condition: […] on (tfquery.a = main.a and ((type_f and tfquery.d = main.d) or not type_f)) Beware that you don't also have a column named type_f in that join somewhere. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
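Spelled out a little further, the boolean test can also be folded into the join in a slightly shorter form; a sketch only, with main, tfquery and the boolean parameter type_f taken from the linked question:

```sql
SELECT main.*
FROM main
JOIN tfquery
  ON tfquery.a = main.a
 -- (NOT type_f OR ...) is logically the same as
 -- ((type_f AND ...) OR NOT type_f), just shorter:
 AND (NOT type_f OR tfquery.d = main.d);
```

Either way, a single function with the parameter in the join condition avoids maintaining two near-identical functions.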
Re: [GENERAL] Column information
> According to the documentation 'numeric_precision_radix' field should > indicate what radix the value of 'numeric_precision' is stored. > > However, even though the radix is 2, the actual value is 32, which is > not a radix 2. The radix is the base in which the precision is counted, not the base in which the number itself is written: with numeric_precision_radix = 2, a numeric_precision of 32 means the type holds 32 binary digits (bits). https://en.wikipedia.org/wiki/Radix Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] UDP buffer drops / statistics collector
> On 19 Apr 2017, at 20:36, Tim Kane wrote: > > Well, this is frustrating.. > The buffer drops are still occurring - so I thought it worth trying use a > ramdisk and set stats_temp_directory accordingly. > > I've reloaded the instance, and can see that the stats directory is now being > populated in the new location. Except - there is one last file > (pgss_query_texts.stat) that continues to be updated in the old pg_stat_tmp > path.. Is that supposed to happen? > > > Fairly similar to this guy (but not quite the same). > https://www.postgresql.org/message-id/d6e71befad7beb4fbcd8ae74fadb1265011bb40fc...@win-8-eml-ex1.eml.local > > I can see the packets arriving and being consumed by the collector.. and, > the collector is indeed updating in the new stats_temp_directory.. just not > for that one file. > > > It also failed to resolve the buffer drops.. At this point, I'm not sure I > expected it to. They tend to occur semi-regularly (every 8-13 minutes) but I > can't correlate them with any kind of activity (and if I'm honest, it's > possibly starting to drive me a little bit mad). This rings a bell for me. I recently had a similar issue in an MMO (Windows) where every 15 minutes I would get a number of consecutive freezes in-game. You could set your alarm by it, so regular. That suddenly went away after I rearranged my home-network (for unrelated reasons), which incidentally moved several connections from the switch the game-system was connected to to another switch. I never pinpointed it to UDP, but then again, TCP would correct for the lost transfers (probably at the cost of UDP traffic). Perhaps you have a switch somewhere that's overburdened? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Large data and slow queries
> On 19 Apr 2017, at 12:58, Samuel Williams > wrote: > > It's interesting you talk about using multiple indexes. In > MySQL/MariaDB and derivatives, I've never seen the query planner > consider using multiple indexes. So, it's possible that Postgres may > use multiple indexes if it saves time? Or do you mean, doing something > like manually joining the data and leveraging the different indexes > explicitly? PG is capable of doing bitmap heap scans to combine results from multiple indices, among other things. Whether that will actually improve performance in this case I don't know, but it's worth a try I think. > The correlation between user_id and location... well, it's somewhat > temporally related. So users are constantly moving around but happen to be at the same locations at regular intervals? In my experience, people don't usually move around much, so you should certainly be able to pinpoint them mostly to a specific area, right? (Hence my suggestions for a country column or partitioning in squares) > On 19 April 2017 at 22:50, Alban Hertroys wrote: >> >>> On 19 Apr 2017, at 6:01, Samuel Williams >>> wrote: >>> >>> Hi. >>> >>> We have 400,000,000 records in a table (soon to be 800,000,000), here >>> is the schema (\d+) >>> >>> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121 >>> >>> We want the following kinds of query to be fast: >>> >>> SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE >>> "user_event"."what" IN ('poll', 'location_change', >>> 'midnight_location') AND ("user_event"."created_at" >= '2016-04-19 >>> 01:23:55') AND (latitude > -37.03079375089291 AND latitude < >>> -36.67086424910709 AND longitude > 174.6307139779924 AND longitude < >>> 175.0805140220076); >>> >>> We have a btree index and it appears to be working. However, it's >>> still pretty slow. 
>>> >>> EXPLAIN ANALYZE gives the following: >>> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314 >>> >>> I'm thinking that I need to do the following to help: >>> >>> CLUSTER user_event ON index_user_event_for_visits_3 followed by >>> analyze... Our data is mostly time series but sometimes we get some >>> dumps with historical records. >> >> It seems to me that a large part of the problem is that the server has to >> scan all index entries from that date and within those location bounds to >> find that the distinct user id's in that set are about 114,000 out of 1.7M >> rows matching the selection-criteria. If it could stop at the first location >> for each user, it would have to scan less than a 10th of the index entries >> that it has to scan now... >> >> How high is the correlation between user id and location? That can probably >> be leveraged… >> Perhaps you'd get better performance if you'd use multiple indices instead >> of stuffing everything into a single purpose-specific one? I would suggest >> one on (user_id, latitude, longitude) or (latitude, longitude, user_id) and >> one on (created_at, user_id), or perhaps (created_at, latitude, longitude). >> That approach could also reduce the number of indices you have on that >> table, as well as their sizes, making it all fit into cache a little easier. >> Then again, additional operations will be required to combine them. >> >> For a different approach; It may be possible to enrich your data with >> something that is easy to index and query, with a high correlation to >> (latitude, longitude). That could also be used to partition over. Country >> seems a good candidate here, unless all your data-points are in New Zealand >> like the above? >> Then again, some countries are a lot larger, with a higher population, than >> others. And populations can be highly concentrated (Delhi, Moscow to name a >> few). 
>> Another option is to divide the location space up into squares of a fixed >> size, with a partition for each square. About 80% of those squares are >> unpopulated though, being at sea. >> >> Created_at is a very good candidate for partitioning too, especially if you >> don't intend to keep data older than a certain age. Truncating or dropping a >> partition that you no longer need is quick and easy. >> >> With data-sets this large, I'd think you would want to
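For the record, a hand-waving sketch of what the multiple-indices suggestion looks like against the user_event table from the thread. The index definitions are illustrative, and whether the planner actually combines them in a BitmapAnd depends on the statistics:

```sql
CREATE INDEX ON user_event (created_at);
CREATE INDEX ON user_event (latitude, longitude);

EXPLAIN SELECT DISTINCT user_id
FROM user_event
WHERE created_at >= '2016-04-19 01:23:55'
  AND latitude  BETWEEN -37.03 AND -36.67
  AND longitude BETWEEN 174.63 AND 175.08;
-- may produce a plan along the lines of:
--   Bitmap Heap Scan on user_event
--     ->  BitmapAnd
--           ->  Bitmap Index Scan on user_event_created_at_idx
--           ->  Bitmap Index Scan on user_event_latitude_longitude_idx
```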
Re: [GENERAL] Large data and slow queries
> On 19 Apr 2017, at 6:01, Samuel Williams > wrote: > > Hi. > > We have 400,000,000 records in a table (soon to be 800,000,000), here > is the schema (\d+) > > https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121 > > We want the following kinds of query to be fast: > > SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE > "user_event"."what" IN ('poll', 'location_change', > 'midnight_location') AND ("user_event"."created_at" >= '2016-04-19 > 01:23:55') AND (latitude > -37.03079375089291 AND latitude < > -36.67086424910709 AND longitude > 174.6307139779924 AND longitude < > 175.0805140220076); > > We have a btree index and it appears to be working. However, it's > still pretty slow. > > EXPLAIN ANALYZE gives the following: > https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314 > > I'm thinking that I need to do the following to help: > > CLUSTER user_event ON index_user_event_for_visits_3 followed by > analyze... Our data is mostly time series but sometimes we get some > dumps with historical records. It seems to me that a large part of the problem is that the server has to scan all index entries from that date and within those location bounds to find that the distinct user id's in that set are about 114,000 out of 1.7M rows matching the selection-criteria. If it could stop at the first location for each user, it would have to scan less than a 10th of the index entries that it has to scan now... How high is the correlation between user id and location? That can probably be leveraged… Perhaps you'd get better performance if you'd use multiple indices instead of stuffing everything into a single purpose-specific one? I would suggest one on (user_id, latitude, longitude) or (latitude, longitude, user_id) and one on (created_at, user_id), or perhaps (created_at, latitude, longitude). 
That approach could also reduce the number of indices you have on that table, as well as their sizes, making it all fit into cache a little easier. Then again, additional operations will be required to combine them. For a different approach: it may be possible to enrich your data with something that is easy to index and query, with a high correlation to (latitude, longitude). That could also be used to partition over. Country seems a good candidate here, unless all your data-points are in New Zealand like the above? Then again, some countries are a lot larger, with a higher population, than others. And populations can be highly concentrated (Delhi, Moscow to name a few). Another option is to divide the location space up into squares of a fixed size, with a partition for each square. About 80% of those squares are unpopulated though, being at sea. Created_at is a very good candidate for partitioning too, especially if you don't intend to keep data older than a certain age. Truncating or dropping a partition that you no longer need is quick and easy. With data-sets this large, I'd think you would want to partition on multiple dimensions, creating a matrix of partitions under a single master table. I don't think PG has a simple way of doing that (yet) though; perhaps it's possible by abusing multiple levels of inheritance, but that sounds like a bad idea. And of course, create your partitions sufficiently coarse to prevent overburdening the system tables, which would slow down the query planner. Hopefully there's something useful in my ramblings! Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
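To illustrate the created_at partitioning: at the time of writing that means inheritance plus CHECK constraints. The child-table name and the monthly granularity here are just an example:

```sql
-- One child table per month, attached to the existing parent:
CREATE TABLE user_event_2016_04 (
    CHECK (created_at >= '2016-04-01' AND created_at < '2016-05-01')
) INHERITS (user_event);

-- Let the planner skip partitions whose CHECK constraint rules them out:
SET constraint_exclusion = partition;

-- Expiring old data then becomes a cheap metadata operation:
DROP TABLE user_event_2016_04;
```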
Re: [GENERAL] CANNOT USE ANY INDEX ON UPSERT (INSERT.....ON CONFLICT)
> On 18 Apr 2017, at 10:13, agharta wrote: > > Hi all, > > I have a problem with INSERT ... ON CONFLICT sql command. > > Reading 9.6 documentation i see that ON CONFLICT command will accpets only > index_column_name or index_expression (unique composite/primary indexes are > valid too). > > So, my problem is that i can't create any type of upsert-valid index . Let me > explain. > > I have a table T1 containing F1, F2, F3, F4 fields. > > I can insert same records in T1, MAX TWICE. How is UPSERT supposed to know which of a pair of duplicate records it is supposed to update? You'll have to make them unique somehow. The safest approach is usually to add a surrogate key based on a sequence. > I can have records like (A,B,C,D),(B,A,D,C), etc.. and (A,B,C,D) AGAIN. Any > other next insert of (A,B,C,D) is not allowed (actually it is avoided by a > complex-and-slow-performance select count in before insert/update trigger). You're probably better off with an EXISTS query there. Something like: select F1, F2, F3, F4, case when exists (select 1 from T1 t where t.F1 = T1.F1 and t.F2 = T1.F2 and t.F3 = T1.F3 and t.F4 = T1.F4 and t.pk <> T1.pk) then 1 else 0 end as have_duplicate from T1 where F1 = NEW.F1 and F2 = NEW.F2 and F3 = NEW.F3 and F4 = NEW.F4 limit 1; The pk field in there is the surrogate key from the previous paragraph. Alternatively, wrap your COUNT around a sub-query that's limited to 2 results. No extra pk needed in that case, unless you still need to use UPSERT with that. In either case it will make a big difference to have an index on at least (F1, F2, F3, F4), perhaps with the new pk column added at the end. > In this case i can't create any type of primary/unique index, like a > composite F1,F2, F3, F4 index. (correct me if i am wrong please). Correct, you'll most likely have to add a new one (unless someone comes up with better suggestions). Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. 
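A sketch of the "at most twice" check without a full COUNT over all matches; NEW refers to the candidate row in a BEFORE INSERT trigger, and lower-case column names are assumed:

```sql
-- Stop scanning as soon as two duplicates are found, instead of
-- counting every matching row in the table:
SELECT count(*) >= 2 AS limit_reached
FROM (SELECT 1
      FROM t1
      WHERE (f1, f2, f3, f4) = (NEW.f1, NEW.f2, NEW.f3, NEW.f4)
      LIMIT 2) AS s;
```

With the (F1, F2, F3, F4) index suggested earlier, this should only ever touch a couple of index entries per insert.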
Re: [GENERAL] [ADMIN] calculating table and index size
On 7 April 2017 at 09:11, Günce Kaya wrote: > Hi again, > > Sorry for delay. > > Guillaume, I read your answer for first question but it's not clear to me. > The table has a column and the index also uses that column, so in that example, I > think table size and index size should be equal. Why are these not equal? If I understand correctly, the table only has 1 (integer) column and all those 1,400,000 rows have the same value? Then the table has to store each row separately and thus has to store the same value repeatedly. It also has to store some meta-data, such as visibility information. The index on the other hand (assuming a btree index) knows that there is only a single value in the table and therefore only stores a single value, but it has to reference each row in the table that contains that value. So the table and the index are storing different things, but the total size of each row/index node for that single integer column is of the same order of magnitude. That's why they are similar in size. If you would add another integer column to your table and VACUUM FULL the table, the table would be about double its size, but the index would stay the same size. Regards, Alban. > On Wed, Apr 5, 2017 at 1:02 PM, Steven Chang > wrote: >> >> Hello, >> >> try pgstattuple() and pgstatindex() , I think you will figure it out. >> >> Steven >> >> 2017-04-05 16:56 GMT+08:00 Guillaume Lelarge : >>> >>> Hi, >>> >>> 2017-04-05 9:44 GMT+02:00 Günce Kaya : Hi all, I have some questions about calculating table and index size. I have a dummy table which has an integer column and its index. The table has 1,400,000 rows and all of the rows have the same value, 2000. Table size is 50MB and index size is 31MB. Why is there such a size difference between the table and its index? What happens in the data files when we add an index? >>> >>> You have metadata information in the table datafiles that you don't have >>> in the index datafiles. For example, all the system columns for each line. 
>>> Second question is that after created table, table size was 0 byte. I inserted a row as 120 then table size was 8192 byte. I inserted five times same value to the table and table size is still 8192 bytes. Table size changed after inserted lots of rows. Table size was stabile till first few hundred rows. why table size didn't change when I inserted lots of rows? >>> >>> PostgreSQL works with 8KB blocks. When you insert a line, it puts it on a >>> block, but this block may contain many lines. So your next new lines still >>> fit in the first block... until it doesn't, and you'll see a new block >>> coming, making your table datafile grows to 16KB. And so on and so on. >>> >>> >>> -- >>> Guillaume. >>> http://blog.guillaume.lelarge.info >>> http://www.dalibo.com >> >> > > > > -- > Gunce Kaya > > Linkedin - Twitter - Blog -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
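For completeness, the sizes can be compared directly from SQL; these functions exist in 9.x, and 'mytable' stands in for the actual table name:

```sql
SELECT pg_size_pretty(pg_relation_size('mytable'))       AS table_size,
       pg_size_pretty(pg_indexes_size('mytable'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('mytable')) AS total_size;
```

pg_total_relation_size() includes TOAST data as well, so the three numbers need not add up exactly.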
Re: [GENERAL] Run statements before pg_dump in same transaction?
> On 23 Mar 2017, at 20:23, John R Pierce wrote: > > On 3/23/2017 12:06 PM, François Beausoleil wrote: >> For development purposes, we dump the production database to local. It's >> fine because the DB is small enough. The company's growing and we want to >> reduce risks. To that end, we'd like to anonymize the data before it leaves >> the database server. > > how about piping the /output/ of pg_dump to GPG or a similar file/stream > encryption module ? John, anonymising data is not about encrypting, but about making data anonymous. This is usually done to create data sets useful for development or statistical analysis, whilst preventing people from obtaining sensitive information from the data set. For example, companies want to know how many facebook users are interested in buying drones, but those companies should not know who exactly those people are. ISTR that there are some tools for this purpose, but the details escape me. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
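To sketch one way of doing that before the data leaves the server. Table and column names are made up, and hashing is shown for illustration only; strictly speaking it pseudonymises rather than truly anonymises:

```sql
BEGIN;
-- Build a scrubbed copy inside the production database:
CREATE TABLE users_anon AS
SELECT id,
       md5(email) AS email,      -- irreversibly scrambled
       'redacted' AS full_name   -- dropped entirely
FROM users;
COMMIT;

-- Then dump only the scrubbed copy:
--   pg_dump -t users_anon mydb > users_anon.sql
```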
Re: [GENERAL] How to create unique index on multiple columns where the combination doesn't matter?
> On 22 Mar 2017, at 17:54, Glen Huang wrote: > > Hello, > > If I have a table like > > CREATE TABLE relationship ( > obj1 INTEGER NOT NULL REFERENCES object, > obj2 INTEGER NOT NULL REFERENCES object, > obj3 INTEGER NOT NULL REFERENCES object, > ... > ) > > And I want to constrain that if 1,2,3 is already in the table, rows like > 1,3,2 or 2,1,3 shouldn't be allowed. > > Is there a general solution to this problem? Does the order of the values of (obj1, obj2, obj3) in relationship matter? If not, you could swap them around on INSERT/UPDATE to be in sorted order. I'd probably go with a BEFORE INSERT OR UPDATE trigger. In addition, to prevent unsorted entry, on obj2 add CHECK (obj2 > obj1) and on obj3 add CHECK (obj3 > obj2). Now you can create a normal PK or unique key on (obj1, obj2, obj3) as the order of their values is not variable anymore. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
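A sketch of that trigger, assuming the three integer columns from the table definition above:

```sql
CREATE FUNCTION relationship_sort_objs() RETURNS trigger AS $$
DECLARE
    objs integer[];
BEGIN
    -- Sort the three values so the stored order is canonical:
    SELECT array_agg(x ORDER BY x) INTO objs
    FROM unnest(ARRAY[NEW.obj1, NEW.obj2, NEW.obj3]) AS x;
    NEW.obj1 := objs[1];
    NEW.obj2 := objs[2];
    NEW.obj3 := objs[3];
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER relationship_sort
BEFORE INSERT OR UPDATE ON relationship
FOR EACH ROW EXECUTE PROCEDURE relationship_sort_objs();

-- With a canonical order, a plain unique constraint now catches
-- (2,1,3) as a duplicate of (1,2,3):
ALTER TABLE relationship ADD UNIQUE (obj1, obj2, obj3);
```

Note that the CHECK (obj2 > obj1) variant additionally forbids equal values within a row; use >= instead if those should be allowed.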
Re: [GENERAL] CHECK for 2 FKs to be non equal
> On 11 Mar 2017, at 10:41, Alexander Farber wrote: > > uid integer NOT NULL REFERENCES words_users(uid) CHECK (uid <> > author) ON DELETE CASCADE, > but get syntax error in 9.5: > > ERROR: syntax error at or near "ON" > LINE 2: ...REFERENCES words_users(uid) CHECK (uid <> author) ON DELETE … You put your CHECK constraint definition smack in the middle of the FK constraint definition, which starts with REFERENCES and ends with the delete CASCADE. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
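In other words, keep the FK clause contiguous and put the CHECK outside it; the same column definition untangled (fragment only):

```sql
uid integer NOT NULL
    REFERENCES words_users(uid) ON DELETE CASCADE
    CHECK (uid <> author),
```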
Re: [GENERAL] Simple Query not using Primary Key Index
> On 7 Feb 2017, at 0:16, Podrigal, Aron wrote: > > Hi, > > I noticed when I do a simple SELECT id FROM mytable WHERE id = > 'cb81d070-4213-465f-b32e-b8db43b83a25'::UUID Postgres does not use the > primary key index and opts for a Seq Scan. > > I of course did VACUUM ANALYZE and I have reset statistics But no sign. Is > there any particular thing I should be looking at? An EXPLAIN ANALYZE would be a good start. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] (solved) R: Very strange problem on index
On 2 February 2017 at 14:57, Job wrote: > Hi Raymond, > > Thank you for your appreciated feedback. But what's your answer to his question? You still didn't tell. > Here is the original message: > > A really strange problem, quite near to paranormal, is occurring during a > server migration. > > We have a table with some millions of records, perfectly working on other > Postgresql 9.6.1 machines: > > Table "public.webrecord" >  Column    | Type                 | Modifiers > -----------+----------------------+------------------------------------------------------- >  id        | integer              | not null default nextval('webrecord_id_seq'::regclass) >  categoria | character varying    | >  dominio   | character varying    | >  country   | character varying(5) | > Indexes: >   "keywebrecord" PRIMARY KEY, btree (id) >   "dominio_idx" btree (dominio) > > By entering a simple query: > > Select * from webrecord where dominio='sito.com' > > it waits some seconds and CPU usage rises. > > I tried to: > - delete the table > - recreate the schema again > - re-populate it > - reindex > - vacuum > > But nothing happened. > The query planner says it scans lots of rows: > > explain analyze select * from webrecord where dominio='crl.microsoft.com'; > QUERY PLAN > ------------------------------------------------------------------------------------------------ >  Gather  (cost=1000.00..144045.21 rows=62073 width=92) (actual time=1096.202..1096.202 rows=0 loops=1) >    Workers Planned: 4 >    Workers Launched: 4 >    ->  Parallel Seq Scan on webrecord  (cost=0.00..136837.91 rows=15518 width=92) (actual time=1063.277..1063.277 rows=0 loops=5) >          Filter: ((dominio)::text = 'crl.microsoft.com'::text) >  Rows Removed by Filter: 2482938 >  Planning time: 0.119 ms >  Execution time: 1107.846 ms > > On other machines the results are immediate! > > I have finished my tries... -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. 
Re: [GENERAL] Why is table not found?
> On 31 Jan 2017, at 14:45, Egon Frerich wrote: >> >> CREATE TABLE "Raum" >> ( >> "RaumID" serial NOT NULL, -- Automatically assigned > But psql tells me "Raum" does not exist: > >> egon@xfEinzel ~ $ psql Hausrat >> psql (9.3.15) >> Type "help" for help. >> >> Hausrat=# SELECT * FROM Raum; >> ERROR: relation "raum" does not exist >> LINE 1: SELECT * FROM Raum; Instead of adding quotes to your select statement (as others suggested), IMHO the better thing to do is to remove them from your create statements. Most of the time there is no benefit in creating case-sensitive identifiers in a database. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
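The folding rules in a nutshell (illustrative snippet, reusing the column from the post; raum2 is a made-up second table):

```sql
CREATE TABLE "Raum" ("RaumID" serial NOT NULL);  -- quoted: stored exactly as "Raum"
SELECT * FROM Raum;     -- unquoted, folded to raum: ERROR, relation "raum" does not exist
SELECT * FROM "Raum";   -- works, but every query must quote it

CREATE TABLE raum2 (raumid serial NOT NULL);     -- unquoted: stored as raum2
SELECT * FROM RAUM2;    -- unquoted, also folded to raum2: works
```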
Re: [GENERAL] Postgres 9.6.1 big slowdown by upgrading 8.4.22
> On 7 Jan 2017, at 15:44, Job wrote: > > Hi guys, > > First of all excuse me but i really did not explain the problem, sorry... >>> Are you being serious? You're complaining about a "big slowdown" for a >>> query that goes from 1.5ms to 4ms? >>> What is the actual problem you're trying to solve? Because I don't see one >>> in the above. > > A single query is fast both in 8.4.22 and 9.6.1, no problem. > > But the problem is not here! > > The big problem is the benchmark before putting the system into production. > We launch about 100/200 queries per second and we monitor with "top" the two > machines. > They are VMs with 4 vCPU and 10Gb of RAM, with CentOS 7.2 64bit. > > This is what happens: > > Postgres 8.4.22 > Medium average load 1.5/2.0 > Further queries respond very quickly > > Postgres 9.6.1 > Medium average load 18.0/20.0 !! > Further queries are really very slow > There is a bottleneck I see. > By removing *only* this condition in the query function: > "exists ( select 1 from gruorari where gruorari.idgrucate=grucategorie.id and > ( (('{'||gg_sett||'}')::int[] && array[EXTRACT(DOW FROM NOW())::int])='t' and > now()::time between gruorari.dalle::time and gruorari.alle::time) )" Then most likely the slow-down you're experiencing is indeed in the above subquery. It could also be the addition of the exists though, let's not rule that out! Note that I'm not on either of the versions involved (9.3.15 here), so I can't easily observe what you're seeing. A general observation: now(), CURRENT_TIMESTAMP, CURRENT_TIME and friends all return the transaction start time, requiring but a single gettimeofday() call per transaction; it is clock_timestamp() that calls gettimeofday() on every invocation, the performance of which can differ significantly depending on which hardware clock your OS uses (there are often multiple options). 
Judging from your queries, you don't actually seem to need per-call timestamp accuracy anyway… The one-liner is a bit hard to read, btw - and so requires more effort from anyone trying to help. A useful next step would be to benchmark the separate parts of that query: 1). where gruorari.idgrucate = grucategorie.id 2). and (('{'||gg_sett||'}')::int[] && array[EXTRACT(DOW FROM NOW())::int])='t' 3). and now()::time between gruorari.dalle::time and gruorari.alle::time 4). exists(select 1) Ad 1). If there's any difference in performance between the 2 PG versions here, most likely it's a different plan for this condition. It might be as simple as a difference in statistics or number of rows. Ad 2). This part seems a bit convoluted, which may be responsible for some of the overhead. Frankly, I'm not 100% certain of the purpose of that condition, but it appears that the gg_sett field contains a comma-separated list of days of the week that need to be matched to today's day of the week. I rewrote it to: extract(DOW from NOW()) = any(('{' || gg_sett || '}')::int[]) (note the parentheses around the concatenation; the cast binds tighter than ||). Performance of either query on my 9.3 installation is pretty much the same, but I have only done some synthetic benchmarks: => explain analyze select '{6,7,0}'::int[] && array[extract(DOW from NOW())::int] from generate_series(1, 1); QUERY PLAN --- Function Scan on generate_series (cost=0.00..20.00 rows=1000 width=0) (actual time=4.548..58.072 rows=1 loops=1) Total runtime: 77.116 ms (2 rows) => explain analyze select extract(DOW from NOW()) = any('{6,7,0}'::int[]) from generate_series(1, 1); QUERY PLAN --- Function Scan on generate_series (cost=0.00..18.75 rows=1000 width=0) (actual time=4.341..48.902 rows=1 loops=1) Total runtime: 67.477 ms (2 rows) Ad 3). Casting fields in the where-clause is usually a bad idea. Unless you have indexes on those fields in which they are cast to time AND the query planner chooses to use those indexes, the type-cast will get applied to every candidate record each. 
If you have a million candidate records, that's 2x a million casts taking place (for two fields). To say more about this we would need more details about what types those fields are and why you're casting them to time. > The Postgres 9.6.1 machine average workload return at about 2.0/3.0
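If the casts cannot be avoided, expression indexes make them indexable; a sketch against the gruorari columns from the query, useful only if the planner actually picks the indexes up:

```sql
CREATE INDEX gruorari_dalle_time_idx ON gruorari ((dalle::time));
CREATE INDEX gruorari_alle_time_idx  ON gruorari ((alle::time));
```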
Re: [GENERAL] Postgres 9.6.1 big slowdown by upgrading 8.4.22
> On 6 Jan 2017, at 16:56, Job wrote:
>
> We use a function, the explain analyze is quite similar:
> POSTGRESQL 8.4.22:
> Total runtime: 1.531 ms
>
> POSTGRES 9.6.1:
> Execution time: 4.230 ms

Are you being serious? You're complaining about a "big slowdown" for a query that goes from 1.5ms to 4ms?

What is the actual problem you're trying to solve? Because I don't see one in the above. Just saying, you're obviously worried about something, but should you be?

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Special index for "like"-based query
> On 30 Dec 2016, at 11:42, Job wrote:
>
>>> And, basically, if you need help with some queries you could try
>>> posting them whole, even redacted, along the table defs, this way
>>> people can see the problem and not invent one based on a partial
>>> description
>
> Thank you very much, very kind from you.
>
> The index applied on the timestamp field is a btree("timestamp")
>
> The query is:
>
> select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND timestamp::date BETWEEN '2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP BY domain ORDER BY c_count DESC LIMIT 101
>
> The table format is:
>   Column   |           Type           | Modifiers
> -----------+--------------------------+-----------
>  id        | numeric(1000,1)          | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text)
>  timestamp | timestamp with time zone |
>  domain    | character varying(255)   |
>  action    | character varying(5)     |
>  profile   | character varying        |
>  accessi   | bigint                   |
>  url       | text                     |
>
> Indexes:
>    "webtraffic_archive_day_2016_04_15_action_wbidx" btree (action)
>    "webtraffic_archive_day_2016_04_15_domain_wbidx" btree (domain)
>    "webtraffic_archive_day_2016_04_15_profile_wbidx" btree (profile) CLUSTER
>    "webtraffic_archive_day_2016_04_15_timestamp_wbidx" btree ("timestamp")
>
> Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to create on every sub-tables once?

It's usually more efficient to cast the constants you're comparing to, than to cast a field value for each record in the set. The exception to that is when you have an index on the casted field.

In your case, since you're casting to date and time separately, and whole days even, it's probably more efficient to combine that into:

… AND timestamp BETWEEN '2016-12-27 00:00:00'::timestamp with time zone AND '2016-12-30 23:59:59'::timestamp with time zone ...
But even then, you're excluding items that fall within the last second of the end date, between 23:59:59 and the next midnight. The new range types are useful there, for example:

… AND timestamp <@ '[2016-12-27 00:00:00, 2016-12-31 00:00:00)'::tsrange

The above isn't entirely correct, as tsrange uses timestamp without time zone, but you get the gist.

However, if those time ranges can have other values than '[00:00, 23:59]', then you probably need 2 indexes on that timestamp column; one cast to date and one to time. Otherwise, you end up creating timestamp range filters for each day in the range in the query (which could still be the better approach).

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
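The "2 indexes on that timestamp column" advice runs into one subtlety: the column is timestamp with time zone, and casting timestamptz to date or time depends on the session's TimeZone setting, so those casts are only STABLE and cannot be indexed directly. Pinning a time zone makes the expression immutable. A sketch, following the naming of the existing indexes (the choice of UTC here is an assumption; queries would have to use the very same expressions to benefit):

```sql
-- timestamptz::date and timestamptz::time depend on the TimeZone GUC,
-- so the expression pins a zone to become immutable and thus indexable.
CREATE INDEX webtraffic_archive_day_2016_04_15_ts_date_wbidx
    ON webtraffic_archive_day_2016_04_15 ((("timestamp" AT TIME ZONE 'UTC')::date));
CREATE INDEX webtraffic_archive_day_2016_04_15_ts_time_wbidx
    ON webtraffic_archive_day_2016_04_15 ((("timestamp" AT TIME ZONE 'UTC')::time));
```

As to the last question in the quoted mail: in these PostgreSQL versions there is no single statement that creates an index across all partitions; each child table needs its own CREATE INDEX, though that is easy to script over the partition names.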
Re: [GENERAL] LYDB: Feasible to use PG roles instead of application-level security?
> On 30 Dec 2016, at 4:19, Guyren Howe wrote: > > Further to my attempts to enlighten application developers about what they > might better do in the database: > > https://medium.com/@gisborne/love-your-database-lydb-23c69f480a1d#.4jngp2rcb > > it occurs to me to wonder whether it is practical to use PG’s own roles and > security model in lieu of using an application-level one. > > It seems that the role system in PG is sufficient for most general purposes. > One could presumably also have a table with role names and associated > metainformation (email address etc) as needed. > > If I have a system with many thousands of users, is it practical to manage > these users’ authentication and authorization using *just* Postgres? Postgres roles are global to the cluster, so you would end up with multiple thousands of roles if you have multiple databases in your cluster with different users on each. Which roles each user is allowed to have becomes quite the nightmare for the administrators, I suspect. For a web-application facing the internet, I'd say no, don't do that. You're dealing with far too many users to be maintainable. For an intranet database in a not-too-large company with a fixed set of users, it could be a good solution, especially if those roles can be linked to the company's LDAP server (assuming that's possible, I don't know). Multiple intranet applications on that same database can use the same users and roles. Someone needs to do the administration though; with volumes (of users) like that and the database knowledge level of the average system administrator, a GUI seems preferable. IMHO, pgadmin provides too many features to be practical for someone like that, you would probably prefer something that only does user administration. I don't know of anything that does that though (not a GUI user myself)... 
> It occurs to me that some client frameworks might have issues with their connection pools if those connections keep switching users, assuming they even can, but let's set that aside for now. Or perhaps every connection could immediately do a SET USER before executing its connection?
>
> This seems an attractive proposition from a security standpoint: if I use row-level security pervasively, I can have a security system that's nestled nice and close to the data and presumably tricky to work around for a hacker given direct access only to the client application.

With a few changes, that could work very well.

First, create roles for the different types of users that you expect. In a company, that could be by division, distinguishing division-heads, interns, etc.

Secondly, have a table with the users and their attributes like you describe. Include an attribute for their database role there. Only administrator users should have access to that table.

Finally, create a stored procedure that looks up a user name in that table and sets the accompanying role. If a user is not found, set the role to some default 'unprivileged' user. Make that procedure a SECURITY DEFINER with the appropriate permissions. That role stays active for the entire session, so unless you close the connection, create a new one or change the user's role, this procedure doesn't need calling again.

> Is this practical? Has anyone here done it? What might the caveats be?

It's a fairly common practice, the ML archives should contain plenty of examples.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
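One wrinkle when building that stored procedure: PostgreSQL refuses SET ROLE inside a SECURITY DEFINER function. A way around it is to split the lookup from the switch: the SECURITY DEFINER function only reads the protected users table and returns the role name, and the session then issues the SET ROLE itself. A sketch with invented table and column names:

```sql
-- Hypothetical users table: app_user(username text, db_role text, email text, ...)
-- Only administrators have direct access to it; this function may read it
-- anyway because it runs with its owner's privileges (SECURITY DEFINER).
CREATE FUNCTION lookup_app_role(p_username text) RETURNS text AS $$
    SELECT coalesce(
        (SELECT db_role FROM app_user WHERE username = p_username),
        'unprivileged');              -- default role for unknown users
$$ LANGUAGE sql STABLE SECURITY DEFINER;
```

A session would then run something like SELECT lookup_app_role('jdoe'); followed by SET ROLE with the returned name, and that role stays active until the connection is closed or the role is changed again.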
Re: [GENERAL] LYDB: What advice about stored procedures and other server side code?
> On 27 Dec 2016, at 23:03, Guyren Howe wrote:
>
> I am putting together some advice for developers about getting the most out of SQL servers in general and Postgres in particular. I have in mind the likes of most web developers, who through ignorance or a strange cultural preference that has emerged, tend to treat their database server as a dumb data bucket.
>
> I call the project Love Your Database (LYDB). It is starting as a series of blog posts:
>
> https://medium.com/@gisborne/love-your-database-lydb-23c69f480a1d#.4jngp2rcb
> https://medium.com/@gisborne/love-your-database-simple-validations-68d5d6d0bbf3#.az4o2s152
>
> I would next like to cover server-side code such as stored procedures and triggers.

I think there's an objection that most developers would come up with: namely, that stored procedures living in the central relational database can usually not be changed as easily as client-side code, without disrupting development (or worse, operation of the production system) for other users. I think that needs some thought - I don't have the answer to that one. IMO, as long as those stored procedures are part of the model (ensuring data integrity, interpreting database values, etc.), there shouldn't be a problem, but it's a terrain I haven't trodden much.

There are things you usually don't want to handle at the database side, such as firing off other applications to perform tasks or user interface related operations, things like that. I once was tasked with writing a library on a database server (Oracle 8i) to handle printing of data and sending out faxes etc. That was a nightmare and in hindsight it just doesn't belong in a database. That's a task for middleware. Where exactly the dividing line lies is probably an interesting topic.

Another thought related to seeing the database as a model server; since we're apparently in the model-view-controller paradigm, it seems to me that database views would probably be useful to the view-part of MVC.
That's just something that struck me as making more than a bit of sense… Of course, for the actual view in the MVC paradigm there should be some kind of user interface, but database views could be really useful in preparing the data required for those, to make it fit the shape of the view. So far I like what you're writing, so keep it coming! Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL]
On 21 December 2016 at 12:51, Yogesh Sharma wrote: > Dear Alban, > > In my production system, there are lot of read write operation performed > every hour. That does not explain why you're performing REINDEX on that system; rather, it explains why you should NOT be performing REINDEX. VACUUM is what you should be using. You can VACUUM either using autovacuum with a sufficiently high frequency that it doesn't fall behind too much, or by manually running VACUUM ANALYSE on specific tables after batch operations. Which choice suits you best depends on your workload. In batches it is often also useful to run (plain) ANALYSE every so often on the relevant tables to update the statistics, but only when the batch process is not insert-only. > So, i am thinking, if i can add check during REINDEX operation nothing > update and insert operation performed. > Is it possible? I have no idea what you're saying. > On Wednesday, December 21, 2016, Alban Hertroys wrote: >> >> On 21 December 2016 at 09:59, Yogesh Sharma wrote: >> > Also, every hour,i am performing VACUUM and REINDEX operation on table. >> >> Why are you running REINDEX every hour? That's a very unusual thing to >> do, you'd need a pretty good reason for that. >> >> -- >> If you can't see the forest for the trees, >> Cut the trees and you'll see there is no forest. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
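The two VACUUM approaches above translate to roughly the following (the table name is invented and the threshold is illustrative, not a recommendation):

```sql
-- Option 1: a more aggressive autovacuum for just this table, so dead rows
-- are reclaimed after ~1% churn instead of the default 20%:
ALTER TABLE busy_table SET (autovacuum_vacuum_scale_factor = 0.01);

-- Option 2: vacuum and refresh statistics manually after a batch operation:
VACUUM ANALYZE busy_table;

-- And for batches that are not insert-only, a plain statistics refresh:
ANALYZE busy_table;
```

Neither of these takes the exclusive locks that REINDEX does, which is why they are safe to run on a busy production table.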
Re: [GENERAL]
On 21 December 2016 at 09:59, Yogesh Sharma wrote: > Also, every hour,i am performing VACUUM and REINDEX operation on table. Why are you running REINDEX every hour? That's a very unusual thing to do, you'd need a pretty good reason for that. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Is there a way to Send attachments with email using pgmail postgreSQl?
> On 19 Dec 2016, at 18:55, Adrian Klaver wrote: > > On 12/15/2016 10:07 PM, nidhi raina wrote: >> Dear Sir/Mam, >> >> I am also trying to send emails with attachments.please help me out with > > If as your subject suggests you are using this: > > https://github.com/captbrando/pgMail > > then it is not possible to send attachments: > > "Essentially, pgMail is simply a stored function written in TCL > which takes 4 arguments of type 'text' (Who is it from, who is it > to, subject, and body of message), contacts the email server via > TCL sockets, and transmits your email (Now UTF-8 Compatible!). > " Not even when you create the message with the correct MIME-headers and base64-encode your attachment within the correct MIME-section? That's a fair amount of code to write, but a stored procedure in, say TCL, should be able to handle that. Or am I missing something? Whether it's a good idea to let the database encode attachments and send e-mails is a different matter, but if it isn't doing much beside that - well, why not? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] huge table occupation after updates
Please use a readable font. Your messages are using a font that's so small that my eyes start to hurt. I still try to read them, but I - and I assume others - will stop trying if you keep this up. Sorry for the top-post, but since it's not directly appropriate to the topic that's perhaps for the better. > On 10 Dec 2016, at 19:15, Tom DalPozzo wrote: > > > > 2016-12-10 18:30 GMT+01:00 Francisco Olarte : > A couple of things first. > > 1.- This list encourages inline replying, editing the text, and frowns > upon top posting. > > 2.- Your HTML formatting with so a small size makes it harder for me ( > and I can assume some others ) to properly read your messages. > > If you want to discourage people replying to you, keep doing the two above. > > On Sat, Dec 10, 2016 at 3:25 PM, Tom DalPozzo wrote: > > you're right, VACUUM FULL recovered the space, completely. > > Well, it always does. ;-) > > > So, at this point I'm worried about my needs. > > I cannot issue vacuum full as I read it locks the table. > > Well, first hint of your needs. Bear in mind vacuum fulls can be very > fast on small tables ( i.e, if you have the 1.5Mb table, and do 2000 > updates and then a vacuum full that will be very fast, time it ). > > > In my DB, I (would) need to have a table with one bigint id field+ 10 bytea > > fields, 100 bytes long each (more or less, not fixed). > > 5/1 rows maximum, but let's say 5000. > > As traffic I can suppose 1 updates per row per day (spread over groups > > of hours; each update involving two of those fields, randomly. > > Also rows are chosen randomly (in my test I used a block of 2000 just to try > > one possibility). > > So, it's a total of 50 millions updates per day, hence (50millions * 100 > > bytes *2 fields updated) 10Gbytes net per day. > > Not at all. That's the volume of updated data, you must multiply by > the ROW size, not just the changed size, in your case 50M * 1100 ( to > have some wiggle room ), 55Gbytes. 
> > But this is the UPPER BOUND you asked for. Not the real one. > > > I'm afraid it's not possible, according to my results. > > It certaninly is. You can set a very aggresive autovacuum schedule for > the table, or even better, you may vacuum AFTER each hourly update. > This will mark dead tuples for reuse. It will not be as fast, but it > can certainly be fast enough. > > And, if you only update once an hour, you may try other tricks ( like > copy to a temp table, truncate the original and insert the temp table > in the original, although I fear this will lock the table too, but it > will be a very short time, your readers may well tolerate it. ) > > Yours seem a special app with special need, try a few, measure, it is > certainly possible. > > Francisco Olarte. > > Hi, I think you're right. I was surprised by the huge size of the tables in > my tests but I had not considered the vacuum properly. > My test had a really huge activity so perhaps the autovacuum didn't have time > to make the rows reusable. > Also, issuing plain VACUUM command does nothing visibile at once, but only > after when, inserting new rows, the size doesn't increase. > I will try again as you suggest. > Thank you very much > Pupillo Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Upgrade from 9.5.4 to 9.6.1
On 16 November 2016 at 16:33, Rich Shepard wrote: > On Tue, 15 Nov 2016, Rich Shepard wrote: > If 9.6.1 is currently running after running initdb, and I can access my > databases, what does pg_upgrade do that's necessary? pg_upgrade migrates your databases from your old (9.5) cluster to the new (9.6) one. Initdb doesn't do that. If your 9.6 database does indeed contain your databases, then something must have done the pg_upgrade for you. Perhaps the slackware package script does something like that, but that would cause problems for people who do _not_ want to migrate their databases, so I doubt that's what happened. What I think happened is that you are using the new pg 9.6 psql binary to list the databases in your old 9.5 cluster. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Recover from corrupted database due to failing disk
On 4 November 2016 at 11:20, Gionatan Danti wrote: > Unfortunately I am working with incredible constraints from customer side; > even buying two SAS disks seems a problem. Moreover, as an external > consultant, I have basically no decision/buying power :| > What I can do (and I did) is to raise a very big red flag and let others > decide what to do. It seems to me that your customer doesn't realise how expensive it would be if their server would be unavailable for any length of time or if they would actually lose the data it contains. That, or the data of your customer isn't so valuable that it's worth your time. We've been fighting a somewhat similar fight internally here, where management wasn't prepared to spend € 30,000 once on a server plus software licenses, while they pay that to one of our new managers monthly. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other
On 4 November 2016 at 14:41, Merlin Moncure wrote:
> On Fri, Nov 4, 2016 at 8:08 AM, Kim Rose Carlsen wrote:
>> The nulls are generated by something like this
>> SELECT c.circuit_id,
>>        cc.customer_id
>>   FROM circuit AS c
>>   LEFT JOIN circuit_customer AS cc
>>     ON c.circuit_id = cc.circuit_id
>>
>> To make a magic '0' customer we would be required to use
>> COALESCE(cc.customer_id, '0')
>> I dont think the optimizer will do anything clever with the '0' we have computed from null.
>
> It would if you explicitly indexed it as such;
> CREATE INDEX ON circuit_customer((COALESCE(customer_id, '0')));

Merlin, it's a LEFT JOIN. There probably are no NULLs in the circuit_customer.customer_id column, so that COALESCE isn't going to achieve anything at all.

I haven't been following this particular discussion in detail, so unfortunately I can't contribute more than that remark at the moment.

--
If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] initdb createuser commands
On 31 October 2016 at 15:50, Christofer C. Bell wrote: > > I think the OP's point is that having a hodgepodge of (on their face) > unrelated commands smells kinda unorganized at best and unprofessional at > worst. Whether or not he's right is up to the reader. For me, I agree with > his sentiment. > The solution he's suggesting is to bring all of these commands under one > umbrella either by bundling them in an administrative utility or by giving > them a prefix that shows they're related to "the PostgreSQL database." > He's getting a lot of pushback that really feels it's coming from the > wrong direction. "Just learn it." "It's always been this way." "No one > agrees with you." These arguments are unconvincing. That said, there's > nothing wrong with just saying, "we're not going to change it because we > don't want to." > > There is the issue that by introducing new commands that are better organised, the new user will get introduced to more commands instead of fewer - when they run into a problem or if they bought the book, the commands they'll encounter will be the "old" commands. There's also the learning curve of having a single wrapper-command that can do anything pg-related. The purpose of a command named pg_createuser is obvious, the purpose of a command named pg or pga is not so obvious. Personally, I sometimes work with Firebird for educational purposes and I can't make heads or tails of their command-line tools (with the exception of isql, but only when I remember it was based on Interbase). To me, the pg tools are much easier to remember, even though their naming isn't always consistent. I do think however that having the pg-commands prefixed with pg_ is actually helpful to both new and experienced users. One reason is that it limits the number of commands matched for command completion after typing pg_ (which is only 3 characters to type). ISTR some argument against using underscores because they would be hard to type, but I can't understand why.
That said, renaming the commands provides a rather minor benefit at best. Having this much fuss about it is out of proportion IMHO. I remember learning those commands (when pg 7.4.7 was a big deal) and it certainly did not cost me the majority of time that I needed to learn to use pg, and once I did learn them I knew where to find at least the documentation. My few cents. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: [GENERAL] Rows are repeating by the trigger function
On 31 October 2016 at 14:41, Adrian Klaver wrote: > On 10/31/2016 02:06 AM, Kiran wrote: >> I know 94 = 1 + (3 * 31). >> I am just having a normal insert statement into cf_question table. > > Are there any other triggers on the tables? I'm fairly confident that the duplicates are from updates on the cf_question table. Since the trigger also fires on update and then inserts another record, that would explain the duplication pretty well. Nevertheless, if there are other triggers those bear investigation. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Way to quickly detect if database tables/columns/etc. were modified?
> On 30 Oct 2016, at 10:45, Evan Martin wrote: > > If I have a query that reads from system tables like pg_class, pg_namespace, > pg_attribute, pg_type, etc. and I'd like to cache the results in my > application is there any fast way to detect when any changes have been made > to these system catalogs? I don't need to know exactly what has changed. > Some kind of a global "database version" would do, just so I know that I need > to invalidate my cache (the database definition is rarely modified in > practice). I think the usual practice for such situations is to do database changes through SQL scripts[1] that are under version control. Since they are under VC, you can automatically write the version[2] into the SQL script on commit of changes to said script through a commit hook. That version in the SQL script can then be used in an UPDATE statement to some database-global settings table[3]. And there you have your database version. Ad 1. Never do changes directly in the database when you go this route! Ad 2. Those are often hashes these days. Ad 3. You could even have the UPDATE statement be automatically added by the commit hook of your VC of choice. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
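The settings table from [3] and the hook-written UPDATE could be as minimal as this sketch (table and key names are invented; the version string is a placeholder written in by the commit hook):

```sql
-- One-time setup: a database-global key/value settings table.
CREATE TABLE db_info (
    key   text PRIMARY KEY,
    value text NOT NULL
);
INSERT INTO db_info (key, value) VALUES ('schema_version', 'initial');

-- Appended to each change script by the VC commit hook:
UPDATE db_info SET value = 'a3f9c21' WHERE key = 'schema_version';
```

The application then only needs to poll SELECT value FROM db_info WHERE key = 'schema_version' and invalidate its cache whenever the value differs from the one it cached.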
Re: [GENERAL] Rows are repeating by the trigger function
> On 30 Oct 2016, at 10:31, Kiran wrote: > > Dear Folks, > > I have a table cf_question with 31 rows. > I want to insert/update another table cf_user_question_link when cf_question > table is inserted/updated with row(s). > I have written trigger function for this as follows. > > > CREATE FUNCTION user_question_link() RETURNS trigger AS > $user_question_link$ > begin > SET search_path TO monolith; >INSERT INTO >cf_user_question_link(cf_user_id,cf_question_id) >VALUES(NEW.user_id,NEW.cf_question_id); > RETURN NEW; > end; > $user_question_link$ > LANGUAGE plpgsql > COST 100; > > > /* Call the trigger function */ > > CREATE TRIGGER insert_user_question_link AFTER INSERT OR UPDATE > ON monolith.cf_question > FOR EACH ROW EXECUTE PROCEDURE user_question_link(); > > > Problem: The cf_user_question_link gets inserted with 94 rows instead of 31 > rows. The 31 rows are repeated 3 times > I tried dropping the trigger function and recreating it but > with the same 94 rows in the table. > > It would be great if any from the forum point to me where I am doing wrong. I don't think you want that same trigger to fire on UPDATE of cf_question, like you do now. On UPDATE you have two choices; - either you need to take changes to those _id columns into account and delete rows that belong to the OLD link and not to the NEW one (or do nothing if those stayed the same) - or you do nothing (no trigger needed) because in the majority of cases changing FK's is limited to a few power users at best and they're supposed to know what they're doing. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
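Following the second option above (no trigger needed on UPDATE), the smallest fix would be to recreate the trigger without the OR UPDATE clause, using the names from the quoted mail:

```sql
DROP TRIGGER insert_user_question_link ON monolith.cf_question;

CREATE TRIGGER insert_user_question_link AFTER INSERT
    ON monolith.cf_question
    FOR EACH ROW EXECUTE PROCEDURE user_question_link();
```

With that, updates to cf_question no longer insert extra rows into cf_user_question_link; the rows already duplicated would still need to be cleaned up by hand.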
Re: [GENERAL] WHERE ... IN condition and multiple columns in subquery
On 28 October 2016 at 13:03, Alexander Farber wrote: > Hello, > > is it please possible to rewrite the SQL query > > SELECT DISTINCT ON (uid) > uid, > female, > given, > photo, > place > FROM words_social > WHERE uid IN (SELECT player1 FROM games) > OR uid IN (SELECT player2 FROM games) > ORDER BY uid, stamp DESC > > where first column player1 is fetched in a subquery and then column player2 > is fetched from the same table? > > I've searched around and it seems that a JOIN should be used here, but can > not figure out exactly how. > > Thank you > Alex You mean like this?: SELECT DISTINCT ON (uid) uid, female, given, photo, place FROM words_social JOIN games ON uid IN (player1, player2) ORDER BY uid, stamp DESC -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] update records to have occurance number
> On 25 Oct 2016, at 17:06, Mark Lybarger wrote:
>
> I want to update a table to have the value of the occurrence number. For instance, I have the below table. I want to update the number column to increment the count of last name occurrences, so that it looks like this:
>
> first     last    1
> second    last    2
> third     last    3
> first     other   1
> next      other   2
>
> Here's my simple table:
>
> create table person ( fname text, lname text, number integer);
>
> insert into person (fname, lname) values ('first', 'last');
> insert into person (fname, lname) values ('second', 'last');
> insert into person (fname, lname) values ('third', 'last');
>
> insert into person (fname, lname) values ('first', 'other');
> insert into person (fname, lname) values ('next', 'other');
>
> How would I issue an update statement to update the number column?

That depends on which order you want the database to perceive those rows in. The above example suggests that alphabetical order on fname might work, in which case (aggregates aren't allowed directly in an UPDATE's SET clause, so the count goes into a scalar subquery):

update person
   set number = (select count(*) + 1
                   from person p2
                  where p2.lname = person.lname
                    and p2.fname < person.fname);

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
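An alternative to a correlated count per row is a window function; a sketch that numbers rows per lname in fname order, joining on the system column ctid because the example table has no key:

```sql
UPDATE person
   SET number = numbered.rn
  FROM (SELECT ctid,
               row_number() OVER (PARTITION BY lname
                                  ORDER BY fname) AS rn
          FROM person) AS numbered
 WHERE person.ctid = numbered.ctid;
```

The window version scans the table once instead of re-counting for every row, which matters once the table grows beyond a handful of rows.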
Re: [GENERAL] psql 9.3 automatic recovery in progress
> On 10 Oct 2016, at 21:43, Periko Support wrote: > > For the life time in odoo session, can u point me where I can manage that > setting? > > The configuration /etc/openerp-server.conf doesn't have any parameter for > that. > > That must be in a odoo file...? > > Thanks. > > On Mon, Oct 10, 2016 at 12:25 PM, Pavel Stehule > wrote: > > > 2016-10-10 21:12 GMT+02:00 Periko Support : > Andreo u got a good observation here. > > I got a script that run every hour why? > > Odoo got some issues with IDLE connections, if we don't check our current > psql connections after a while the system eat all connections and a lot of > them are IDLE and stop answering users, we create a script that runs every > hour, this is: That's all part of Odoo (formerly known as OpenERP), isn't it? Did you contact them about this behaviour yet? Might just be that they're familiar with the problem and have a solution for it. I suspect the Python script you're running was implemented as a rather rough workaround by people from allianzgrp who knew just enough to be harmful. (Kill -9 on a database process, jeez! Keyboards should have an electroshock feature for people like that…) Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] psql 9.3 automatic recovery in progress
> On 10 Oct 2016, at 21:28, Alban Hertroys wrote: > > >> On 10 Oct 2016, at 21:12, Periko Support wrote: >> >>for pid in idle_record: >>try: >> #print "process details",pid >> #os.system("kill -9 %s" % (int(pid[0]), )) >>os.kill(int(pid[0]), signal.SIGKILL) >>except OSError as ex: >>continue > > That query returns PostgreSQL backends and you're sending them SIGKILL. Not a > recommended practice far as I know. Shouldn't you rather be sending those > kill signals to the clients connecting to the db? > Worse, apparently at some time in the past (a month ago matching those logs, > perhaps?) it used to send kill -9! That's absolutely a very bad idea. > > While on the topic, there is a PG function to cancel a backend query from > within PG: https://www.postgresql.org/docs/9.5/static/functions-admin.html > I think that's the best way to go about this, and best of all, you can > combine that with your select statement. Another idea struck me; if that script is under version control, you can check when that change was committed. If it isn't, perhaps you should. My current favourite is Hg (aka Mercurial), which happens to be written in Python, just like your script. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] psql 9.3 automatic recovery in progress
> On 10 Oct 2016, at 21:12, Periko Support wrote:
>
> for pid in idle_record:
>     try:
>         #print "process details", pid
>         #os.system("kill -9 %s" % (int(pid[0]), ))
>         os.kill(int(pid[0]), signal.SIGKILL)
>     except OSError as ex:
>         continue

That query returns PostgreSQL backends and you're sending them SIGKILL. Not a recommended practice as far as I know. Shouldn't you rather be sending those kill signals to the clients connecting to the db?
Worse, apparently at some time in the past (a month ago, matching those logs, perhaps?) it used to send kill -9! That's absolutely a very bad idea.

While on the topic, there is a PG function to cancel a backend query from within PG: https://www.postgresql.org/docs/9.5/static/functions-admin.html
I think that's the best way to go about this and, best of all, you can combine that with your select statement.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
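For example, the clean-up could stay entirely inside PostgreSQL. A sketch only (it assumes the 9.2+ pg_stat_activity column names pid, state and state_change, and an arbitrary one-hour idle threshold; pg_terminate_backend() sends SIGTERM, which lets the backend exit cleanly, unlike SIGKILL):

```sql
-- Terminate backends that have been idle for over an hour,
-- skipping our own session:
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'idle'
   AND pid <> pg_backend_pid()
   AND state_change < now() - interval '1 hour';
```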
Re: [GENERAL] How to retrieve jsonb column through JDBC
> On 29 Aug 2016, at 20:23, Alexander Farber wrote: > On Mon, Aug 29, 2016 at 7:50 PM, Merlin Moncure wrote: > On Sat, Aug 27, 2016 at 5:39 AM, Alexander Farber > wrote: > > > > List last_tiles = (List) JSON.parse(rs.getString("last_tiles")); > > > > has not work for me even though the string is: > > > > [{"col": 7, "row": 8, "value": 1, "letter": "A"}, {"col": 7, "row": 7, > > "value": 2, "letter": "B"}, {"col": 7, "row": 9, "value": 2, "letter": "C"}] > > > > but it is probably the problem of the Jetty class I am using and not of > > JDBC... > > huh. what exactly is failing? are you getting a parse exception? > > http://download.eclipse.org/jetty/9.3.11.v20160721/apidocs/org/eclipse/jetty/util/ajax/JSON.html#parse-java.lang.String- > > fails with: > > java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to > java.util.List I'm not 100% sure it's the root of the ClassCastException here, but I'm pretty sure that Java will want to know what class of items that List consists of. If that doesn't ring a bell for you, spend some time reading about "Java generic classes" (not to be confused with "general Java classes"). Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] select array_remove(ARRAY[NULL,NULL,NULL],NULL); returns {} instead of {NULL,NULL,NULL}
> On 08 Aug 2016, at 20:19, Tom Lane wrote:
>
> Alexander Farber writes:
>> I wonder why the following returns NULL and not 0 in 9.5.3?
>
>> # select array_length(array_remove(ARRAY[NULL,NULL,NULL],NULL), 1);
>
> Because the result of the array_remove is an empty array, which is defined to be zero-dimensional in PG.

Reading this, I'm a bit confused about why:

select array_remove(ARRAY[NULL, NULL, NULL], NULL);

results in:

 array_remove
--------------
 {}
(1 row)

How does it know which unknown value to remove from that array of unknown values? Shouldn't the result be {NULL,NULL,NULL}?

(Sorry for sort-of hijacking this thread)

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Re: [GENERAL] RE: [GENERAL] Re: [GENERAL] A simple extension immitating pg_notify
> On 26 Jul 2016, at 2:52, Mehran Ziadloo wrote:
>
> Sorry if my terminology is not accurate. But by an instance, I mean a PostgreSQL
> installation. And I call it an instance (and not a database) not to confuse it
> with the concept of databases (as in databases / schemas). Even when I'm trying
> to clarify the terminology, it's hard due to lack of distinguishable words!
>
> And here, I'm not talking about the cluster version of PostgreSQL. Simple, old
> fashion PostgreSQL will do.

Nobody here is talking about some clustered version of PG. What you call an 'instance' (or 'installation') is called a cluster. A cluster contains databases, which in turn contain schemas, etc.

In database terms, a cluster is a single database server, a single 'installation' in your terms. If you were to install multiple PG servers in separate directories, running on separate port numbers, you would have multiple clusters. The same goes if you distributed those servers over several hosts, which is what you seem to think a cluster means. That is the difference between a cluster of databases and a cluster of servers.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Help needed structuring Postgresql correlation query
> On 19 Jun 2016, at 10:58, Tim Smith wrote:
>
> Hi,
>
> My postgresql-fu is not good enough to write a query to achieve this
> (some may well say R is a better-suited tool to achieve this!).
>
> I need to calculate what I would call a correlation window on a time
> series of data; my table looks like this:
>
> create table data(data_date date, data_measurement numeric);
> insert into data values('2016-01-01',16.23);
> 
> insert into data values('2016-06-19',30.54);
>
> My "target sample" would be the N most recent samples in the table
> (e.g. 20, the most recent 20 days).
>
> My "potential sample" would be a moving window of size N (the same
> size N as above), starting at T0 (i.e. 2016-01-01 in this example) and
> incrementing by one (i.e. 2016-01-01 + 20, then 2016-01-02 + 20, etc.),
> but the "target sample" would obviously be excluded.
>
> The output needs to display the window date range (or at least the start
> date of the "potential sample" window) and the result of
> corr(target, potential).
>
> Hope that makes sense.

Something like this could do the trick (untested):

with recursive sample (nr, start_date, total) as (
    select 1, d0.start_date,
           (select sum(data_measurement)
              from generate_series(0, 19) as r(step)
              left join data on data_date = d0.start_date + r.step)
      from (select min(data_date) as start_date from data) d0
    union all
    select nr + 1, sample.start_date + 1,
           (select sum(data_measurement)
              from generate_series(0, 19) as r(step)
              left join data on data_date = sample.start_date + 1 + r.step)
      from sample
     where sample.start_date + 1 + 19 <= (select max(data_date) from data)
)
select * from sample where start_date >= date '2016-01-01';

Not sure how best to go about parameterising sample size N; a stored function seems like a good option. 
Another approach would be to move a (cumulative) window-function with 20 items over your data set and for each row subtract the first value of the previous window from the total of the current window (that is, assuming you're calculating a SUM of data_measurement for each window of 20 records). Visually that looks something like this for sample size 4: sample 1: (A + B + C + D) sample 2: (A + B + C + D) + E - A = (B + C + D + E) sample 3: (B + C + D + E) + F - B = (C + D + E + F) etc. To accomplish this, you calculate two cumulative totals (often misnamed as running totals, but AFAIK that's something different), one from the start, and one lagging N rows behind (you can use the lag() window function for that) and subtract the two. Good luck! Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
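The lag-based variant described above can be sketched like this (untested, like the recursive version; it computes a SUM per window of 20 rows, and it assumes exactly one row per calendar day so that 20 rows equal a 20-day window):

```sql
select data_date as window_end,
       cum - coalesce(lag(cum, 20) over (order by data_date), 0) as window_total
  from (select data_date,
               sum(data_measurement) over (order by data_date) as cum
          from data) t
 order by data_date;
```

The first 19 rows produce partial windows; filter those out with a row_number() check if they are unwanted.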
Re: [GENERAL] Sequences, txids, and serial order of transactions
> On 12 Jun 2016, at 4:03, Christian Ohler wrote: > we have a use case similar to auditing packages like pgMemento or Audit > Trigger 91plus – we are looking to keep an ordered history of certain write > transactions. I'm trying to understand the trade-offs between different ways > of getting that order, i.e., assigning numbers to transactions (ideally > strictly monotonic, modulo concurrency). All of our transactions are > serializable (for now). > (2) the orders produced by txid_current and a sequence can be different > (unsurprisingly). (If it was desirable to make them match, we could probably > do so by briefly holding a lock while we call both txid_current and nextval – > seems like this shouldn't limit concurrency too much. Or would it? Is one > of them potentially slow?) I'm aware of only 2 cases that those can have a different order: 1. The txid or the sequence wraps 2. The txid of a transaction exists some time already when the sequence's nextval() gets called. A later transaction (higher txid) running in parallel could request a nextval() in between those moments. I think that situation 1 can be caught (the few times it occurs). Situation 2 is probably what bothers you? As long as the request for nextval() is early in the transaction, a wait-lock shouldn't block other waiting transactions for long. To make sure, I would run some tests comparing running enough parallel transactions calling a sequence's nextval() both with and without the lock. The first of those will also give you some insight in how bad the transaction ordering vs. sequence ordering problem actually is. That is, unless you're perhaps overcomplicating your problem (see my answer to (6)). > (5) Postgres can give us a "high watermark" ("no transactions with IDs below > this number are still in-flight") for txid_current (using > txid_snapshot_xmin(txid_current_snapshot())), but has no equivalent feature > for sequences How would it know whether a sequence number is still in use? 
For example, I have a process @work where I use a database sequence to distinguish between batches of data in a user's HTTP session. Nothing of that is in the database, but the sequence is most certainly in use, across different database sessions. > (6) neither txid_current nor a sequence give us a valid serial order of the > transactions That depends on what you consider a transaction for your application. Do you care about the order that data got manipulated in, or do you care in what order the surrounding database transactions were created? Usually, people only care about the first, for which a sequence should be just fine. The second is usually only relevant for systems that are closely tied to the database internals, such as replication systems. > (7) given that we can't get a valid serial order, what guarantees can we get > from the ordering? I'm not entirely sure what to look for, but at a minimum, > it seems like we want writes that clobber each other to be correctly ordered. > Are they, for both txid_current and for sequences? My guess was "yes" for > txids (seems intuitive but just a guess) and "no" for sequences (because > https://www.postgresql.org/docs/current/static/functions-sequence.html > mentions that sequences are non-transactional); but for sequences, I couldn't > immediately construct a counterexample and am wondering whether that's by > design. Specifically, it seems that Postgres acquires the snapshot for the > transaction (if it hasn't already) when I call nextval(), and as long as the > snapshot is acquired before the sequence is incremented, I suspect that this > guarantees ordering writes. Does it? As I understand it, sequences have to be non-transactional to be able to guarantee correct ordering. Calling nextval() will increment the sequence, but does not relate it to the transaction at that point. The select statement that does the call to nextval() receives the value from the sequence and is part of the transaction. 
That links them together, as long as you don't use that sequence value outside that transaction. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
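If matching the two orders ever becomes necessary, the "briefly holding a lock" idea from the start of this mail could be sketched with an advisory lock (an illustration only, not benchmarked; 42 is an arbitrary lock key and audit_seq a hypothetical sequence):

```sql
begin;
-- Serialise concurrent transactions over the txid/sequence pair,
-- then release the lock immediately so the rest of the
-- transaction runs unimpeded:
select pg_advisory_lock(42);
select txid_current(), nextval('audit_seq');
select pg_advisory_unlock(42);
-- ... the rest of the transaction ...
commit;
```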
Re: [GENERAL] full text search index
On 26 May 2016 at 06:04, Patrick Baker wrote:
> Hi there,
>
> I've got the following query:
>>
>> SELECT COUNT(DISTINCT j0_.id) AS sclr10
>> FROM customers j0_
>> WHERE ((LOWER(j0_.name_first) LIKE '%some%'
>> OR LOWER(j0_.name_last) LIKE '%some%')
>> AND j0_.id = 5)
>> AND j0_.id = 5
>
> The query is taking ages to run.

Your guess is as good as ours without knowing what query plan the database decided on. Post the output of explain analyze.

> I read about wildcards and it seems I have to use a function with
> to_tsvector?

In general, you have to use an expression that the query planner can see is equivalent to the expression used in the index. Otherwise the query planner has no way of knowing whether the index is suitable for the query, and it won't use the index.

>> CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid",
>> ("full_text_universal_cast"("name_first"::"text")),
>> ("full_text_universal_cast"("name_last"::"text")));

In your case, you should query on full_text_universal_cast(your_field) instead of on LIKE '%some%'. Alternatively, if your query always uses the same wildcard expression, you could create an expression index on (your_field LIKE '%some%').

> full_text_universal_cast:
>>
>> CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data
>> "text")
>> RETURNS "tsvector" AS
>> $BODY$
>> SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), ''));
>> $BODY$
>> LANGUAGE sql IMMUTABLE
>> COST 1000;

The query planner has no way of knowing what this function does internally, so it certainly won't match the function results in the index up with your LIKE expression.

Regards, Alban Hertroys
--
If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
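To make that concrete, querying through the same expression the index was built on might look like this (a sketch using the names from the thread; note that full-text search matches whole words/lexemes, not arbitrary substrings, so it is not a drop-in replacement for LIKE '%some%', and a multicolumn gin index that includes the scalar "clientid" column additionally requires the btree_gin extension):

```sql
SELECT COUNT(DISTINCT j0_.id) AS sclr10
  FROM customers j0_
 WHERE (full_text_universal_cast(j0_.name_first::text) @@ plainto_tsquery('english', 'some')
    OR full_text_universal_cast(j0_.name_last::text) @@ plainto_tsquery('english', 'some'))
   AND j0_.id = 5;
```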
Re: [GENERAL] Function PostgreSQL 9.2
> On 05 May 2016, at 8:42, drum.lu...@gmail.com wrote: > The final function code is: > > CREATE OR REPLACE FUNCTION users_code_seq() >RETURNS "trigger" AS $$ > DECLARE code character varying; > BEGIN > IF NEW.code IS NULL THEN > SELECT client_code_increment INTO STRICT NEW.code FROM > public.companies WHERE id = NEW.id ORDER BY client_code_increment DESC; ^^^ There's your problem. I'm pretty sure the keyword STRICT isn't valid there. It probably gets interpreted as a column name. > END IF; > IF (TG_OP = 'INSERT') THEN > UPDATE public.companies SET client_code_increment = > (client_code_increment + 1) WHERE id = NEW.id; > END IF; > RETURN NEW; > END; > $$ LANGUAGE plpgsql; Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Thoughts on "Love Your Database"
On 4 May 2016 at 17:08, John McKown wrote: > I had a manager, long ago, who used a Lotus 1-2-3 spreadsheet to > contain all his memos. I was glassy eyed in disbelief. He also would use his > hand calculator to add up the numbers in the spreadsheet to be sure that the > summation function in the spreadsheet didn't make a mistake. That still happens - we have a few live examples around in this company. Unfortunately they're not caged and there's no sign "Do not feed the managers". Admittedly, they're using Excel instead of Lotus-1-2-3, but that's the only difference. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] (VERY) Slow Query - PostgreSQL 9.2
> On 03 May 2016, at 11:55, drum.lu...@gmail.com wrote:
>
> Hi all,
>
> I'm trying to get the query below to perform better, but I just don't know
> what else I can do...
>
> Please have a look and let me know if you can help somehow. Also, if you
> need some extra data, just ask me please.
>
> * Note that the gorfs.inode_segments table is 1.7 TB in size.
>
> I have the following query:
>
> explain analyze
>
> SELECT split_part(full_path, '/', 4)::INT AS account_id,
>        split_part(full_path, '/', 6)::INT AS note_id,
>        split_part(full_path, '/', 9)::TEXT AS variation,
>        st_size,
>        segment_index,
>        reverse(split_part(reverse(full_path), '/', 1)) as file_name,
>        i.st_ino,
>        full_path,
>        (i.st_size / 100::FLOAT)::NUMERIC(5,2) || 'MB' AS size_mb
> FROM gorfs.inodes i
> JOIN gorfs.inode_segments s
>   ON i.st_ino = s.st_ino_target
> WHERE i.checksum_md5 IS NOT NULL
>   AND s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
>   AND i.st_size > 0;

(Stripped the one-and-a-half extra queries in there, but that incomplete one might be why you're waiting?)

> • Explain analyze link: http://explain.depesz.com/s/Oc6
> The query is taking ages, and I can't get the problem solved.
>
> These are the indexes I've already created on the inode_segments table:
> What else can I do to improve the performance of the query?

The first thing I notice in your query is that you're making use of hierarchically organised data without storing it hierarchically, namely that full_path field. The result of that is that both your table and your index contain a lot of redundant information. 
Now I'm not so sure a hierarchical table + query are going to help get you much performance out of this (probably worth an experiment or two; mind that OSes usually use inode trees for such things), but reducing the redundancy in the index would probably help. Note that an index name cannot be schema-qualified (the index is created in the same schema as its table), and that full_path lives on inode_segments:

create index inode_segments_account_idx
    on gorfs.inode_segments (substring(full_path from 20))
 where full_path like '/userfiles/account/%';

and then use similar expressions in your query of course:

where full_path like '/userfiles/account/%'
  and substring(full_path from 20) ~ '^[0-9]+/[a-z]+/[0-9]+';

Good luck!

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Slow join over three tables
On 28 April 2016 at 08:36, Tim van der Linden wrote: > On Wed, 27 Apr 2016 13:48:06 +0200 > Alban Hertroys wrote: > >> In this case, you're using the values in adverse to filter relevant rid's >> for the FK join, so you might be better off with the inverse of above index: >> create index on report_adverses (adverse, rid); >> create index on report_drugs (drug, rid); > > H, like I reported yesterday this achieved a huge performance boost. > > However, two (new) things I like to touch on while further experimenting with > this query: > > 1. Altering Drug IDs or Adverse names effects the executing time negatively. > > In this example altering the drug IDs I search for makes the query 6 times > slower again: > > SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug > FROM reports r > JOIN report_drugs d ON d.rid = r.id > JOIN report_adverses a ON a.rid = r.id > WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back > pain', 'back pain']) > AND d.drug = ANY (ARRAY[9557, 17848, 17880, 18223]) ORDER BY r.created; > > Different drug ID ([9557, 17848, 17880, 18223]), but the rest is the same. > Query plan: > > Sort (cost=31409.71..31417.48 rows=3107 width=41) (actual > time=662.707..662.819 rows=4076 loops=1) >Sort Key: r.created >Sort Method: quicksort Memory: 415kB >-> Nested Loop (cost=25693.17..31229.48 rows=3107 width=41) (actual > time=71.748..661.743 rows=4076 loops=1) > -> Merge Join (cost=25692.61..26795.53 rows=3107 width=29) (actual > time=70.841..97.094 rows=4076 loops=1) >Merge Cond: (d.rid = a.rid) >-> Sort (cost=16332.08..16671.61 rows=135810 width=8) > (actual time=48.946..58.623 rows=135370 loops=1) > Sort Key: d.rid > Sort Method: quicksort Memory: 12498kB > -> Index Scan using report_drugs_drug_idx on > report_drugs d (cost=0.44..4753.44 rows=135810 width=8) (actual > time=0.681..28.441 rows=135532 loops=1) >Index Cond: (drug = ANY > ('{9557,17848,17880,18223}'::integer[])) You're doing ~9 times as many index lookups. 
A slowdown of 6x for this part of the query seems rather reasonable.

> Planning time: 16.438 ms
> Execution time: 663.583 ms
>
> A total of 660 ms instead of the previous 120 ms. The number of rows returned
> is 4076, instead of the 448 rows of the previous query. Could this be the
> cause of the slow-down? Or is it the fact that the integers in the ANY clause
> are further apart from one another and thus make for longer index searches?

Technically probably both, but the index node distance (assuming a btree index) should be insignificant. The sorting of the found candidate rows on rid probably contributes the most; quicksort scales as O(n log n) with the number of items to sort.

That said, you didn't happen to perform these tests (assuming >1) during a busier period of the database server? I ask because later on you seem to be getting fast results again after some more index changes and I don't see cause for the difference.

> The same happens with changing the adverses: some adverses come back in ~120
> ms, others can take up to one second.

Likely the same issue.

> 2. Adding extra columns ignores indexes
>
> This one was kind of expected. Adding extra columns from both "report_drugs"
> and "report_adverses" slows the lot down again. Extra columns added:
>
> - a.recovery
> - a.severity
> - d.reason
> - d.effectiveness
> - d.duration

I have to admit that I'm a bit behind with the current state of the art of PostgreSQL, but last time I checked, the database needed to look at each row in the result set for transaction visibility information. I recall that there was (at some point) much discussion whether that visibility information could be added to indexes, and that there were strong arguments against doing so. Hence, I doubt that those new index-only scans skip that step. Unless I'm wrong there, adding non-queried fields to the index is only going to affect your performance adversely. 
Relevant fields for indices are those used in joins, those regularly used in conditions (where-clauses), and those that are regularly sorted over. Other fields are available from the candidate rows of the result set. Putting them in the index just results in storing redundant information. That's how it always has been and I'm pretty sure that's still the case.

> As I would expect, adding these columns would make the previous multi-column
> indexes useless. And they indeed were not used anymore.
Re: [GENERAL] Slow join over three tables
> On 27 Apr 2016, at 4:09, David Rowley wrote: > > On 27 April 2016 at 11:27, Tim van der Linden wrote: >> The query: >> >> SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug >> FROM reports r >> JOIN report_drugs d ON d.rid = r.id >> JOIN report_adverses a ON a.rid = r.id >> WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back >> pain', 'back pain']) >> AND d.drug = ANY (ARRAY[359, 360, 361, 362, 363]) ORDER BY r.created; >> > Likely the only way to speed this up would be to create indexes; > > create index on reports (id, age, gender, created); Since you're not using age and gender in this (particular) query until the rows are combined into a result set already, it doesn't make a whole lot of sense to add them to the index. Moreover, since your'e ordering by created, I'd at least put that directly after id: create index on reports (id, created); > create index on report_adverses (rid, adverse); In this case, you're using the values in adverse to filter relevant rid's for the FK join, so you might be better off with the inverse of above index: create index on report_adverses (adverse, rid); create index on report_drugs (drug, rid); Do these tables have a primary key and can that be used here? I hope I'm not being superfluous here, but don't forget to vacuum analyze after creating those indexes. Eventually autovacuum will do that for you, but you could otherwise be running tests to verify the impact of adding those indexes before autovacuum gets around to it. 
Finally, perhaps it's more efficient to weed out all unrelated drugs and adverses before relating them to reports, something like: SELECT r.id, r.age, r.gender, r.created, x.adverse, x.drug FROM reports r JOIN ( SELECT d.rid, a.adverse, d.drug FROM report_drugs d JOIN report_adverses a ON a.rid = d.rid WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back pain', 'back pain']) AND d.drug = ANY (ARRAY[359, 360, 361, 362, 363]) ) x ON x.rid = r.id ORDER BY r.created; Looking at the cardinality of your tables that does seem a bit unlikely though. Still, worth a shot... Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is the comparison between timestamp and date so much slower then between two dates
> On 14 Apr 2016, at 15:12, Thomas Kellerer wrote:
>
> now() (and current_timestamp as well) are defined to return the
> same value throughout the entire transaction.
>
> So the optimizer _should_ be smart enough to do the conversion only
> once at the beginning of the statement and then use that converted
> value during the execution of the statement without the need
> to re-evaluate it for each row.

As I understand it, that's not how it works. If the optimizer down-converted the value of now() from a timestamp to a date, it would lose precision, possibly producing wrong results in corner cases. For that reason, it chooses to do the opposite and up-converts the dates. But because the dates are fields and not constants, it has to do so for every row.

If that's indeed what happens, then now() indeed gets evaluated only once, but the slow-down is caused by having to convert two field values for every row.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is the comparison between timestamp and date so much slower then between two dates
On 13 April 2016 at 15:45, Tom Lane wrote: > Thomas Kellerer writes: >> So my question is: why is comparing a timestamp to a date so much slower? > > The date has to be up-converted to a timestamptz (not timestamp). > I think the expensive part of that is determining what timezone > applies, in particular whether DST is active. You could try it > with "localtimestamp" (no parens) instead of "now()" to see how > it performs with a non-tz timestamp. > > regards, tom lane Or... you can manually down-convert the timestamptz now() to a date ;) That's basically what you're doing when you use current_date instead of now(). The reason that the other way around is so much more expensive is that the database needs to do that conversion twice for every row in the table. When down-converting now(), the DB only needs to do that once for all rows. Regards, Alban Hertroys. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
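A minimal sketch of that manual down-conversion (some_table and date_col are hypothetical names; both right-hand sides are evaluated once per statement):

```sql
-- Preferred: compare a date column to a date value.
SELECT * FROM some_table WHERE date_col >= current_date;

-- Equivalent explicit down-conversion of now():
SELECT * FROM some_table WHERE date_col >= now()::date;
```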
Re: [GENERAL] Really unique session ID - PID + connection timestamp?
o it can repeats after server > restarts, but my coll. found a "session creation timestamp". > This is a key which unique. > With this we can check for died sessions and we can clean their records. > > It might help to explain more what it is you are trying to achieve. > > First I am not sure what you mean by 'persistent lock', especially as it > applies to Postgres? > > Second, I assume by garbaging you mean garbage collection of something? > If that is the case what exactly are you garbage collecting? > I see 'clean records', what records would these be? > In particular, on Postgres, where are you going to do this? > > > We want create same mechanism. > > If the above questions did not already cover this, what mechanism? > > > I know there are adv. locks in PG, but I want to use session id. > > This could be: > |pg_backend_pid|() > > May pid repeats. > Where I can get timestamp or some other unique data with I can create a > combined primary key? > > Thanks for your help! > > dd > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
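The combination asked about here can be read straight from pg_stat_activity (a sketch; it assumes the 9.2+ column names pid and backend_start, and the resulting key is unique in practice rather than in theory, since it would take pid reuse within the same microsecond to collide):

```sql
-- A session key combining the backend pid with its start time:
SELECT pid,
       backend_start,
       pid::text || '@' || extract(epoch from backend_start)::text AS session_key
  FROM pg_stat_activity
 WHERE pid = pg_backend_pid();
```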
Re: [GENERAL] pg_restore real file size
> On 26 Feb 2016, at 5:30, John R Pierce wrote: > > On 2/25/2016 8:26 PM, drum.lu...@gmail.com wrote: >> >> I'm doing the pg_restore now in a 1.5TB file: >> >> # ls -la >> postgres postgres 1575324616939 Feb 20 13:55 devdb_0.sql >> >> But, the restore has gone over 1.6 TB >> > > the dump file does not contain the indexes, just CREATE INDEX statements Or page alignment padding, or the effects of fill factors, to name a few more reasons. Besides, your dump-file is compressed. It's being restored using pg_restore (or can it restore using plain text sql scripts these days?) instead of psql. Lastly, how does a database in unicode fare for text size (without toasted values) against a plain text dump file in, say, utf-8 - which isn't even the worst case scenario? That's the simplistic case; in reality some values will get toasted and the dump file is compressed. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does query planner choose slower BitmapAnd ?
> On 22 Feb 2016, at 16:58, Tom Lane wrote: > > (BTW, is that index really on just a boolean column? It seems > unlikely that "phoneable" would be a sufficiently selective > condition to justify having an index on it. I'd seriously consider > dropping that index as another solution approach.) On that train of thought, I would think that a person or company would only be phoneable if they have a phone number registered somewhere. That somewhere probably being in another table that's too far away from the current table to check it straight away - so this is an optimisation, right? Where I see that going is as follows: A "contact" either has a phone number - in which case you'd probably rather get that phone number - or they don't, in which case a null value is often sufficient[1]. While a phone number certainly takes up more storage than a boolean, it wouldn't require an index (because it's available right there) nor the extra joins to look up the actual phone number. And if you'd still want to put an index on it, the null values won't be indexed, which takes a bit off the burden of the larger field size. You _could_ also take a shortcut and use a variation of your current approach by storing null instead of false for phoneable, but then your index would contain nothing but true values which rather defeats the point of having an index. Query-wise, I suspect that the number of "contacts" that have a phone number far outweighs the number that doesn't, in which case it's more efficient to query for those that don't have one (fewer index hits) and eliminate those from the results than the other way around. In my experience, both the NOT EXISTS and the LEFT JOIN + WHERE phoneable IS NULL tend to perform better. A final variation on the above would be to have a conditional index on your PK for those "contacts" that are NOT phoneable. That's probably the shortest and quickest list to query. I'd still prefer that field to contain something a bit more meaningful though... 
Well, enough of my rambling! Ad 1. It is possible that you cater for the possibility that you don't know whether a "contact" has a phone number or not, in which case null would probably be the wrong choice for "no phone number" because then you wouldn't be able to distinguish between "no phone number" and "I don't know". Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
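The conditional-index variation suggested above can be sketched like this (hypothetical table and column names, for illustration only):

```sql
-- Hypothetical layout: most contacts are phoneable, few are not
CREATE TABLE contact (
    id        serial PRIMARY KEY,
    name      text NOT NULL,
    phoneable boolean NOT NULL DEFAULT false
);

-- Partial index covering only the NOT phoneable rows;
-- it stays small because the common case (phoneable) is not indexed at all.
CREATE INDEX contact_not_phoneable_idx ON contact (id) WHERE NOT phoneable;

-- Queries for the un-phoneable contacts can then use the small index:
SELECT id, name FROM contact WHERE NOT phoneable;
```

Note that the planner will only consider a partial index when the query's WHERE clause provably implies the index predicate, so the query has to repeat the `NOT phoneable` condition.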
Re: [GENERAL] Optimize Query
> On 14 Feb 2016, at 20:40, drum.lu...@gmail.com wrote: > > Hi Alban! Sorry.. that was my mistake Okay, first advice on that query: Trim it down to something that people can wrap their minds around. You have a silly amount of code repetition in there, much of which doesn't even seem to serve a purpose. This is some kind of generated query, I gather? For example, you don't use most of the fields from your first subquery. Another example is the nested subquery in your left join, which can be reduced to a single subquery with just the fields summed that you actually need (and the customer_id, obviously). The same goes for most of the joins inside that left join, definitely the left joins - but that depends a bit on your table definitions and contents. For example, the fields you're summing come from account (but you can use customer instead, since you only use the account_id, which equals client_id anyway) and bill_item. Some fields in your where-clause come from job, some others I can't tell where they're from. Trim, trim, trim, until you're left with a more readable query that gives you the same results and then put it through explain analyze again. It wouldn't surprise me if that query is already significantly faster. If you're still having problems at that point, post that query and the analysis again. > Explain analyze link: > http://explain.depesz.com/s/5WJy Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Optimize Query
> On 13 Feb 2016, at 11:21, drum.lu...@gmail.com wrote: > > Anyone can help with that please? > > Cheers What would help is: 1. to post an actual query that you need optimised and 2. an explain analyze of that query. What you posted in your original message was some kind of query-template with enough placeholders and views that there is no way to predict how that's going to perform without at least knowing what goes into the placeholders and how the views are built up. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Test CMake build
> On 10 Feb 2016, at 19:09, Yury Zhuravlev wrote: > > Hello all. > Please test build Postgres using cmake. If you are of course interested. > Still not everything is ready but most of the work. Assembly instructions as > does the repository is on github: > https://github.com/stalkerg/postgres_cmake > > The compilation will be enough (tests even better). I need feedbacks so that > create issues on github. > Very interesting NetBSD, OpenBSD, Solaris. I was under the impression that the FreeBSD port already uses cmake? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] execute same query only one time?
> On 08 Feb 2016, at 20:05, Johannes wrote: > > select id, col1, col2, ... from t0 where id = (select max(id) from t0 > where col1 = value1 and col2 = value2 and …); > select col1 from t1 where t0_id = (select max(id) from t0 where col1 = > value1 and col2 = value2 and …); select t0.id, t0.col1, t0.col2, t0…., t1.col1 from t0 join t1 on (t1.t0_id = t0.id) group by t0.id, t0.col1, t0.col2, t0…., t1.col1 having t0.id = max(t0.id); Low complexity and works with any number of rows from t0 (as does Adrian's solution, btw). I'm not sure what you mean by "copying of columns" in your reply to Adrian's solution, but I don't think that happens here. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
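To make the shared subquery explicit (and evaluated only once), a CTE variant might look like the following. Table and column names follow the original post; the literal values are placeholders:

```sql
-- The max(id) lookup runs once, then both joins reuse it
WITH latest AS (
    SELECT max(id) AS id
    FROM t0
    WHERE col1 = 'value1' AND col2 = 'value2'
)
SELECT t0.id, t0.col1, t0.col2, t1.col1
FROM latest
JOIN t0 USING (id)
JOIN t1 ON t1.t0_id = t0.id;
```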
Re: [GENERAL] Let's Do the CoC Right
> On 24 Jan 2016, at 1:48, Regina Obe wrote: > So the point is stop assuming who has experience and who doesn't simply by > how people look. +1 To expand on that: Don't let your prejudices get the better of you. Assuming that other people are prejudiced about you is just another prejudice. The people you're communicating with on a mailing list are individuals, not groups, so do not generalise. In most cases you do not know enough about them to make such assumptions, and that's assuming they are valid, which they generally are not (yikes! a generalisation!). > The irony is that PostgreSQL is so diverse that a Coc leads to nothing but > huge arguments about what's okay in one culture and not another. We can only > go by intent and not what was done. If we can't say that, then a Coc does > us no good. I've already violated all Cocs at least 3 times in this whole > conversation. > > Thanks, > Regina What about this for a CoC?: 1. Do not discuss a CoC. But, this side-thread has been going on long enough I think. Let's wrap this up, shall we? Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Code of Conduct: Is it time?
On 12 January 2016 at 09:25, Chris Travers wrote: > One of the dangers of a CoC is that there are many potential issues which > may or may not become real problems. I think if we try to be clear on all > of them, then we risk creating codes instead of a general expectation of > what we do expect. Another consideration. Last night I was thinking this issue over and then remembered that normally very reasonable persons (among whom I count myself) can react quite poisonously when they are tired or stressed and people start pushing their buttons. Those people probably would not be violating any CoC rules, but can cause someone else to do so. Moreover, some people are exceptionally good at pushing all the wrong buttons, whether doing that willingly (out of malice) or not. I'm a bit concerned that a CoC could give the malicious among those the ammunition they need to push the buttons of their victims. Now of course, they could do that just as well without a CoC, and I don't recall any instances of this problem on this list. To add to that, non-native speakers sometimes make mistakes that set it off. I remember an embarrassing case where I thought the word "gross" came from the German "Grosshaft", which means quite the opposite (great, fabulous), and responded to a new idea on a list with a heartily meant "Gross!". And then you suddenly get angry mails from all over the place without understanding how that happened. Oops. Where do I stand? I do not know whether a CoC for PG is a good idea or not; I can't decide. In my case it's nothing more than an opinion anyway - my contributions are pretty much limited to offering help on this ML. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] WIP: CoC V2
> On 12 Jan 2016, at 0:16, Joshua D. Drake wrote: > > 3. A safe, respectful, productive and collaborative environment is free of > non-technical or personal comments, for example ones related to gender, > sexual orientation, disability, physical appearance, body size, race or > personal attacks. I'm not debating whether there should be examples or not, they are usually useful, but perhaps examples belong in a separate section and not in the core CoC? Frankly though, this thread looks like a testament of why Postgres doesn't really need a CoC. You people are all being so polite about it that it's almost offensive! Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Definitive answer: can functions use indexes?
> On 07 Jan 2016, at 5:19, Jim Nasby wrote: > > On 1/6/16 5:41 PM, Tom Lane wrote: >> Since the question makes little sense as stated, I'm going to assume >> you mean "can a query like SELECT ... WHERE left(foo, 3) = 'bar' >> use an index on column foo?" >> >> The answer to that is no, there is no such optimization built into >> Postgres. (In principle there could be, but I've not heard enough >> requests to make me think we'd ever pursue it.) > > BTW, the case where this would be highly valuable is timestamps. Being able > to do something like date_part('month',timestamptz)='Jan' would be a big, big > deal for warehousing. Not just warehousing, for BI in general. But, as is now quite clear, for many of those cases it should be fairly trivial to work around this limitation by creating either a functional index or an operator. For the above example, say something like timestamptz % 'month' = 'Jan'. There are downsides to that approach though, such as readability and that this way of using % instead of date_part() is not according to any standard behaviour and could even behave differently or (more likely) not work at all on other PG instances. That said, it's not uncommon in BI to require a seq. scan anyway, in which case the point is rather moot. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
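The functional-index workaround mentioned above might be sketched as follows (hypothetical table `fact` with a timestamptz column `ts`). One wrinkle: `date_part()` on a timestamptz is not immutable (it depends on the session's TimeZone setting), so the index expression has to pin a zone:

```sql
-- Pinning a time zone makes the expression immutable and thus indexable
CREATE INDEX fact_month_idx
    ON fact (date_part('month', ts AT TIME ZONE 'UTC'));

-- A query can use the index as long as it repeats the exact expression:
SELECT * FROM fact
WHERE date_part('month', ts AT TIME ZONE 'UTC') = 1;
```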
Re: [GENERAL] A unique pairs version of UNNEST() ?
> On 04 Jan 2016, at 21:08, Wells Oliver wrote: > > Hey all, happy new year. > > I am trying to get unique pairs from an array of N numbered items, usually 5, > but possibly 4 or 6. > > If I just wanted unique values, I could do SELECT UNNEST(list_of_ids) AS id, > COUNT(*) FROM table GROUP BY id but in this situation I want all unique pairs > and a COUNT. > > For those familiar with python, this is the functionality found in > itertools.combinations. I'm leaning towards just doing this in python, but I > really like keeping as much in SQL as possible. > > So in an example where list_of_ids is {1,2,3,4,5} I would essentially get: > > {1, 2} > {1, 3} > {1, 4} > {1, 5} > {2, 3} > {2, 4} > {2, 5} > {3, 4} > {3, 5} > {4, 5} Your example reminds me of combinatory theory. The result you seem to be looking for is the list of possible unique combinations, as sets of elements of the total set (sets are orderless). with list_of_ids as ( select unnest(list_of_ids) as id from table ) select a.id, b.id from list_of_ids a, list_of_ids b where b.id > a.id; Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
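Since the original post mentions itertools.combinations, this is what the SQL self-join above mirrors on the Python side (a sketch, using the sample array from the post):

```python
from itertools import combinations

list_of_ids = [1, 2, 3, 4, 5]

# Unique, orderless pairs -- equivalent to the self-join with b.id > a.id
pairs = list(combinations(list_of_ids, 2))

for a, b in pairs:
    print(f"{{{a}, {b}}}")   # prints {1, 2} through {4, 5}

# 5 elements give C(5, 2) = 10 pairs
assert len(pairs) == 10
```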
Re: [GENERAL] to_timestamp alternatives
> On 01 Jan 2016, at 0:46, Jim Nasby wrote: > > BTW, my recommendation would be to store in a timestamptz field *with the > correct timezone*, and then convert on output as necessary. This is easy to > do by either > > SET timezone > > or > > SELECT timestamptz_field AT TIME ZONE '…'; This. When converting the original timestamps to UTC, you lose data. In my experience, you will end up needing that data sooner or later. Remember, the database stores timestamps in UTC internally anyway, you don't need to do the conversion yourself. > Since you're dealing with GPS data and presumably have lat/long, it shouldn't > be hard to do this dynamically either, either by just blindly dividing > longitude by 15 or using actual timezone shape polygons and @> or <@. That would be a bad idea for global data, since not all time zones are full hours apart, or have the same (if any) DST change-over dates. For example, India is currently at UTC+05:30, probably because they wrapped the entire country in the same TZ after their independence. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
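The two output-conversion options quoted above can be sketched like this (hypothetical table `readings` with a timestamptz column `taken_at`):

```sql
-- Option 1: let the session's TimeZone setting drive display conversion
SET timezone = 'Asia/Kolkata';
SELECT taken_at FROM readings;

-- Option 2: convert explicitly, per expression
SELECT taken_at AT TIME ZONE 'Asia/Kolkata' AS local_time FROM readings;
```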
Re: [GENERAL] Recurring and non recurring events.
> On 26 Dec 2015, at 13:03, Kevin Waterson wrote: > > Thanks, as I am new to postgres, I was unaware of this function. Actually, the article you referenced makes use of generate_series as well (at INSERT INTO events), but then for some reason decides to create a generate_recurrences function later on. Possibly the choice came from them using a domain (RECURRENCE) that did not translate directly (although almost) to an interval. > To go with this, I guess I will need a table with which to store intervals, > start and end dates? > > eg > CREATE table events( > id serial primary key, > start_timestamp timestamp, > end_timestamp timestamp, > interval > > with dateRange as > ( > SELECT min(start_timestamp) as first_date, max(start_timestamp) as last_date > FROM events > ) > select > generate_series(first_date, last_date, '1 hour'::interval)::timestamp as > date_hour > from dateRange; But, instead of generate_series you could also use a recursive CTE (which is more or less standard SQL - implementations differ slightly between databases): with recursive dateRange (curr_stamp, max_stamp, step) as ( select min(start_timestamp), max(start_timestamp), interval '1 week' from events union all select curr_stamp + step, max_stamp, step from dateRange where curr_stamp + step <= max_stamp ) select curr_stamp from dateRange; I suspect generate_series is faster, but since your query already almost looked like this I thought I'd offer this alternative approach. It has a little bit more flexibility too, as you can add fields and calculations to the CTE quite easily. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Transfer db from one port to another
> On 23 Dec 2015, at 12:43, Killian Driscoll wrote: > > OK - I did the dir and it shows that the dump and restore.exe are there, but > running the above gives the below errors > > 09/06/2014 08:35 381,952 pg_dump.exe > > 09/06/2014 08:35 180,224 pg_restore.exe > > C:\Users\killian>path "C:\Program Files\PostgreSQL\9.3\bin";%path% > > C:\Users\killian>pg_dump -Fc -p 5432 -U postgres irll_project | pg_restore -U > po > stgres -p 5532 > ccoulould not findd a n "pg_restore" to executeot find a "pg_dump" to execute Aren't you trying to move a database to PG 9.4? Then you need to use the pg_dump and pg_restore utilities of the 9.4 installation, not those of the 9.3 one. Those utilities are guaranteed to be backwards compatible, but they're not necessarily forwards compatible. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue
> Before I start in to implement a DELETE / AUTOVACUUM / VACUUM approach, to > recycling disk space used for a session management table, I would like to > propose, for consideration by this forum, an idea for a different approach. > > A row in a session management table, represents a significant "sunk cost" in > both computing time used to create the row, and in disk space allocated. > Postgres has to use a lot of resources to create that row in the first > place. > > When the session that originally caused that row to be allocated, eventually > expires -- why delete the associated session managent row ?? > > Instead of using a DELETE command to destroy the row (and a > resource-intensive AUTOVACUUM / VACUUM process to reclaim the storage > space), why not instead, simply mark that session management row as "free" > (with an UPDATE command) ?? An UPDATE is a combination of an INSERT and a DELETE command. However, rows marked as deleted will be reused at some point after autovacuum (or manual VACUUM) has made sure they are no longer in use by any DB session. So your approach can still work, as long as you vacuum that table frequently enough. The actual solution isn't the UPDATE instead of the DELETE though, but rather the partial index and an increased autovacuum frequency. Alban. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
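That combination — mark-as-free via UPDATE, a partial index, and more frequent autovacuum — might look like this (hypothetical table and column names):

```sql
-- Mark expired rows as reusable instead of deleting them
UPDATE session SET free = true, user_id = NULL WHERE expires_at < now();

-- Partial index: only the reusable rows are indexed, so finding one is cheap
CREATE INDEX session_free_idx ON session (id) WHERE free;

-- Vacuum this table far more aggressively than the cluster default
ALTER TABLE session SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 100
);
```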
Re: [GENERAL] ftell mismatch with expected position
Please refrain from top-posting. On 2 November 2015 at 10:48, Eelke Klein wrote: > Normally we call this from within our windows program where a lot of code is > involved for setting up the environment, and creating the pipes and > redirecting stdout, stderr and stdin. However I believe it is the same > problem that can be caused with the following commandline in cmd.exe but you > will need to have Coreutils for windows installed to get the cat command. > > pg_dump.exe -U postgres --format=c mydb | cat > backupfile Do you also get the warning if you split those commands up, and if so, in which one? pg_dump.exe -U postgres --format=c mydb > backupfile cat backupfile > someotherfile > The size of the backup doesn't matter. As long as there is atleast one table > in the database the warning occurs. When I use --verbose it looks like I get > one warning for every table in the database while it is dumping the data in > the tables. > > Eelke Klein > > > 2015-10-30 14:53 GMT+01:00 Adrian Klaver : >> >> On 10/29/2015 02:51 AM, Eelke Klein wrote: >>> >>> I am getting the following warning when our program runs pg_dump.exe and >>> the output is in custom format and send to standard out which is >>> connected to a pipe (Windows platform). >>> >>> pg_dump: [custom archiver] WARNING: ftell mismatch with expected >>> position -- ftell used >> >> >> Can you show us the exact command, with arguments, that you are using? >> >> Also what is the size of the dump file you are creating? >> >> >>> >>> The output of pg_dump is received by another process that copies it (and >>> some other data) into a zip file. When I extract the db backup from the >>> zip the backup is fine and I can restore it. >>> >>> I have both tried the pg_dump.exe from the 9.5 alpha and a stable 9.4 >>> both generate this warning. >>> When I change the format to plain the warning goes away. 
>>> When I call pg_dump from the commandline and let it send to standard out >>> and redirect it to a file the warning does not occur but it does occur >>> when I redirect to nul. >>> >>> Is there a way to prevent this warning? I would think pg_dump shouldn't >>> call ftell on a pipe. >>> >>> Regards, Eelke >> >> >> >> -- >> Adrian Klaver >> adrian.kla...@aklaver.com > > -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Recursive Arrays 101
> On 25 Oct 2015, at 19:38, Adrian Klaver wrote: > > On 10/25/2015 11:12 AM, David Blomstrom wrote: >> I'm sorry, I don't know exactly what you mean by "definitions." The >> fields Taxon and Parent are both varchar, with a 50-character limit. >> ParentID is int(1). > > By definition I meant the schema, so from the below: > > CREATE TABLE t ( > N INT(6) default None auto_increment, > Taxon varchar(50) default NULL, > Parent varchar(25) default NULL, > NameCommon varchar(50) default NULL, > Rank smallint(2) default 0 > PRIMARY KEY (N) > ) ENGINE=MyISAM That can indeed be solved using a hierarchical query (provided you have a suitable table in PG); something akin to: WITH RECURSIVE taxons AS ( -- Hierarchical root nodes SELECT N AS id, Taxon, Rank, 1 AS level, '' || N AS Path -- A useful addition explained further down FROM t WHERE ParentID IS NULL -- Child nodes UNION ALL SELECT N AS id, Taxon, Rank, taxons.level +1 AS level, taxons.Path || ':' || N AS Path FROM taxons JOIN t ON taxons.id = t.ParentID ) SELECT id, Taxon, Rank, level FROM taxons ORDER BY Path ; The Path-bit looks complicated, but basically that just appends ID's within the same hierarchy such that, when sorted on that field, you get the hierarchy in their hierarchical order. What the hierarchy would look like if it were shown as a file hierarchy with sub-directories expanded, for example. That's pretty much the only viable alternative (alternatives vary on the column used to create the hierarchy), which is why I added it to the example. The fun thing with hierarchical queries is that you can add all kinds of extra information and make it trickle down to the child nodes, such as the items that make up the root of the hierarchy (pretty useful for grouping), for example or a field that calculates a string to prepend for indentation, etc. 
Or a computation that depends on values in parent items (I used this successfully in a bill of materials to calculate absolute quantities by volume, quantities by weight and cost of components in the end product where they were given relative to 1 kg of their parent, for example). It's highly flexible and powerful (and standard SQL), but it takes a bit of time to get in the right mindset. PS. I usually write my hierarchical queries in Oracle, which isn't quite as good at them as Postgres is, but it's what we have @work. Hence, I'm not sure I got the syntax 100% correct. We're working on getting PG in for a project upgrade (replacing RDB on OpenVMS, which will go EOL in <10 years!) - fingers crossed. Cheers! Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
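As an example of those trickle-down extras, an indentation column can be derived from `level` in the outer select (a sketch against the `taxons` CTE from the message above):

```sql
-- Indent each taxon by its depth in the hierarchy
SELECT repeat('    ', level - 1) || Taxon AS indented_taxon
FROM taxons
ORDER BY Path;
```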
Re: [GENERAL] to pg
On 25 September 2015 at 13:08, Ramesh T wrote: > CREATE UNIQUE INDEX idx_load_pick ON pick (case picked when picked='y' then > load_id else null end ); > > how can i convert case expressed to postgres..above it is oracle. Assuming that your queries are written in such a way that Oracle is indeed using that index and you want your queries to use the index as well in PG: CREATE UNIQUE INDEX idx_load_pick ON pick (load_id) WHERE CASE picked WHEN 'y' THEN load_id ELSE NULL END IS NOT NULL; That's definitely written a bit redundantly, but that's Oracle's fault. If your queries aren't like that, it's as Ladislav wrote. Much simpler in PG! To make Oracle use your original index, your queries are probably of a form containing snippets like: SELECT * FROM foo WHERE CASE picked WHEN 'y' THEN load_id ELSE NULL END IS NOT NULL BTW, your CASE statement isn't exactly valid, even in Oracle. Your comparison is in fact this: picked = (picked = 'y'). -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Prepared Statements and Pooling
On 11 August 2015 at 06:44, Mister Junk wrote: > I'm using prepared statements to prevent SQL injection, but I have done some > reading and it seems like using Prepared statements COULD improve > performance. I understand the concept, but I do not know how to implement > this. They could, but they could also reduce performance. Prepared statements store and reuse an earlier generated query plan for a query, so that's the time you save. However, because the query planner can't make use of the actual query parameter values (they are yet unknown at that point), it can only generate a general query plan. If your data is equally distributed for your query parameters, that won't matter much, but if some values are far more common than others, that can actually hurt performance. In case of a normal statement, the query planner has enough information to decide which plan would be more efficient. With a prepared statement, it does not. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
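At the SQL level, the mechanism can be seen with PREPARE/EXECUTE (hypothetical table and values); the plan is built once, before any concrete parameter value is known:

```sql
-- The statement is planned here, with $1 still unknown
PREPARE items_by_color (int) AS
    SELECT * FROM items WHERE color_id = $1;

-- Executions then supply values to the already-prepared statement,
-- whether the value is rare or very common in the data
EXECUTE items_by_color(3);
EXECUTE items_by_color(42);
```

Most client drivers expose this same mechanism through their own prepared-statement API rather than via literal PREPARE statements.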
Re: [GENERAL] Question about copy from with timestamp format
> On 30 Jul 2015, at 17:59, Sherrylyn Branchaw wrote: > > From here: > > http://www.postgresql.org/docs/9.4/interactive/sql-copy.html > > "COPY can only be used with plain tables, not with views. However, you can > write COPY (SELECT * FROM viewname) TO " > > Right, so you can COPY FROM a view, but not, as far as I can tell, TO a view, > unless Alban found a workaround. Well, that's why I said it _might_ work. COPY is a bit special, I'm afraid. For starters, although it works _like_ doing a bunch of INSERTs, it doesn't perform actual INSERTs. Apparently, that also means it won't fire an INSERT rule and thus can't be used with an updatable view. There are no rules on such a view (rules rewrite the query) that would work for COPY. Now perhaps that sounds like a COPY rule is warranted for cases like these, but that doesn't help, exactly because the COPY command has no place in its syntax for expressions (such as this type conversion). INSERT does, hence we can write a rule for it… In hindsight it all makes sense. That doesn't bring you any closer to a solution, unfortunately. > On Thu, Jul 30, 2015 at 11:48 AM, Adrian Klaver > wrote: > On 07/30/2015 08:44 AM, Sherrylyn Branchaw wrote: > I was thinking that perhaps an updatable view might do the trick? > > Interesting idea! Are you able to get it to work? I keep getting 'ERROR: > cannot copy to view "view_ts_test"' even before my trigger fires. > Inserting, though, works fine. > > From here: > > http://www.postgresql.org/docs/9.4/interactive/sql-copy.html > > "COPY can only be used with plain tables, not with views. However, you can > write COPY (SELECT * FROM viewname) TO " > > > Still curious why the triggers I'm writing won't fire before my > statement errors out on copying to a view, or inserting an out-of-range > timestamp, when the trigger would resolve all the illegal operations if > it just fired first. 
> > > On Thu, Jul 30, 2015 at 5:57 AM, Alban Hertroys wrote: > > > > On 30 Jul 2015, at 2:27, Sherrylyn Branchaw wrote: > > > > Based on your PS asking about data types and commenting that you don't > want to put hour in a separate column, it sounds like this is a brand-new > table you're creating. If so, and if this is a one-time COPY operation, you > can create a text column for the initial import. Then after you're done > importing, you can execute > > > > ALTER TABLE ts_test ALTER COLUMN ts_fld TYPE TIMESTAMP USING > (to_timestamp(ts_fld, 'MMDDHH24')); > > > > to convert the format of the imported data to a timestamp. Then you're > set. > > > > If there will be ongoing imports of more files like this, though, > you'll need the intermediate table solution offered by Adrian. > > Or keep both columns and update those where the text-column is NOT > NULL and the timestamp column is NULL. > > > I was going to suggest a trigger, but it turns out that the data type > checking happens even before the BEFORE trigger fires, so you don't get a > chance to massage your data before actually inserting it. I got 'ERROR: > date/time field value out of range: "2015072913"' before the > trigger even fired. I > wonder if that's deliberate? I was able to implement a workaround by > adding a raw_ts_fld column of type text, but an extra column might > be too ugly for you relative to a temp table, I don't know. > > I was thinking that perhaps an updatable view might do the trick? > > You would need to create a view with the timestamp column converted > to text in the format in your CSV file. Next you add an INSERT rule > that does the conversion from text to timestamp and inserts the row > in the actual table. Finally, you use the view in the COPY statement > instead of the table. > Added bonus, you can now also use the view to export your table to > the same CSV format. 
> > Alban Hertroys > -- > If you can't see the forest for the trees, > cut the trees and you'll find there is no forest. > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Question about copy from with timestamp format
> On 30 Jul 2015, at 2:27, Sherrylyn Branchaw wrote: > > Based on your PS asking about data types and commenting that you don't want > to put hour in a separate column, it sounds like this is a brand-new table > you're creating. If so, and if this is a one-time COPY operation, you can > create a text column for the initial import. Then after you're done > importing, you can execute > > ALTER TABLE ts_test ALTER COLUMN ts_fld TYPE TIMESTAMP USING > (to_timestamp(ts_fld, 'MMDDHH24')); > > to convert the format of the imported data to a timestamp. Then you're set. > > If there will be ongoing imports of more files like this, though, you'll need > the intermediate table solution offered by Adrian. Or keep both columns and update those where the text-column is NOT NULL and the timestamp column is NULL. > I was going to suggest a trigger, but it turns out that the data type > checking happens even before the BEFORE trigger fires, so you don't get a > chance to massage your data before actually inserting it. I got 'ERROR: > date/time field value out of range: "2015072913"' before the trigger even > fired. I wonder if that's deliberate? I was able to implement a workaround by > adding a raw_ts_fld column of type text, but an extra column might be too > ugly for you relative to a temp table, I don't know. I was thinking that perhaps an updatable view might do the trick? You would need to create a view with the timestamp column converted to text in the format in your CSV file. Next you add an INSERT rule that does the conversion from text to timestamp and inserts the row in the actual table. Finally, you use the view in the COPY statement instead of the table. Added bonus, you can now also use the view to export your table to the same CSV format. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. 
Re: [GENERAL] database-level lockdown
On 7 July 2015 at 12:55, Filipe Pina wrote:
> On Ter, Jul 7, 2015 at 1:41 , Adrian Klaver wrote:
>> Still not sure what it is you are trying to accomplish. Is it really necessary
>> that every transaction be serialized? Or to put it another way, why are you
>> running in serializable by default? Or yet another way, what is the problem
>> you are trying to solve with serialized transactions?
>
> Exactly, that's the twist I've decided after some tests yesterday (the "lock
> all tables on last try" degraded performance in an obscene way): giving up
> on serializable by default.
>
> I wanted to use serializable so developers wouldn't have to worry about
> properly using locks, but implementing this "serialization failure" handler
> is becoming a major headache...

What Adrian was trying to get out of you is why you think you need those locks.

You're working with an RDBMS; it does the job of keeping data integrity in a multi-user environment already. You can trust it to do that well. So unless you're doing something really special, you shouldn't need to lock anything.

Among the special cases where you do need locks is needing a gapless sequence (invoice numbers come to mind).

So once again, what do you need those locks for?

P.S. Please don't top-post on this list. I had to edit your message to fix the flow of the conversation.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
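As an illustration of the gapless-sequence case mentioned above, a sketch with an invented counter table; the row lock taken by the UPDATE is what serializes the numbering step:

```sql
-- Hypothetical one-row-per-series counter table.
CREATE TABLE invoice_counter (
    series      text PRIMARY KEY,
    last_number integer NOT NULL DEFAULT 0
);

-- The UPDATE takes a row lock, so concurrent transactions queue up
-- behind it until COMMIT/ROLLBACK. If the transaction rolls back,
-- the counter rolls back with it, so no number is ever skipped.
BEGIN;
UPDATE invoice_counter
   SET last_number = last_number + 1
 WHERE series = '2015'
RETURNING last_number;
-- ... create the invoice with the returned number ...
COMMIT;
```

A plain sequence (nextval) would be faster, but sequences deliberately do not roll back, which is exactly why they can leave gaps.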
Re: [GENERAL] use null or 0 in foreign key column, to mean "no value"?
> On 27 Jun 2015, at 5:59, Robert Nikander wrote:
> In application code, prepared statements want to say: `select * from items
> where color_id = ?` and that `?` might be an int or null, so that doesn't
> work.

You could add another parameter to test which expression to 'activate' in the query, something like:

select * from items
where ('notnull' = ? and color_id = ?)
   or ('null' = ? and color_id is null);

Of course, with those _positional_ query parameters that means you need to bind the same value twice. You wouldn't need to with _named_ query parameters, if those are available to you.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
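Concretely, with PostgreSQL-style placeholders the workaround looks like this (a sketch; in PREPARE you could reuse $1, but client libraries using `?` placeholders cannot, hence the flag is bound twice):

```sql
-- $1/$3: 'null' or 'notnull' (the same flag, bound twice),
-- $2: the color id, ignored when the flag is 'null'.
PREPARE items_by_color(text, int, text) AS
    SELECT *
      FROM items
     WHERE ('notnull' = $1 AND color_id = $2)
        OR ('null' = $3 AND color_id IS NULL);

EXECUTE items_by_color('notnull', 3, 'notnull');
EXECUTE items_by_color('null', NULL, 'null');
```

An alternative not mentioned in the thread is `WHERE color_id IS NOT DISTINCT FROM $2`, which treats NULL as equal to NULL with a single parameter, though it may prevent index use.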
Re: [GENERAL] Python 3.2 XP64 and Numpy...
> On 28 May 2015, at 17:54, Rémi Cura wrote:
>
> I tried:
>
> C:\Python32>python.exe
>
> Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on
> win32

Ehm, this seems significant? ---^

It looks like you're trying to use 64-bit binaries on a 32-bit OS.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
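Regardless of what the startup banner says, a direct way to check which bitness a given interpreter actually runs at is the pointer size (a general-purpose snippet, not from the thread):

```python
import struct
import platform

# Pointer size in bits: 32 for a 32-bit build, 64 for a 64-bit build.
bits = struct.calcsize("P") * 8
print(bits, platform.architecture()[0])
```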
Re: [GENERAL] date with month and year
On 21 May 2015 at 23:42, Karsten Hilbert wrote:
> You are right in the following aspect:
>
> - client sends in "NOW at HERE"
> - server knows HERE = UTC+2

And then the tectonic plate you're on shifts and you're suddenly in UTC+1 or +3.

Thankfully, those things don't shift as fast as they sometimes do in the movies.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: [GENERAL] Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
On 22 May 2015 at 04:46, Bill Moran wrote:
> I did a little research and it appears that neither Oracle nor DB2 supports
> the 0xff syntax ... so not _quite_ as common as it seemed to me.
> With all that being said, if I were to build a patch, would it be likely
> to be accepted into core?

Wouldn't you also need to support similar syntax for octal numbers for the patch to be complete? Or are those already supported (ISTR that's '077' for decimal 63)?

Not that I care at all about octal numbers, but supporting one and not the other just doesn't seem right.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
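In the meantime, PostgreSQL's existing bit-string literals already cover many of these cases (this is standard syntax, not the proposed 0xff integer literal):

```sql
-- x'ff' and b'111111' are bit strings; casting to integer gives the value.
SELECT x'ff'::int     AS from_hex,     -- 255
       b'111111'::int AS from_binary;  -- 63, the value octal 077 denotes
```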
Re: [GENERAL] plpgsql functions organisation
o with C aren't justified, because
> plpgsql has none of that complexity. There are advantages both ways.
>
> The beauty of PostgreSQL is that you have both available and you
> can choose whichever is best for your situation.
>
> Agreed, though in my case I drop into plpythonu when I want more complex
> solutions.
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
> --
> Melvin Davidson
> I reserve the right to fantasize. Whether or not you
> wish to share my fantasy is entirely up to you.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
Re: [GENERAL] schema or database
> On 13 Apr 2015, at 4:20, Ian Barwick wrote:
>
> On 13/04/15 11:08, Michael Cheung wrote:
>> hi, all;
>>
>> I am new here. And I need some suggestion.
>>
>> I have many similar databases to store data for every customer.
>> Structure of database is almost the same.
>> As I use the same application to control all these data, I can only use
>> one database user to connect to these databases.
>> And I have no need to query tables for different customers together.
>>
>> I wonder which I should use, different schemas or different databases to
>> store data?
>>
>> I'd like to know the advantages and disadvantages of using schemas or
>> databases.
>
> If as you say access to the database is via a single application database
> user, it will probably make more sense to use multiple schemas rather than
> multiple databases. Keeping everything in one database will simplify
> administration (e.g. making backups - you'll just need to dump the one
> database rather than looping through a variable number) and will make life
> easier if you ever need to do some kind of query involving multiple
> customers.

That's easier to back up, sure, but you can't restore a single customer's schema easily that way. So if one customer messes up their data big time, you'll need to restore a backup for all customers in the DB.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
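For what it's worth, per-customer restores remain possible with multiple schemas if the dumps are taken per schema; pg_dump and pg_restore both accept a `-n`/`--schema` switch (schema and database names below are invented for the example):

```shell
# Dump only one customer's schema from the shared database.
pg_dump -n customer_a -F c -f customer_a.dump app_db

# Later, restore just that schema without touching other customers.
pg_restore -d app_db --clean -n customer_a customer_a.dump
```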
Re: [GENERAL] Why doesn't `RAISE EXCEPTION` provide error context?
On 2 April 2015 at 19:15, Taylor Brown wrote:
> So, I would rather put a check like this at the top of my function:
>
> --
> important_variable = (p_request::json->>'important_variable')::integer;
> IF (important_variable is NULL) THEN RAISE EXCEPTION 'important_variable
> must not be NULL.'; END IF;
> --
>
> But I won't be able to get the context for that exception, and all I'll be
> able to return from the function or write to the logs is
> 'important_variable must not be NULL.'. If that's the only place I throw
> that error, I'll know where to look. Otherwise, I have no context, and
> won't be able to determine where my exception was thrown. So I'm actually
> better off _not_ throwing my own custom exceptions, even though I would
> prefer to be more defensive about this sort of thing in my code.
>
> I thought I might be able to "trick" postgres into throwing another
> unrelated exception that would not only include my custom error message,
> but allow me to extract the context, telling me the function where I should
> begin debugging.

Just a quick idea, but...

Perhaps you could put a constraint on those JSON messages indirectly, by mapping them to some kind of template table? The top level of a JSON message can usually be mapped to a table, and a table can have constraints and, for example, a BEFORE INSERT trigger that always returns NULL so that the data is not actually inserted into the table.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
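The template-table idea, sketched with invented names (untested). One wrinkle worth noting: a BEFORE trigger that returns NULL skips the insert before table constraints are checked, so in this sketch the validation lives in the trigger body, where a RAISE carries full context:

```sql
-- A table mirroring the expected top level of the JSON message.
CREATE TABLE request_template (
    important_variable integer,
    other_field        text
);

CREATE FUNCTION validate_and_discard() RETURNS trigger AS $$
BEGIN
    -- The check lives here, so the error reports this function's context.
    IF NEW.important_variable IS NULL THEN
        RAISE EXCEPTION 'important_variable must not be NULL.';
    END IF;
    RETURN NULL;  -- skip the actual insert; nothing is stored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER request_template_check
    BEFORE INSERT ON request_template
    FOR EACH ROW EXECUTE PROCEDURE validate_and_discard();

-- Validating a JSON value then becomes a throwaway INSERT:
-- INSERT INTO request_template
-- SELECT * FROM json_populate_record(NULL::request_template,
--                                    p_request::json);
```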
Re: [GENERAL] pgadmin3 installation on Oracle Linux 6.6 64-bit ?
On 28 March 2015 at 02:14, Yuri Budilov wrote:
> I am new to PostgreSQL and Linux (coming across from Microsoft SQL Server).
>
> I installed PostgreSQL 9.4 on Oracle Linux 6.6 and it's working ok (psql,
> etc). Now I try to install pgadmin3 on the same OS.
>
> I am having problems installing pgadmin3 rpms on Oracle Linux 6.6 64-bit.

I'm often somewhat surprised that people expect to have GUI tools on database servers. Why would you, really? You can connect from anywhere using your OS of choice (or rather, the choice of your employer in most cases) - it's perfectly fine to run pgadmin3 on Windows to connect to a PostgreSQL database on Oracle Linux. And why waste potential DB cycles on a graphical environment? You don't need it.

For most things it's plenty (and frequently a lot more powerful and reliable) to have a textual user interface through a remote terminal session over SSH or equivalent. Of course, there is a learning curve to be able to use the UNIX (or Linux) shell and its tools - start with 'man man' if you're new ;)

That said, of course I agree that it should be possible to install pgadmin3 on Oracle Linux if you'd want to.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: [GENERAL] Populating missing dates in postgresql data
>            5    2014-12-01    Yes
> 286        5    2014-12-02    No
> 4507       1    2014-12-01    No
> 4507       1    2014-12-02    No
> 4507       2    2014-12-01    No
> 4507       2    2014-12-02    No
>
> I have been racking my mind for the better part of a day on how to do this.
> The thing is that I can do a right join of TABLE B on TABLE A, and this will
> populate the missing object ids and channels. However, this only works for a
> single
>
> object_id  channel  check-in date  comm failure
> **
> 990        1        2014-12-01     No
> 990        1        2014-12-02     No
> 990        2        2014-12-01     No
> 990        2        2014-12-02     No
> 286        2                       Yes
> 286        2        2014-12-02     No
> 286        5                       Yes
> 286        5        2014-12-02     No
> 4507       1        2014-12-01     No
> 4507       1        2014-12-02     No
> 4507       2        2014-12-01     No
> 4507       2        2014-12-02     No
>
> I need to do a count of comm failures by day, so I need to populate the
> check-in date field. Please help!

Easiest would be to insert the missing values in your table, something like:

WITH RECURSIVE calendar (missing_date) AS (
    SELECT MAX(check_in_date)
      FROM table_a
     WHERE comm_failure = 'YES'
    UNION ALL
    SELECT missing_date + interval '1 day'
      FROM calendar
     WHERE missing_date < CURRENT_DATE
)
INSERT INTO table_a (object_id, channel, check_in_date, comm_failure)
SELECT b.object_id, b.channel, c.missing_date, 'YES'
  FROM table_b b, calendar c
 WHERE NOT EXISTS (
    SELECT 1
      FROM table_a a
     WHERE a.object_id = b.object_id
       AND a.channel = b.channel
       AND a.check_in_date = c.missing_date
);

That's off the top of my head, untested, etc, but I think I got that mostly right.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
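A non-recursive way to build that calendar, for comparison (untested against the same hypothetical tables), is generate_series, which accepts timestamp bounds and an interval step:

```sql
-- One row per day in the range, usable in place of the recursive CTE.
SELECT d::date AS missing_date
  FROM generate_series(DATE '2014-12-01',
                       CURRENT_DATE,
                       interval '1 day') AS g(d);
```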
Re: [GENERAL] Installation Size showing huge size in postgres installed on windows os
On 19 March 2015 at 13:44, Raymond O'Donnell wrote:
> On 19/03/2015 12:39, jaime soler wrote:
>> On Wed, 18-03-2015 at 23:05 -0700, Rajagopal NS wrote:
>>> I have installed Postgres 9.0 in my machine. When I look at Programs and
>>> Features under Control Panel, I see the Size for Postgres 9.0 is shown as
>>> 121GB.
>>>
>>> I feel neither the installation nor the small postgres databases I would
>>> have created use 121GB.

For what it's worth, I've seen the wildest claims about installation sizes in that panel. Mostly the installation size is severely underestimated, but overestimates happen too - I haven't seen anything this bad though. I stopped trusting those numbers quite a while ago.

The reliable method is to check the size of your data directory and the Postgres installation directory yourself.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
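From inside the database, the built-in size functions report actual on-disk usage and sidestep the Control Panel numbers entirely (these are standard PostgreSQL functions):

```sql
-- Actual on-disk size of the current database...
SELECT pg_size_pretty(pg_database_size(current_database()));

-- ...and of every database in the cluster, largest first.
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
  FROM pg_database
 ORDER BY pg_database_size(datname) DESC;
```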
Re: [GENERAL] Reg: PL/pgSQL commit and rollback
On 17 March 2015 at 15:30, Medhavi Mahansaria wrote:
> Yes. I have read this document.
>
> But my issue is that even when it throws an exception I need to roll back
> the changes made by that query and move on to the next block.
>
> Is there any way to accomplish that?

Please do not top-post on this list.

The solution is to do what Albe explained: use BEGIN ... EXCEPTION ... END sub-blocks.
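Sketched as a DO block (table names invented): each inner BEGIN ... EXCEPTION ... END block runs in its own subtransaction, so a failure rolls back only that block's changes and execution continues with the next block.

```sql
DO $$
BEGIN
    BEGIN
        -- first statement that might fail
        INSERT INTO t1 VALUES (1);
    EXCEPTION WHEN OTHERS THEN
        RAISE NOTICE 'first block failed (%), moving on', SQLERRM;
    END;

    BEGIN
        -- this block runs regardless of what happened above
        INSERT INTO t2 VALUES (2);
    EXCEPTION WHEN OTHERS THEN
        RAISE NOTICE 'second block failed (%), moving on', SQLERRM;
    END;
END;
$$;
```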
Re: [GENERAL] Update using non-existent fields does not throw an error
On 16 March 2015 at 17:02, Rob Richardson wrote:
> Greetings!
>
> An update query is apparently succeeding, even though the query refers to
> fields that do not exist. Here's the query:
>
> update inventory set
>     x_coordinate = (select x_coordinate from bases where base = '101'),
>     y_coordinate = (select y_coordinate from bases where base = '101')
> where charge = 100
>
> -- select x_coordinate, y_coordinate from bases where base = '101'
>
> When I run the update query, it tells me that the query succeeded and that
> four records were updated, which is what I expect. But when I looked at the
> inventory table, I found that the four records were unchanged. So, I tried
> to check the values of the base coordinates by running the select statement
> shown above. That statement threw an error complaining that x_coordinate
> and y_coordinate did not exist. This is correct; I should have been
> querying a view that includes those fields. But why didn't the update
> statement throw an error?

Because inventory contains those fields.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
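To spell that out: inside the subselects, the unqualified x_coordinate and y_coordinate resolve to the columns of inventory itself (an outer reference), so each row is set to its own current values - four rows "updated", nothing changed. Qualifying the column names surfaces the mistake immediately (table names from the thread):

```sql
-- Errors at once if bases really lacks the columns, instead of
-- silently referencing the update target's own columns:
update inventory set
    x_coordinate = (select bases.x_coordinate from bases where base = '101'),
    y_coordinate = (select bases.y_coordinate from bases where base = '101')
where charge = 100;
```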
Re: [GENERAL] Postgres not using GiST index in a lateral join
> On 04 Mar 2015, at 22:18, Igor Stassiy wrote:
>
> I would like to stop executing the query for a row of table "a" when a
> single row of "b" is found. This query would not stop processing but will
> filter all the rows that are found at the end of execution.
>
> Is there a way to express this without a subquery?

Does it? Because that would be somewhat surprising.

> On Wed, Mar 4, 2015 at 11:53 PM Paul Ramsey wrote:
> Stop writing so many subqueries, think in joins; the poor planner!
>
> SELECT DISTINCT ON (a.id) a.id AS a_id, b.id AS b_id
> FROM a
> JOIN b
> ON ST_Contains(b.shape, a.shape)
> WHERE b.kind != 1
>
> Also, the DISTINCT ON syntax (PgSQL custom) lets you winnow a result
> set down to just one of the inputs.

>> -- for each row in A, select exactly one row in B (if there is one)
>> -- such that B contains geometry of A
>> EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE
>> ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS
>> TMP;
>>
>> which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape && A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }

How did your query plan end up in JSON notation? It's quite difficult to read like this.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
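For the record, the "stop at the first matching row of b per row of a" semantics can also be written directly with a LATERAL subquery (available since 9.3); a sketch against the thread's tables:

```sql
-- For each row of a, scan b only until one qualifying row is found.
SELECT a.id AS a_id, s.id AS b_id
  FROM a
  LEFT JOIN LATERAL (
        SELECT b.id
          FROM b
         WHERE b.kind != 1
           AND ST_Contains(b.shape, a.shape)
         LIMIT 1
  ) AS s ON true;
```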