Re: [GENERAL] Using Variables in Queries
On 19 October 2017 at 17:25, Scott Mead wrote:
> On Thu, Oct 19, 2017 at 11:21 AM, Igal @ Lucee.org wrote:
>>
>> Hello,
>>
>> In other database servers, which I'm finally dropping in favor of Postgres, I can do the following (mind you that this is for illustration only, I do not actually write queries like that):
>>
>> DECLARE @query varchar(64) = 'red widget';
>>
>> SELECT *
>> FROM products
>> WHERE col1 LIKE @query
>>    OR col2 LIKE @query
>>    OR col3 LIKE @query
>>    OR col4 LIKE @query
>>    OR col5 LIKE @query
>>
>> The point is, though, that I can change the @query variable in one place, which is very convenient.
>>
>> Is it still true (the posts I see on this subject are quite old) that I cannot do so in Postgres outside of a stored procedure/function?

You should be able to do that using the DO statement: https://www.postgresql.org/docs/9.6/static/sql-do.html

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
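A sketch of what the DO-block approach might look like, assuming a hypothetical products table like the one quoted above. Note that a DO block takes no parameters and returns no rows, so to get results out you typically insert into a (temporary) table or use RAISE NOTICE:

```sql
DO $$
DECLARE
    query text := '%red widget%';
    hits  bigint;
BEGIN
    SELECT count(*) INTO hits
    FROM products
    WHERE col1 LIKE query
       OR col2 LIKE query
       OR col3 LIKE query
       OR col4 LIKE query
       OR col5 LIKE query;
    RAISE NOTICE 'matches: %', hits;
END
$$;
```

If the queries are run through psql, its client-side variables are another option: \set q '%red widget%' once, then refer to it as :'q' (quoted-literal interpolation) in every LIKE condition.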
Re: [GENERAL] EAV Designs for Multi-Tenant Applications
> ... hear success (and failure) stories from the community on how they've dealt with these.

I'm not that familiar with JSONB, but wouldn't it be possible to create an index over an organisation-specific JSON 'blob' such that all fields in it are part of the index? I expect that index types aimed at text searches (word searches) would be useful in that respect.

Regards,

Alban Hertroys

-- 
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
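Such an index over all fields of a JSON 'blob' does exist in PostgreSQL: a GIN index on a jsonb column covers every key and value in it. A minimal sketch, assuming a hypothetical tenant_data table:

```sql
CREATE TABLE tenant_data (
    org_id integer NOT NULL,
    attrs  jsonb   NOT NULL
);

-- The default jsonb GIN opclass supports the ?, ?|, ?& and @> operators:
CREATE INDEX tenant_data_attrs_idx
    ON tenant_data USING gin (attrs);

-- The smaller jsonb_path_ops variant supports only @> (containment),
-- but is faster and more compact for that operator:
CREATE INDEX tenant_data_attrs_path_idx
    ON tenant_data USING gin (attrs jsonb_path_ops);

-- Example containment query that can use either index:
SELECT * FROM tenant_data WHERE attrs @> '{"colour": "red"}';
```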
Re: [GENERAL] a JOIN to a VIEW seems slow
> On 2 Oct 2017, at 8:32, Frank Millman wrote:
>
> From: Frank Millman
> Sent: Friday, September 22, 2017 7:34 AM
> To: pgsql-general@postgresql.org
> Subject: Re: a JOIN to a VIEW seems slow
>
> On Fri, Sep 22, 2017 at 7:34 AM, Frank Millman wrote:
> >
> > On Thu, Sep 21, 2017 at 3:54 PM, Merlin Moncure wrote:
> > > Something is not adding up here. Can you EXPLAIN ANALYZE the 26 minute query?
> > >
> > Here it is -
> >
> > https://explain.depesz.com/s/cwm
> >
> Just checking – is this under investigation, or is this thread considered closed?
>
> Frank

There are a few problems keeping track of this issue.

First of all, the above plan does not include the query (I don't know whether that's a thing with depesz's plan analyzer, but ISTR seeing plans _with_ their queries in other cases). That means we have to track back through the thread (quite a bit) to find a query that _may_ be the one that the plan is for. Add to that the fact that most of us are busy people, and we would have to invest too much time into your problem to be able to help - hence we don't.

The second problem is that the query plan highlights a couple of slow sequential scans on ar_tran_inv and ar_tran_rec, but those tables are not in your query from Sep 21st. That makes it impossible for anyone to pinpoint the problem for you. They're probably in your views somewhere, but we can't see where.

Looking at that query though, it seems to me that it could help to aggregate the results on cust_row_id in the inner query (aliased as q) to reduce the result set that the outer query needs to handle. It's possible that the query planner is smart enough to detect this - I can't tell from the plan - but I wouldn't bet on it. The query plan for that inner query should be interesting as well, especially if moving the aggregation inside does not help.

Another possible optimisation would be to add a lower bound for tran_date, if such is possible for your case.
Currently you only have an upper bound: tran_date <= '2015-09-30'. Even if there is no data from, say, before 2015-01-01, the query planner does not necessarily know that and may assume that most rows in the table/view will match the upper-bound condition - in which case a sequential scan is probably fastest. That may be why you don't see Postgres use the index on those columns you mentioned at some point.

Now, apparently you have an index on columns tran_type and tran_row_id, while tran_row_id is the primary key? In that case I'd suggest you drop that index: any value of tran_row_id will have a single value of tran_type, and you're joining on the PK already. Meanwhile, the condition on tran_type in the query only serves to complicate the query.

Oh, BTW, those LEFT JOINs will behave as INNER JOINs, because you use columns from the right-hand side (alloc_trans.tran_date and due_trans.tran_date respectively) in your WHERE clauses without allowing them to be NULL. If you want those parts to behave like proper LEFT JOINs, either add OR xxx.tran_date IS NULL or move those expressions into the JOIN conditions.

Alban Hertroys

-- 
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
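The LEFT JOIN remark can be illustrated with a small sketch; the customers table here is hypothetical, due_trans is one of the tables mentioned above:

```sql
-- This behaves like an INNER JOIN: the WHERE clause rejects the NULL
-- rows that the LEFT JOIN produces for customers without transactions.
SELECT c.cust_row_id, t.tran_date
FROM customers c
LEFT JOIN due_trans t ON t.cust_row_id = c.cust_row_id
WHERE t.tran_date <= '2015-09-30';

-- To keep proper LEFT JOIN semantics, move the condition into the join...
SELECT c.cust_row_id, t.tran_date
FROM customers c
LEFT JOIN due_trans t ON t.cust_row_id = c.cust_row_id
                     AND t.tran_date <= '2015-09-30';

-- ...or explicitly allow the NULLs:
SELECT c.cust_row_id, t.tran_date
FROM customers c
LEFT JOIN due_trans t ON t.cust_row_id = c.cust_row_id
WHERE t.tran_date <= '2015-09-30' OR t.tran_date IS NULL;
```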
Re: [GENERAL] Insert large number of records
On 20 September 2017 at 22:55, Job wrote:
> One further question: within a query launched on the MASTER table where I need to scan every table, for example to search rows located in multiple partitions:
> is there a way to improve "parallel scans" across multiple tables at the same time, or not?
> I noticed, with explain analyze, that the scan on the master table is always sequential, descending into the partitions.

Since nobody has replied to your latest question yet, I'll give it a try.

Which tables a query on your MASTER table needs to scan largely depends on a PG feature called "constraint exclusion". That is to say, if the query optimizer can deduce from your query that it only needs to scan certain partitions for the required results, then it will do so.

Now, whether the optimizer can do that depends on whether your query conditions contain the same (or equivalent) expressions on the same fields of the same types as your partitioning constraints. That 'same type' part is one that people easily miss: sometimes part of an expression gets auto-cast to make it compatible with the remainder of the expression, but that is sometimes not the same type as what is used in your partitioning (exclusion) constraint. In such cases the planner often doesn't see the similarity between the two expressions and ends up scanning the entire set of partitions.

See also section 5.10.4 in https://www.postgresql.org/docs/current/static/ddl-partitioning.html , although it doesn't go into details of how to construct your select statements to prevent scanning the entire partition set.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
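A minimal sketch of constraint exclusion, using a hypothetical date-partitioned table:

```sql
CREATE TABLE measurements (logdate date NOT NULL, value numeric);

CREATE TABLE measurements_2017 (
    CHECK (logdate >= DATE '2017-01-01' AND logdate < DATE '2018-01-01')
) INHERITS (measurements);

-- With constraint_exclusion = partition (the default), the planner can
-- skip every partition whose CHECK constraint contradicts the condition:
SET constraint_exclusion = partition;
SELECT * FROM measurements WHERE logdate >= DATE '2017-06-01';
```

The type pitfall mentioned above: if the condition compares logdate against, say, a timestamp value, the comparison no longer matches the CHECK constraints term for term, and the planner may fall back to scanning all partitions.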
Re: [GENERAL] Insert large number of records
On 20 September 2017 at 07:42, Job wrote:
> We use a "temporary" table, populated by pg_bulkload - this first step takes a few minutes.
> Then, from the temporary table, the data is transferred by a trigger that copies the records into the production table.
> But *this step* takes a really long time (sometimes a few hours).
> There are about 10 million records.

Perhaps the problem isn't entirely on the writing end of the process. How often does this trigger fire? Once per row inserted into the "temporary" table, once per statement, or only after the bulkload has finished? Do you have appropriate indices on the temporary table to guarantee quick lookup of the records that need to be copied to the target table(s)?

> We cannot use pg_bulkload to load data directly into the production table, since pg_bulkload would lock the whole table, and the "COPY" command is slow and would not care about table partitioning (the COPY command fires partitioned-table triggers).

As David already said, inserting directly into the appropriate partition is certainly going to be faster. It removes a check on your partitioning conditions from the query execution plan; if you have many partitions, that adds up, because the database needs to check that condition among all your partitions for every row.

Come to think of it, I was assuming that the DB would stop checking other partitions once it found a suitable candidate, but now I'm not so sure it would. There may be good reasons not to stop, for example if we can partition further into sub-partitions. Anybody?

Since you're already using a trigger, it would probably be more efficient to query your "temporary" table for batches belonging to the same partition and insert those into the partition directly, one partition at a time. Even better would be if your bulkload could already be organised such that all the data in the "temporary" table can indiscriminately be inserted into the same target partition.
That though depends a bit on your setup - at some point the time saved at one end gets consumed on the other, or it even takes longer there.

Well, I think I've thrown enough ideas around for now ;)

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
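The "one partition at a time" suggestion can be sketched like this, with hypothetical table and column names (a staging_load table feeding monthly production partitions):

```sql
-- An index on the partitioning column keeps each batch lookup cheap:
CREATE INDEX staging_load_created_on_idx ON staging_load (created_on);

-- Select each partition's batch directly, so no partition-routing
-- trigger or constraint check runs per row on the parent table:
INSERT INTO production_2017_09
SELECT * FROM staging_load
WHERE  created_on >= DATE '2017-09-01'
AND    created_on <  DATE '2017-10-01';

INSERT INTO production_2017_10
SELECT * FROM staging_load
WHERE  created_on >= DATE '2017-10-01'
AND    created_on <  DATE '2017-11-01';
```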
Re: [GENERAL] Insert large number of records
> On 19 Sep 2017, at 15:47, Job wrote:
>
> Hi guys,
>
> we need to insert from one table into another (PostgreSQL 9.6.1) a large amount of data (about 10 to 20 million rows) without locking the destination table. Pg_bulkload is the fastest way, but it locks the table.
>
> Are there other ways? Classic "COPY" from?

We do something like that using a staging table to load to initially (although not bulk; data arrives in our staging table in batches of 5k to 100k rows) and then we transfer the data using insert/select and "on conflict do". That data-transfer within PG takes a couple of minutes on our rather limited VM for a wide 37M-row table (~37GB on disk).

That only locks the staging table (during the initial bulkload) and the rows in the master table that are currently being altered (during the insert/select). If your data source is a file in a format supported by COPY, then you can use COPY to do the initial bulk load into the staging table.

Some benefits of this 2-stage approach are that it leaves room to manipulate the data (type conversions, for example) and that it can handle the scenario where a matching target record in the master table already exists. In our case, we convert character fields to varchar (which saves a lot of space(s)).

Alban Hertroys

-- 
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
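A minimal sketch of the 2-stage load described above; the table, column, and file names are hypothetical:

```sql
-- 1. Bulk-load into a staging table (only the staging table is locked):
COPY staging_orders FROM '/path/to/orders.csv' WITH (FORMAT csv);

-- 2. Transfer into the master table; only the affected rows are locked,
--    and existing rows are updated instead of raising duplicate-key errors:
INSERT INTO orders AS o (order_id, customer, amount)
SELECT order_id, customer, amount::numeric   -- room for type conversions
FROM   staging_orders
ON CONFLICT (order_id)
DO UPDATE SET customer = EXCLUDED.customer,
              amount   = EXCLUDED.amount;
```

ON CONFLICT requires PostgreSQL 9.5 or later, so it is available on the 9.6.1 mentioned in the question.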
Re: [GENERAL] Performance with high correlation in group by on PK
On 8 September 2017 at 00:23, Jeff Janes wrote:
> On Tue, Aug 29, 2017 at 1:20 AM, Alban Hertroys wrote:
>> On 28 August 2017 at 21:32, Jeff Janes wrote:
>> > On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys wrote:
>> >>
>> >> Hi all,
>> >>
>> >> It's been a while since I actually got to use PG for anything serious, but we're finally doing some experimentation @work now to see if it is suitable for our datawarehouse. So far it's been doing well, but there is a particular type of query I run into that I expect we will frequently use and that's choosing a sequential scan - and I can't fathom why.
>> >>
>> >> This is on:
>> >>
>> >> The query in question is:
>> >> select "VBAK_MANDT", max("VBAK_VBELN")
>> >> from staging.etl1_vbak
>> >> group by "VBAK_MANDT";
>> >>
>> >> This is the header-table for another detail table, and in this case we're already seeing a seqscan. The thing is, there are 15M rows in the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT", "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact, we only have 1 at the moment!).
>> >
>> > You need an "index skip-scan" or "loose index scan". PostgreSQL doesn't currently detect and implement them automatically, but you can use a recursive CTE to get it to work. There are some examples at https://wiki.postgresql.org/wiki/Loose_indexscan
>>
>> Thanks Jeff, that's an interesting approach. It looks very similar to correlated subqueries.
>>
>> Unfortunately, it doesn't seem to help with my issue. The CTE is indeed fast, but when querying the results from the 2nd level of the PK with the CTE results, I'm back at a seqscan on pdw2_vbak again.
> Something like this works:
>
> create table foo as select trunc(random()*5) as col1, random() as col2 from generate_series(1,1);
> create index on foo (col1, col2);
> vacuum analyze foo;
>
> with recursive t as (
>    select * from (select col1, col2 from foo order by col1 desc, col2 desc limit 1) asdfsaf
>    union all
>    select
>      (select col1 from foo where foo.col1 < t.col1 order by col1 desc, col2 desc limit 1) as col1,
>      (select col2 from foo where foo.col1 < t.col1 order by col1 desc, col2 desc limit 1) as col2
>    from t where t.col1 is not null
> )
> select * from t where t is not null;
>
> It is pretty ugly that you need one subquery in the select list for each column to be returned. Maybe someone can find a way to avoid that part. I tried using lateral joins to get around it, but couldn't make that work.
>
> Cheers,
>
> Jeff

Thanks Jeff. That does indeed look ugly. Since we're dealing with a 4GL language (FOCUS) that translates to SQL, I don't think we'll attempt your workaround, even though we can use SQL directly in our reports if we want to.

But, I just remembered giving someone else in a similar situation some advice on this very list; obviously, when my first primary key field is not very selective, I should change the order of the fields in the PK!

But let's first enjoy the weekend.

Alban.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: [GENERAL] looking for a globally unique row ID
On 15 September 2017 at 11:03, Rafal Pietrak wrote:
>> Isn't this typically handled with an inheritance (parent-children) setup? MasterDocument has id, subtype and any common columns (create date etc), then dependents use the same id from master to complete the data for a given type. This is really common in ORM tools. Not clear from the description if the operations could be similarly handled (operation id, operation type as master of 17 dependent operationSpecifics; there is also the "Activity Model")
>
> I do that, but maybe I do that badly.
>
> Currently I do have 6 levels of inheritance which partition my document-class space. But I cannot see any way to have a unique index (unique constraint) to cover all those partitions at once.
>
> This is actually the core of my question: how to make one?
>
> So far I only have separate unique indexes on all those 12 child-table document-class subtables. Is there a way to combine those indexes? I experimented, and an index created on a parent table does not cover the content of child/inheriting tables. If it did, that would solve the problem.
>
> Or have I just misinterpreted your MasterDocument suggestion?

With table partitioning, provided the partitions are based on the value(s) of a particular field that is part of the primary key of the master table, the combination of the child tables' primary keys and the partitions' check constraints on the partitioning field guarantees that records across the partitioned tables are unique.
For example, if we define:

create table master_table (
    year int,
    month int,
    example text,
    primary key (year, month)
);

create table child2016 () inherits (master_table);
alter table child2016 add constraint child2016_year_chk check (year = 2016);
alter table child2016 add constraint child2016_pk primary key (year, month);

create table child2017 () inherits (master_table);
alter table child2017 add constraint child2017_year_chk check (year = 2017);
alter table child2017 add constraint child2017_pk primary key (year, month);

In the above, the three separate primary keys are guaranteed to contain distinct ranges of year - provided that we forbid any records to go directly into the master table, or that those records do not have years already covered by one of the child tables.

Perhaps you can apply this concept to your problem?

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: [GENERAL] Performance with high correlation in group by on PK
On 28 August 2017 at 21:32, Jeff Janes wrote:
> On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys wrote:
>>
>> Hi all,
>>
>> It's been a while since I actually got to use PG for anything serious, but we're finally doing some experimentation @work now to see if it is suitable for our datawarehouse. So far it's been doing well, but there is a particular type of query I run into that I expect we will frequently use and that's choosing a sequential scan - and I can't fathom why.
>>
>> This is on:
>>
>> The query in question is:
>> select "VBAK_MANDT", max("VBAK_VBELN")
>> from staging.etl1_vbak
>> group by "VBAK_MANDT";
>>
>> This is the header-table for another detail table, and in this case we're already seeing a seqscan. The thing is, there are 15M rows in the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT", "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact, we only have 1 at the moment!).
>
> You need an "index skip-scan" or "loose index scan". PostgreSQL doesn't currently detect and implement them automatically, but you can use a recursive CTE to get it to work. There are some examples at https://wiki.postgresql.org/wiki/Loose_indexscan

Thanks Jeff, that's an interesting approach. It looks very similar to correlated subqueries.

Unfortunately, it doesn't seem to help with my issue. The CTE is indeed fast, but when querying the results from the 2nd level of the PK with the CTE results, I'm back at a seqscan on pdw2_vbak again. Just the CTE plan is in skipScan-top.sql. The complete plan is in skipScan-full.sql.

Note: I cloned the original etl1_vbak table into a new pdw2_vbak table that has varchars instead of chars, which reduced the table size to just over half the original's size. Hence the different table names, but the behaviour for this particular issue is the same between them.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
warehouse=# explain (analyze, buffers)
with recursive t as (
   select min("VBAK_MANDT") as "VBAK_MANDT" from staging.pdw2_vbak
   union all
   select (select min("VBAK_MANDT") as "VBAK_MANDT" from staging.pdw2_vbak where "VBAK_MANDT" > t."VBAK_MANDT")
   from t where t."VBAK_MANDT" is not null
)
select "VBAK_MANDT" from t;

                                  QUERY PLAN
--------------------------------------------------------------------------------
 CTE Scan on t  (cost=98.31..100.33 rows=101 width=32) (actual time=0.031..0.054 rows=2 loops=1)
   Buffers: shared hit=9
   CTE t
     ->  Recursive Union  (cost=0.73..98.31 rows=101 width=32) (actual time=0.029..0.052 rows=2 loops=1)
           Buffers: shared hit=9
           ->  Result  (cost=0.73..0.74 rows=1 width=32) (actual time=0.029..0.029 rows=1 loops=1)
                 Buffers: shared hit=5
                 InitPlan 3 (returns $1)
                   ->  Limit  (cost=0.56..0.73 rows=1 width=32) (actual time=0.026..0.027 rows=1 loops=1)
                         Buffers: shared hit=5
                         ->  Index Only Scan using pdw2_vbak_pkey on pdw2_vbak pdw2_vbak_1  (cost=0.56..2375293.75 rows=14214332 width=32) (actual time=0.024..0.024 rows=1 loops=1)
                               Index Cond: ("VBAK_MANDT" IS NOT NULL)
                               Heap Fetches: 1
                               Buffers: shared hit=5
           ->  WorkTable Scan on t t_1  (cost=0.00..9.56 rows=10 width=32) (actual time=0.009..0.010 rows=0 loops=2)
                 Filter: ("VBAK_MANDT" IS NOT NULL)
                 Rows Removed by Filter: 0
                 Buffers: shared hit=4
                 SubPlan 2
                   ->  Result  (cost=0.93..0.94 rows=1 width=32) (actual time=0.015..0.015 rows=1 loops=1)
                         Buffers: shared hit=4
                         InitPlan 1 (returns $3)
                           ->  Limit  (cost=0.56..0.93 rows=1 width=32) (actual time=0.013..0.013 rows=0 loops=1)
                                 Buffers: shared hit=4
                                 ->  Index Only Scan using pdw2_vbak_pkey on pdw2_vbak  (cost=0.56..1732075.91 rows=4738111 width=32) (actual time=0.012..0.012 rows=0 loops=1)
                                       Index Cond: (("VBAK_MANDT" IS NOT NUL
Re: [GENERAL] Performance with high correlation in group by on PK
On 28 August 2017 at 14:22, Alban Hertroys wrote:
> This is on:

Just noticed I forgot to paste this in:

warehouse=# select version();
                                                 version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
[GENERAL] Performance with high correlation in group by on PK
Hi all,

It's been a while since I actually got to use PG for anything serious, but we're finally doing some experimentation @work now to see if it is suitable for our datawarehouse. So far it's been doing well, but there is a particular type of query I run into that I expect we will frequently use and that's choosing a sequential scan - and I can't fathom why.

This is on:

The query in question is:

select "VBAK_MANDT", max("VBAK_VBELN")
from staging.etl1_vbak
group by "VBAK_MANDT";

This is the header-table for another detail table, and in this case we're already seeing a seqscan. The thing is, there are 15M rows in the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT", "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact, we only have 1 at the moment!).

Explain analyze says the following about this query:

warehouse=# explain (analyze, buffers) select "VBAK_MANDT", max("VBAK_VBELN") from staging.etl1_vbak group by "VBAK_MANDT";

                                  QUERY PLAN
--------------------------------------------------------------------------------
 HashAggregate  (cost=1990054.08..1990054.09 rows=1 width=36) (actual time=38723.602..38723.602 rows=1 loops=1)
   Group Key: "VBAK_MANDT"
   Buffers: shared hit=367490 read=1409344
   ->  Seq Scan on etl1_vbak  (cost=0.00..1918980.72 rows=14214672 width=15) (actual time=8.886..31317.283 rows=14214672 loops=1)
         Buffers: shared hit=367490 read=1409344
 Planning time: 0.126 ms
 Execution time: 38723.674 ms
(7 rows)

As you can see, a sequential scan. The statistics seem quite correct.
If I change the query to select a single value of "VBAK_MANDT" we get:

warehouse=# explain (analyze, buffers) select max("VBAK_VBELN") from staging.etl1_vbak where "VBAK_MANDT" = '300';

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Result  (cost=1.37..1.38 rows=1 width=32) (actual time=14.911..14.911 rows=1 loops=1)
   Buffers: shared hit=2 read=3
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.56..1.37 rows=1 width=11) (actual time=14.907..14.908 rows=1 loops=1)
           Buffers: shared hit=2 read=3
           ->  Index Only Scan Backward using etl1_vbak_pkey on etl1_vbak  (cost=0.56..11498362.31 rows=14214672 width=11) (actual time=14.906..14.906 rows=1 loops=1)
                 Index Cond: (("VBAK_MANDT" = '300'::bpchar) AND ("VBAK_VBELN" IS NOT NULL))
                 Heap Fetches: 1
                 Buffers: shared hit=2 read=3
 Planning time: 0.248 ms
 Execution time: 14.945 ms
(11 rows)

That is more in line with my expectations. Oddly enough, adding "VBAK_MANDT" and the group by back into that last query, the result is a seqscan again.

For "VBAK_MANDT" we see these statistics:

 Null fraction:           0
 Average width:           4
 Distinct values:         1
 Most common values:      {300}
 Most common frequencies: {1}
 Histogram bounds:
 Correlation:             1

The table definition is:

                Table "staging.etl1_vbak"
   Column    |     Type      | Modifiers
-------------+---------------+-----------
 VBAK_MANDT  | character(3)  | not null
 VBAK_VBELN  | character(10) | not null
 VBAK_ERDAT  | date          | not null
 VBAK_ERZET  | character(6)  | not null
 VBAK_ERNAM  | character(12) | not null
 VBAK_ANGDT  | date          | not null
 VBAK_BNDDT  | date          | not null
 VBAK_AUDAT  | date          | not null
 ...
 VBAK_MULTI  | character(1)  | not null
 VBAK_SPPAYM | character(2)  | not null
Indexes:
    "etl1_vbak_pkey" PRIMARY KEY, btree ("VBAK_MANDT", "VBAK_VBELN")
    "idx_etl1_vbak_erdat" btree ("VBAK_ERDAT")

A final remark: the table definition was generated by our reporting/ETL software, hence the case-sensitive column names and the use of the character type instead of varchar (or text).

What can be done to help the planner choose a smarter plan?
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: [GENERAL] cast issue in WITH RECURION
> On 3 Aug 2017, at 20:22, k b wrote:
>
> when I create a recursive query and try to add the distances I get a message:
> ERROR: recursive query "edges" column 3 has type numeric(7,3) in non-recursive term but type numeric overall.
> My exercise is almost identical to the example in the docs:
>
> WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
>     SELECT g.id, g.link, g.data, 1,
>       ARRAY[g.id],
>       false
>     FROM graph g
>   UNION ALL
>     SELECT g.id, g.link,
>       sg.data + g.data, -- altered section, data is numeric(7,3)
>       sg.depth + 1,
>       path || g.id,
>       g.id = ANY(path)
>     FROM graph g, search_graph sg
>     WHERE g.id = sg.link AND NOT cycle
> )
> SELECT * FROM search_graph;

I believe the solution is rather simple; just cast(sg.data + g.data as numeric(7,3)).

Alban Hertroys

-- 
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
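Applied to the full query, that cast keeps the column type stable across both branches of the UNION (a sketch following the documentation's search_graph example quoted in the question):

```sql
WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
    SELECT g.id, g.link, g.data, 1,
           ARRAY[g.id],
           false
    FROM graph g
  UNION ALL
    SELECT g.id, g.link,
           CAST(sg.data + g.data AS numeric(7,3)),  -- matches the non-recursive term's type
           sg.depth + 1,
           path || g.id,
           g.id = ANY(path)
    FROM graph g, search_graph sg
    WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;
```

The error occurs because the sum sg.data + g.data is plain numeric, while the non-recursive term's column is numeric(7,3); both terms of a recursive CTE must agree on column types.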
Re: [GENERAL] Dealing with ordered hierarchies
> On 24 Jul 2017, at 9:02, Tim Uckun wrote:
>
> I have read many articles about dealing with hierarchies in postgres including nested sets, ltree, materialized paths, using arrays as parentage, CTEs etc, but nobody talks about the following scenario.
>
> Say I have a hierarchy like this
>
> 1
> 1.1
> 1.1.1
> 1.1.2
> 1.2
> 1.3
> 2
> 2.1
>
> In this hierarchy the order is very important and I want to run frequent(ish) re-ordering of both subsets and entire trees and even more frequent inserts.

Since they're hierarchies, the order is already in the structure of the data. Do you really need to add it to the data, or would it suffice to add it to the query result? If that's the case, you only need a simple ordering number per branch, like 1, 2, 3, etc. The full path (i.e. '1.1.3') gets generated in the query.

I regularly generate structures like your above example using recursive CTE's. The "path" helps to get the results in the correct order for starters (although you're in for a surprise if any of your levels go past 9 in the above). It's great how you can "trickle" all kinds of calculations through the hierarchy using CTE's.

Something like this should help to get you started (untested, I usually do this in Oracle, which has several peculiarities):

with recursive hierarchy (parent, node, sequence_number, path) as (
    select null, node, sequence_number, sequence_number::text
    from tree
    where parent is null
  union all
    select h.node, t.node, t.sequence_number, h.path || '.' || t.sequence_number::text
    from tree t
    join hierarchy h on (t.parent = h.node)
)
select node, path from hierarchy

Where the table "tree" has fields:
 parent          -- parent node
 node            -- actual node
 sequence_number -- order of sequence of this node within its parent branch

You may need to add a surrogate key if your parent/child combinations are otherwise not unique. That would then also be the way to address a node directly (otherwise it would be (parent, node)).
For the sequence_number I'd probably just use an actual sequence generator with a large enough gap to prevent problems with reordering items later on (increment by 10, for example). You will also want to pad the sequence numbers in the "path" column with leading zeroes (otherwise 10 sorts between 1 and 2, etc.), enough that you won't run out of numbers per level.

If you require your sequence numbers to be consecutive in the result: you can add a field with such numbering based on the existing sequence_numbers, by using a window function in each branch of the union - it's down to a fairly basic row numbering problem at this point.

> Scenario 1: I want to insert a child into the 1.1 subtree. The next item should be 1.1.3 and I can't figure out any other way to do this other than to subquery the children and to figure out the max child ID, add one to it, which is a race condition waiting to happen.

You would first need to determine which node is the parent node by traversing the hierarchy up to the point of insertion, and use the (parent, node) or surrogate key fields to append under. Similar to using '1.1', really.

> Scenario 2: I now decide the recently inserted item is the second most important so I reset the ID to 1.1.2 and then increment 1.1.2 (and possibly everything below). Again this is both prone to race conditions and involves a heavy update.

No need to bother with that (much) with the above approach. And if you do run out of gaps, you can fairly simply update all the sequence numbers under the same parent without causing concurrency issues and without requiring locks/synchronisation.

> Is there a better way to deal with this or is the complexity unavoidable?

I think it's better, but I don't think it's ideal. It's fairly complicated to understand, for one thing, which can cause problems for maintenance (I have colleagues who don't dare to touch my queries, for example).
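The "run out of gaps" repair mentioned above can be sketched with a window function, assuming a hypothetical tree(parent, node, sequence_number) table as in the earlier description:

```sql
-- Re-pack sparse sequence numbers (e.g. 10, 20, 35, 36, ...) into fresh
-- gaps of 10, independently within each parent branch:
UPDATE tree
SET    sequence_number = fresh.new_seq
FROM  (SELECT node,
              10 * row_number() OVER (PARTITION BY parent
                                      ORDER BY sequence_number) AS new_seq
       FROM tree) AS fresh
WHERE tree.node = fresh.node;
```

Since only sequence_number changes and the relative order within each branch is preserved, concurrent readers of the hierarchy see a consistent ordering throughout.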
> I should state that, like most databases, reads will be much more frequent than writes, and inserts will be more frequent than updates (re-ordering).

More of the logic (and thus system load) gets moved to the read side of things; that's probably a drawback, but most of it is just keeping state and counting. I don't expect that to be all that much.

Alban Hertroys

-- 
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
Re: [GENERAL] Schedule
> On 20 Jun 2017, at 18:46, Adrian Klaver wrote:
>
> On 06/20/2017 08:12 AM, Steve Clark wrote:
>> On 06/20/2017 10:38 AM, Adrian Klaver wrote:
>>> On 06/20/2017 07:00 AM, Steve Clark wrote:
>>
>> We already have a monitoring system in place that has been in operation since circa 2003. Just recently we have added a new class of customer whose operation is not 24/7. I envision the schedule could be fairly complicated, including weekends and holidays, plus the enduser might shut down for lunch etc. I am looking for more on how to organize the schedule, e.g. a standard weekly schedule plus exceptions for holidays etc., or a separate individual schedule for each week; also need to consider how easy it is to maintain the schedule, etc.
>
> Yes, this could become complicated, if for no other reason than that it is being driven from the customer end and there will need to be a process to verify and incorporate their changes.

There you're saying something rather important: "it is being driven from the customer end".

> 2) Figure out what a day is. In other words are different timezones involved and if so what do you 'anchor' a day to?

For an example of how that might fail: at our company, they work in shifts (I don't) of 3*8 hours, which run from 23:00 to 23:00. Depending on who looks at the data, either that's a day or a normal day (00:00-00:00) is. It's a matter of perspective.

IMHO, the only safe approach is to have the customer end decide whether it's a regular outage or an irregular one. There is just no way to reliably guess that from the data. If a customer decides to turn off the system when he's going home, you can't guess when he's going to do that, and you will be raising false positives when you depend on a schedule of when he might be going home.

From a software implementation point of view, that means that your customer-side application needs to be able to signal planned shutdowns and startups.
If you detect any outages without such a signal, then you can flag it as a problem. There are still opportunities for getting those wrong of course, such as lack of connectivity between you and your customer, but those should be easy to explain once detected. And I'm sure there are plenty of other corner-cases you need to take into account. I bet it has a lot of problems in common with replication actually (how do we reliably get information from system A to system B), so it probably pays to look at what particular problems occur there and how they're solved. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Create extension C using IDE
> On 12 Jun 2017, at 20:09, Adrian Klaver wrote: > > On 06/12/2017 11:03 AM, Fabiana Zioti wrote: >> I already have postgresql 9.6 installed. I'm developing extensions for it in >> C. What would be the best IDE to debug the code? Eclipse has many errors. > > The errors are? My guess would be its user interface… I would suggest gViM or MacViM, but that really depends on what you expect from an (I)DE and on what platform you are working. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Inconsistent performance with LIKE and bind variable on long-lived connection
> On 10 Jun 2017, at 5:37, Steven Grimm wrote: […] I notice that you're declaring your ResultSet variable inside the loop, which means that you create and destroy it frequently. I've been told that this is a pattern that the GC has trouble keeping up with (although that was around the Java 5 era), so you might be seeing the effects of memory churn in your client instead of in the database. I modified your function so it no longer does that; does that make a difference? Note: the long variables are scalars instead of objects. I don't think they need the same treatment, but it can't hurt. > private static void logTime(String name, PreparedStatement stmt) throws > SQLException { > StringBuilder out = new StringBuilder(String.format("%-22s", name)); > ResultSet rs; > long startTime, endTime; > > for (int i = 0; i < 20; i++) { > startTime = System.currentTimeMillis(); > rs = stmt.executeQuery(); > while (rs.next()) { > rs.getString(1); > } > endTime = System.currentTimeMillis(); > rs.close(); > > out.append(String.format(" %3d", endTime - startTime)); > } > > stmt.close(); > > System.out.println(out); > } Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Weirdness with "not in" query
On 8 June 2017 at 17:27, greigwise wrote: > So, I'm using postgres version 9.6.3 on a mac and the results to this series > of queries seems very strange to me: > > db# select count(*) from table1 where id in > (1706302,1772130,1745499,1704077); > count > --- > 4 > (1 row) > > db# select count(*) from table2 where table1_id in > (1706302,1772130,1745499,1704077); > count > --- > 0 > (1 row) > > db# select count(*) from table1 where id not in (select table1_id from > table2); > count > --- > 0 > (1 row) > > I would expect the "not in" query to return a result of at least 4. Am I > totally misunderstanding how this should work (I really don't think so) or > is something wrong? You probably have table1_id values that are NULL in table2. Comparing any id against a list that contains a NULL makes the result of NOT IN evaluate to NULL rather than true, so no rows match. NOT EXISTS is perhaps a better candidate in this case. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
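A small self-contained illustration of the NULL trap (hypothetical two-column setup, not the poster's actual schema):

```sql
CREATE TABLE t1 (id int);
CREATE TABLE t2 (t1_id int);
INSERT INTO t1 VALUES (1), (2);
INSERT INTO t2 VALUES (1), (NULL);

-- "2 NOT IN (1, NULL)" expands to "2 <> 1 AND 2 <> NULL", which is NULL,
-- so the row for id = 2 is filtered out along with everything else:
SELECT count(*) FROM t1 WHERE id NOT IN (SELECT t1_id FROM t2);   -- 0

-- NOT EXISTS treats the NULL row as simply "no match":
SELECT count(*) FROM t1
WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.t1_id = t1.id);       -- 1
```

As a bonus, NOT EXISTS can usually be planned as an anti-join, which often performs better than NOT IN anyway.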
Re: [GENERAL] Coditional join of query using PostgreSQL
> On 15 May 2017, at 19:02, Nick Dro wrote: > > Hi, > I'm new to postgresql and couldn't find answer to this situation anywhere. > I asked this here: > http://stackoverflow.com/questions/43984208/coditional-join-of-query-using-postgresql > > I hope there is a better solution rather than creating two separated > functions :( You can use your boolean parameter inside the join condition: […] on (tfquery.a = main.a and ((type_f and tfquery.d = main.d) or not type_f)) Beware that you don't also have a column named type_f in that join somewhere. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
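Spelled out a little further, the boolean test can also be folded into the join in a slightly shorter form; a sketch only, with main, tfquery and the boolean parameter type_f taken from the linked question:

```sql
SELECT main.*
FROM main
JOIN tfquery
  ON tfquery.a = main.a
 -- (NOT type_f OR ...) is logically the same as
 -- ((type_f AND ...) OR NOT type_f), just shorter:
 AND (NOT type_f OR tfquery.d = main.d);
```

Either way, a single function with the parameter in the join condition avoids maintaining two near-identical functions.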
Re: [GENERAL] Column information
> According to the documentation 'numeric_precision_radix' field should > indicate what radix the value of 'numeric_precision' is stored. > > However, even though the radix is 2, the actual value is 32, which is > not a radix 2. The radix is the base in which the precision is counted, not the base in which the number itself is written: with numeric_precision_radix = 2, a numeric_precision of 32 means the type holds 32 binary digits (bits). https://en.wikipedia.org/wiki/Radix Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] UDP buffer drops / statistics collector
> On 19 Apr 2017, at 20:36, Tim Kane wrote: > > Well, this is frustrating.. > The buffer drops are still occurring - so I thought it worth trying use a > ramdisk and set stats_temp_directory accordingly. > > I've reloaded the instance, and can see that the stats directory is now being > populated in the new location. Except - there is one last file > (pgss_query_texts.stat) that continues to be updated in the old pg_stat_tmp > path.. Is that supposed to happen? > > > Fairly similar to this guy (but not quite the same). > https://www.postgresql.org/message-id/d6e71befad7beb4fbcd8ae74fadb1265011bb40fc...@win-8-eml-ex1.eml.local > > I can see the packets arriving and being consumed by the collector.. and, > the collector is indeed updating in the new stats_temp_directory.. just not > for that one file. > > > It also failed to resolve the buffer drops.. At this point, I'm not sure I > expected it to. They tend to occur semi-regularly (every 8-13 minutes) but I > can't correlate them with any kind of activity (and if I'm honest, it's > possibly starting to drive me a little bit mad). This rings a bell for me. I recently had a similar issue in an MMO (Windows) where every 15 minutes I would get a number of consecutive freezes in-game. You could set your alarm by it, so regular. That suddenly went away after I rearranged my home-network (for unrelated reasons), which incidentally moved several connections from the switch the game-system was connected to to another switch. I never pinpointed it to UDP, but then again, TCP would correct for the lost transfers (probably at the cost of UDP traffic). Perhaps you have a switch somewhere that's overburdened? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Large data and slow queries
> On 19 Apr 2017, at 12:58, Samuel Williams > wrote: > > It's interesting you talk about using multiple indexes. In > MySQL/MariaDB and derivatives, I've never seen the query planner > consider using multiple indexes. So, it's possible that Postgres may > use multiple indexes if it saves time? Or do you mean, doing something > like manually joining the data and leveraging the different indexes > explicitly? PG is capable of doing bitmap heap scans to combine results from multiple indices, among other things. Whether that will actually improve performance in this case I don't know, but it's worth a try I think. > The correlation between user_id and location... well, it's somewhat > temporally related. So users are constantly moving around but happen to be at the same locations at regular intervals? In my experience, people don't usually move around much, so you should certainly be able to pinpoint them mostly to a specific area, right? (Hence my suggestions for a country column or partitioning in squares) > On 19 April 2017 at 22:50, Alban Hertroys wrote: >> >>> On 19 Apr 2017, at 6:01, Samuel Williams >>> wrote: >>> >>> Hi. >>> >>> We have 400,000,000 records in a table (soon to be 800,000,000), here >>> is the schema (\d+) >>> >>> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121 >>> >>> We want the following kinds of query to be fast: >>> >>> SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE >>> "user_event"."what" IN ('poll', 'location_change', >>> 'midnight_location') AND ("user_event"."created_at" >= '2016-04-19 >>> 01:23:55') AND (latitude > -37.03079375089291 AND latitude < >>> -36.67086424910709 AND longitude > 174.6307139779924 AND longitude < >>> 175.0805140220076); >>> >>> We have a btree index and it appears to be working. However, it's >>> still pretty slow. 
>>> >>> EXPLAIN ANALYZE gives the following: >>> https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314 >>> >>> I'm thinking that I need to do the following to help: >>> >>> CLUSTER user_event ON index_user_event_for_visits_3 followed by >>> analyze... Our data is mostly time series but sometimes we get some >>> dumps with historical records. >> >> It seems to me that a large part of the problem is that the server has to >> scan all index entries from that date and within those location bounds to >> find that the distinct user id's in that set are about 114,000 out of 1.7M >> rows matching the selection-criteria. If it could stop at the first location >> for each user, it would have to scan less than a 10th of the index entries >> that it has to scan now... >> >> How high is the correlation between user id and location? That can probably >> be leveraged… >> Perhaps you'd get better performance if you'd use multiple indices instead >> of stuffing everything into a single purpose-specific one? I would suggest >> one on (user_id, latitude, longitude) or (latitude, longitude, user_id) and >> one on (created_at, user_id), or perhaps (created_at, latitude, longitude). >> That approach could also reduce the number of indices you have on that >> table, as well as their sizes, making it all fit into cache a little easier. >> Then again, additional operations will be required to combine them. >> >> For a different approach; It may be possible to enrich your data with >> something that is easy to index and query, with a high correlation to >> (latitude, longitude). That could also be used to partition over. Country >> seems a good candidate here, unless all your data-points are in New Zealand >> like the above? >> Then again, some countries are a lot larger, with a higher population, than >> others. And populations can be highly concentrated (Delhi, Moscow to name a >> few). 
>> Another option is to divide the location space up into squares of a fixed >> size, with a partition for each square. About 80% of those squares are >> unpopulated though, being at sea. >> >> Created_at is a very good candidate for partitioning too, especially if you >> don't intend to keep data older than a certain age. Truncating or dropping a >> partition that you no longer need is quick and easy. >> >> With data-sets this large, I'd think you would want to
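For the record, a hand-waving sketch of what the multiple-indices suggestion looks like against the user_event table from the thread. The index definitions are illustrative, and whether the planner actually combines them in a BitmapAnd depends on the statistics:

```sql
CREATE INDEX ON user_event (created_at);
CREATE INDEX ON user_event (latitude, longitude);

EXPLAIN SELECT DISTINCT user_id
FROM user_event
WHERE created_at >= '2016-04-19 01:23:55'
  AND latitude  BETWEEN -37.03 AND -36.67
  AND longitude BETWEEN 174.63 AND 175.08;
-- may produce a plan along the lines of:
--   Bitmap Heap Scan on user_event
--     ->  BitmapAnd
--           ->  Bitmap Index Scan on user_event_created_at_idx
--           ->  Bitmap Index Scan on user_event_latitude_longitude_idx
```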
Re: [GENERAL] Large data and slow queries
> On 19 Apr 2017, at 6:01, Samuel Williams > wrote: > > Hi. > > We have 400,000,000 records in a table (soon to be 800,000,000), here > is the schema (\d+) > > https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121 > > We want the following kinds of query to be fast: > > SELECT DISTINCT "user_event"."user_id" FROM "user_event" WHERE > "user_event"."what" IN ('poll', 'location_change', > 'midnight_location') AND ("user_event"."created_at" >= '2016-04-19 > 01:23:55') AND (latitude > -37.03079375089291 AND latitude < > -36.67086424910709 AND longitude > 174.6307139779924 AND longitude < > 175.0805140220076); > > We have a btree index and it appears to be working. However, it's > still pretty slow. > > EXPLAIN ANALYZE gives the following: > https://gist.github.com/ioquatix/bddda36d9e4ffaceb7a62d7b62259121#gistcomment-2065314 > > I'm thinking that I need to do the following to help: > > CLUSTER user_event ON index_user_event_for_visits_3 followed by > analyze... Our data is mostly time series but sometimes we get some > dumps with historical records. It seems to me that a large part of the problem is that the server has to scan all index entries from that date and within those location bounds to find that the distinct user id's in that set are about 114,000 out of 1.7M rows matching the selection-criteria. If it could stop at the first location for each user, it would have to scan less than a 10th of the index entries that it has to scan now... How high is the correlation between user id and location? That can probably be leveraged… Perhaps you'd get better performance if you'd use multiple indices instead of stuffing everything into a single purpose-specific one? I would suggest one on (user_id, latitude, longitude) or (latitude, longitude, user_id) and one on (created_at, user_id), or perhaps (created_at, latitude, longitude). 
That approach could also reduce the number of indices you have on that table, as well as their sizes, making it all fit into cache a little easier. Then again, additional operations will be required to combine them. For a different approach: it may be possible to enrich your data with something that is easy to index and query, with a high correlation to (latitude, longitude). That could also be used to partition over. Country seems a good candidate here, unless all your data-points are in New Zealand like the above? Then again, some countries are a lot larger, with a higher population, than others. And populations can be highly concentrated (Delhi, Moscow to name a few). Another option is to divide the location space up into squares of a fixed size, with a partition for each square. About 80% of those squares are unpopulated though, being at sea. Created_at is a very good candidate for partitioning too, especially if you don't intend to keep data older than a certain age. Truncating or dropping a partition that you no longer need is quick and easy. With data-sets this large, I'd think you would want to partition on multiple dimensions, creating a matrix of partitions under a single master table. I don't think PG has a simple way of doing that (yet) though; perhaps it's possible by abusing multiple levels of inheritance, but that sounds like a bad idea. And of course, create your partitions sufficiently coarse to prevent overburdening the system tables, which would slow down the query planner. Hopefully there's something useful in my ramblings! Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
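To illustrate the created_at partitioning: at the time of writing that means inheritance plus CHECK constraints. The child-table name and the monthly granularity here are just an example:

```sql
-- One child table per month, attached to the existing parent:
CREATE TABLE user_event_2016_04 (
    CHECK (created_at >= '2016-04-01' AND created_at < '2016-05-01')
) INHERITS (user_event);

-- Let the planner skip partitions whose CHECK constraint rules them out:
SET constraint_exclusion = partition;

-- Expiring old data then becomes a cheap metadata operation:
DROP TABLE user_event_2016_04;
```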
Re: [GENERAL] CANNOT USE ANY INDEX ON UPSERT (INSERT.....ON CONFLICT)
> On 18 Apr 2017, at 10:13, agharta wrote: > > Hi all, > > I have a problem with INSERT ... ON CONFLICT sql command. > > Reading 9.6 documentation i see that ON CONFLICT command will accpets only > index_column_name or index_expression (unique composite/primary indexes are > valid too). > > So, my problem is that i can't create any type of upsert-valid index . Let me > explain. > > I have a table T1 containing F1, F2, F3, F4 fields. > > I can insert same records in T1, MAX TWICE. How is UPSERT supposed to know which of a pair of duplicate records it is supposed to update? You'll have to make them unique somehow. The safest approach is usually to add a surrogate key based on a sequence. > I can have records like (A,B,C,D),(B,A,D,C), etc.. and (A,B,C,D) AGAIN. Any > other next insert of (A,B,C,D) is not allowed (actually it is avoided by a > complex-and-slow-performance select count in before insert/update trigger). You're probably better off with an EXISTS query there. Something like: select F1, F2, F3, F4, case when exists (select 1 from T1 t where t.F1 = T1.F1 and t.F2 = T1.F2 and t.F3 = T1.F3 and t.F4 = T1.F4 and t.pk <> T1.pk) then 1 else 0 end as have_duplicate from T1 where F1 = NEW.F1 and F2 = NEW.F2 and F3 = NEW.F3 and F4 = NEW.F4 limit 1; The pk field in there is the surrogate key from the previous paragraph. Alternatively, wrap your COUNT around a sub-query that's limited to 2 results. No extra pk needed in that case, unless you still need to use UPSERT with that. In either case it will make a big difference to have an index on at least (F1, F2, F3, F4), perhaps with the new pk column added at the end. > In this case i can't create any type of primary/unique index, like a > composite F1,F2, F3, F4 index. (correct me if i am wrong please). Correct, you'll most likely have to add a new one (unless someone comes up with better suggestions). Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. 
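A sketch of the "at most twice" check without a full COUNT over all matches; NEW refers to the candidate row in a BEFORE INSERT trigger, and lower-case column names are assumed:

```sql
-- Stop scanning as soon as two duplicates are found, instead of
-- counting every matching row in the table:
SELECT count(*) >= 2 AS limit_reached
FROM (SELECT 1
      FROM t1
      WHERE (f1, f2, f3, f4) = (NEW.f1, NEW.f2, NEW.f3, NEW.f4)
      LIMIT 2) AS s;
```

With the (F1, F2, F3, F4) index suggested earlier, this should only ever touch a couple of index entries per insert.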
Re: [GENERAL] [ADMIN] calculating table and index size
On 7 April 2017 at 09:11, Günce Kaya wrote: > Hi again, > > Sorry for delay. > > Guillaume, I read your answer for first question but it's not clear to me. > The table has a column and the index also uses that column, so in that example, I > think table size and index size should be equal. Why are these not equal? If I understand correctly, the table only has 1 (integer) column and all those 1,400,000 rows have the same value? Then the table has to store each row separately and thus has to store the same value repeatedly. It also has to store some meta-data, such as visibility information. The index on the other hand (assuming a btree index) knows that there is only a single value in the table and therefore only stores a single value, but it has to reference each row in the table that contains that value. So the table and the index are storing different things, but the total size of each row/index node for that single integer column is of the same order of magnitude. That's why they are similar in size. If you would add another integer column to your table and VACUUM FULL the table, the table would be about double its size, but the index would stay the same size. Regards, Alban. > On Wed, Apr 5, 2017 at 1:02 PM, Steven Chang > wrote: >> >> Hello, >> >> try pgstattuple() and pgstatindex() , I think you will figure it out. >> >> Steven >> >> 2017-04-05 16:56 GMT+08:00 Guillaume Lelarge : >>> >>> Hi, >>> >>> 2017-04-05 9:44 GMT+02:00 Günce Kaya : Hi all, I have some questions about calculating table and index size. I have a dummy table which has an integer column and its index. The table has 1,400,000 rows and all of the rows have the same value, 2000. Table size is 50MB and index size is 31MB. Why is there such a size difference between the table and its index? What happens in the data files when we add an index? >>> >>> You have metadata information in the table datafiles that you don't have >>> in the index datafiles. For example, all the system columns for each line. 
>>> Second question is that after created table, table size was 0 byte. I inserted a row as 120 then table size was 8192 byte. I inserted five times same value to the table and table size is still 8192 bytes. Table size changed after inserted lots of rows. Table size was stabile till first few hundred rows. why table size didn't change when I inserted lots of rows? >>> >>> PostgreSQL works with 8KB blocks. When you insert a line, it puts it on a >>> block, but this block may contain many lines. So your next new lines still >>> fit in the first block... until it doesn't, and you'll see a new block >>> coming, making your table datafile grows to 16KB. And so on and so on. >>> >>> >>> -- >>> Guillaume. >>> http://blog.guillaume.lelarge.info >>> http://www.dalibo.com >> >> > > > > -- > Gunce Kaya > > Linkedin - Twitter - Blog -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
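For completeness, the sizes can be compared directly from SQL; these functions exist in 9.x, and 'mytable' stands in for the actual table name:

```sql
SELECT pg_size_pretty(pg_relation_size('mytable'))       AS table_size,
       pg_size_pretty(pg_indexes_size('mytable'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('mytable')) AS total_size;
```

pg_total_relation_size() includes TOAST data as well, so the three numbers need not add up exactly.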
Re: [GENERAL] Run statements before pg_dump in same transaction?
> On 23 Mar 2017, at 20:23, John R Pierce wrote: > > On 3/23/2017 12:06 PM, François Beausoleil wrote: >> For development purposes, we dump the production database to local. It's >> fine because the DB is small enough. The company's growing and we want to >> reduce risks. To that end, we'd like to anonymize the data before it leaves >> the database server. > > how about piping the /output/ of pg_dump to GPG or a similar file/stream > encryption module ? John, anonymising data is not about encrypting, but about making data anonymous. This is usually done to create data sets useful for development or statistical analysis, whilst preventing people from obtaining sensitive information from the data set. For example, companies want to know how many facebook users are interested in buying drones, but those companies should not know who exactly those people are. ISTR that there are some tools for this purpose, but the details escape me. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
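To sketch one way of doing that before the data leaves the server. Table and column names are made up, and hashing is shown for illustration only; strictly speaking it pseudonymises rather than truly anonymises:

```sql
BEGIN;
-- Build a scrubbed copy inside the production database:
CREATE TABLE users_anon AS
SELECT id,
       md5(email) AS email,      -- irreversibly scrambled
       'redacted' AS full_name   -- dropped entirely
FROM users;
COMMIT;

-- Then dump only the scrubbed copy:
--   pg_dump -t users_anon mydb > users_anon.sql
```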
Re: [GENERAL] How to create unique index on multiple columns where the combination doesn't matter?
> On 22 Mar 2017, at 17:54, Glen Huang wrote: > > Hello, > > If I have a table like > > CREATE TABLE relationship ( > obj1 INTEGER NOT NULL REFERENCES object, > obj2 INTEGER NOT NULL REFERENCES object, > obj3 INTEGER NOT NULL REFERENCES object, > ... > ) > > And I want to constrain that if 1,2,3 is already in the table, rows like > 1,3,2 or 2,1,3 shouldn't be allowed. > > Is there a general solution to this problem? Does the order of the values of (obj1, obj2, obj3) in relationship matter? If not, you could swap them around on INSERT/UPDATE to be in sorted order. I'd probably go with a BEFORE INSERT OR UPDATE trigger. In addition, to prevent unsorted entry, on obj2 add CHECK (obj2 > obj1) and on obj3 add CHECK (obj3 > obj2). Now you can create a normal PK or unique key on (obj1, obj2, obj3) as the order of their values is not variable anymore. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
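A sketch of that trigger, assuming the three integer columns from the table definition above:

```sql
CREATE FUNCTION relationship_sort_objs() RETURNS trigger AS $$
DECLARE
    objs integer[];
BEGIN
    -- Sort the three values so the stored order is canonical:
    SELECT array_agg(x ORDER BY x) INTO objs
    FROM unnest(ARRAY[NEW.obj1, NEW.obj2, NEW.obj3]) AS x;
    NEW.obj1 := objs[1];
    NEW.obj2 := objs[2];
    NEW.obj3 := objs[3];
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER relationship_sort
BEFORE INSERT OR UPDATE ON relationship
FOR EACH ROW EXECUTE PROCEDURE relationship_sort_objs();

-- With a canonical order, a plain unique constraint now catches
-- (2,1,3) as a duplicate of (1,2,3):
ALTER TABLE relationship ADD UNIQUE (obj1, obj2, obj3);
```

Note that the CHECK (obj2 > obj1) variant additionally forbids equal values within a row; use >= instead if those should be allowed.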
Re: [GENERAL] CHECK for 2 FKs to be non equal
> On 11 Mar 2017, at 10:41, Alexander Farber wrote: > > uid integer NOT NULL REFERENCES words_users(uid) CHECK (uid <> > author) ON DELETE CASCADE, > but get syntax error in 9.5: > > ERROR: syntax error at or near "ON" > LINE 2: ...REFERENCES words_users(uid) CHECK (uid <> author) ON DELETE … You put your CHECK constraint definition smack in the middle of the FK constraint definition, which starts with REFERENCES and ends with the delete CASCADE. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
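In other words, keep the FK clause contiguous and put the CHECK outside it; the same column definition untangled (fragment only):

```sql
uid integer NOT NULL
    REFERENCES words_users(uid) ON DELETE CASCADE
    CHECK (uid <> author),
```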
Re: [GENERAL] Simple Query not using Primary Key Index
> On 7 Feb 2017, at 0:16, Podrigal, Aron wrote: > > Hi, > > I noticed when I do a simple SELECT id FROM mytable WHERE id = > 'cb81d070-4213-465f-b32e-b8db43b83a25'::UUID Postgres does not use the > primary key index and opts for a Seq Scan. > > I of course did VACUUM ANALYZE and I have reset statistics But no sign. Is > there any particular thing I should be looking at? An EXPLAIN ANALYZE would be a good start. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] (solved) R: Very strange problem on index
On 2 February 2017 at 14:57, Job wrote: > Hi Raymond, > > Thank you for your appreciated feedback. But what's your answer to his question? You still didn't tell. > Here is the original message: > > A really strange problem, quite near to paranormal, is occurring during a > server migration. > > We have a table with some millions of records, perfectly working on other > Postgresql 9.6.1 machines: > > Table "public.webrecord" >  Column    | Type                 | Modifiers > -----------+----------------------+------------------------------------------------------- >  id        | integer              | not null default nextval('webrecord_id_seq'::regclass) >  categoria | character varying    | >  dominio   | character varying    | >  country   | character varying(5) | > Indexes: >   "keywebrecord" PRIMARY KEY, btree (id) >   "dominio_idx" btree (dominio) > > By entering a simple query: > > Select * from webrecord where dominio='sito.com' > > it waits some seconds and CPU usage rises. > > I tried to: > - delete the table > - recreate the schema again > - re-populate it > - reindex > - vacuum > > But nothing happened. > The query planner says it scans lots of rows: > > explain analyze select * from webrecord where dominio='crl.microsoft.com'; > QUERY PLAN > ------------------------------------------------------------------------------------------------ >  Gather  (cost=1000.00..144045.21 rows=62073 width=92) (actual time=1096.202..1096.202 rows=0 loops=1) >    Workers Planned: 4 >    Workers Launched: 4 >    ->  Parallel Seq Scan on webrecord  (cost=0.00..136837.91 rows=15518 width=92) (actual time=1063.277..1063.277 rows=0 loops=5) >          Filter: ((dominio)::text = 'crl.microsoft.com'::text) >  Rows Removed by Filter: 2482938 >  Planning time: 0.119 ms >  Execution time: 1107.846 ms > > On other machines the results are immediate! > > I have finished my tries... -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. 
Re: [GENERAL] Why is table not found?
> On 31 Jan 2017, at 14:45, Egon Frerich wrote: >> >> CREATE TABLE "Raum" >> ( >> "RaumID" serial NOT NULL, -- Automatically assigned > But psql tells me "Raum" does not exist: > >> egon@xfEinzel ~ $ psql Hausrat >> psql (9.3.15) >> Type "help" for help. >> >> Hausrat=# SELECT * FROM Raum; >> ERROR: relation "raum" does not exist >> LINE 1: SELECT * FROM Raum; Instead of adding quotes to your select statement (as others suggested), IMHO the better thing to do is to remove them from your create statements. Most of the time there is no benefit in creating case-sensitive identifiers in a database. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
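The folding rules in a nutshell (illustrative snippet, reusing the column from the post; raum2 is a made-up second table):

```sql
CREATE TABLE "Raum" ("RaumID" serial NOT NULL);  -- quoted: stored exactly as "Raum"
SELECT * FROM Raum;     -- unquoted, folded to raum: ERROR, relation "raum" does not exist
SELECT * FROM "Raum";   -- works, but every query must quote it

CREATE TABLE raum2 (raumid serial NOT NULL);     -- unquoted: stored as raum2
SELECT * FROM RAUM2;    -- unquoted, also folded to raum2: works
```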
Re: [GENERAL] Postgres 9.6.1 big slowdown by upgrading 8.4.22
> On 7 Jan 2017, at 15:44, Job wrote: > > Hi guys, > > First of all excuse me but i really did not explain the problem, sorry... >>> Are you being serious? You're complaining about a "big slowdown" for a >>> query that goes from 1.5ms to 4ms? >>> What is the actual problem you're trying to solve? Because I don't see one >>> in the above. > > A single query is fast both in 8.4.22 and 9.6.1, no problem. > > But the problem is not here! > > The big problem is the benchmark before putting the system into production. > We launch about 100/200 queries per second and we monitor with "top" the two > machines. > They are VMs with 4 vCPU and 10Gb of RAM, with CentOS 7.2 64bit. > > This is what happens: > > Postgres 8.4.22 > Medium average load 1.5/2.0 > Further queries respond very quickly > > Postgres 9.6.1 > Medium average load 18.0/20.0 !! > Further queries are really very slow > There is a bottleneck I see. > By removing *only* this condition in the query function: > "exists ( select 1 from gruorari where gruorari.idgrucate=grucategorie.id and > ( (('{'||gg_sett||'}')::int[] && array[EXTRACT(DOW FROM NOW())::int])='t' and > now()::time between gruorari.dalle::time and gruorari.alle::time) )" Then most likely the slow-down you're experiencing is indeed in the above subquery. It could also be the addition of the exists though, let's not rule that out! Note that I'm not on either of the versions involved (9.3.15 here), so I can't easily observe what you're seeing. A general observation: now(), CURRENT_TIMESTAMP, CURRENT_TIME and friends all return the transaction start time, requiring but a single gettimeofday() call per transaction; it is clock_timestamp() that calls gettimeofday() on every invocation, the performance of which can differ significantly depending on which hardware clock your OS uses (there are often multiple options). 
Judging from your queries, you don't actually seem to need per-call timestamp accuracy anyway… The one-liner is a bit hard to read, btw - and so requires more effort from anyone trying to help. A useful next step would be to benchmark the separate parts of that query: 1). where gruorari.idgrucate = grucategorie.id 2). and (('{'||gg_sett||'}')::int[] && array[EXTRACT(DOW FROM NOW())::int])='t' 3). and now()::time between gruorari.dalle::time and gruorari.alle::time 4). exists(select 1) Ad 1). If there's any difference in performance between the 2 PG versions here, most likely it's a different plan for this condition. It might be as simple as a difference in statistics or number of rows. Ad 2). This part seems a bit convoluted, which may be responsible for some of the overhead. Frankly, I'm not 100% certain of the purpose of that condition, but it appears that the gg_sett field contains a comma-separated list of days of the week that need to be matched to today's day of the week. I rewrote it to: extract(DOW from NOW()) = any(('{' || gg_sett || '}')::int[]) (note the parentheses around the concatenation; the cast binds tighter than ||). Performance of either query on my 9.3 installation is pretty much the same, but I have only done some synthetic benchmarks: => explain analyze select '{6,7,0}'::int[] && array[extract(DOW from NOW())::int] from generate_series(1, 1); QUERY PLAN --- Function Scan on generate_series (cost=0.00..20.00 rows=1000 width=0) (actual time=4.548..58.072 rows=1 loops=1) Total runtime: 77.116 ms (2 rows) => explain analyze select extract(DOW from NOW()) = any('{6,7,0}'::int[]) from generate_series(1, 1); QUERY PLAN --- Function Scan on generate_series (cost=0.00..18.75 rows=1000 width=0) (actual time=4.341..48.902 rows=1 loops=1) Total runtime: 67.477 ms (2 rows) Ad 3). Casting fields in the where-clause is usually a bad idea. Unless you have indexes on those fields in which they are cast to time AND the query planner chooses to use those indexes, the type-cast will get applied to every candidate record each. 
If you have a million candidate records, that's 2x a million casts taking place (for two fields). To say more about this we would need more details about what types those fields are and why you're casting them to time. > The Postgres 9.6.1 machine average workload return at about 2.0/3.0
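If the casts cannot be avoided, expression indexes make them indexable; a sketch against the gruorari columns from the query, useful only if the planner actually picks the indexes up:

```sql
CREATE INDEX gruorari_dalle_time_idx ON gruorari ((dalle::time));
CREATE INDEX gruorari_alle_time_idx  ON gruorari ((alle::time));
```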
Re: [GENERAL] Postgres 9.6.1 big slowdown by upgrading 8.4.22
> On 6 Jan 2017, at 16:56, Job wrote:
>
> We use a function, the explain analyze is quite similar:
> POSTGRESQL 8.4.22:
> Total runtime: 1.531 ms
>
> POSTGRES 9.6.1:
> Execution time: 4.230 ms

Are you being serious? You're complaining about a "big slowdown" for a query that goes from 1.5ms to 4ms?

What is the actual problem you're trying to solve? Because I don't see one in the above. Just saying, you're obviously worried about something, but should you be?

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Special index for "like"-based query
> On 30 Dec 2016, at 11:42, Job wrote:
>
>>> And, basically, if you need help with some queries you could try
>>> posting them whole, even redacted, along the table defs, this way
>>> people can see the problem and not invent one based on a partial
>>> description
>
> Thank you very much, very kind from you.
>
> The index applied on the timestamp field is a btree("timestamp")
>
> The query is:
>
> select domain, sum(accessi) as c_count from TABLE where action='1' AND profile IN ('PROFILE_CODE') AND timestamp::date BETWEEN '2016-12-27' AND '2016-12-30' AND timestamp::time BETWEEN '00:00:00' AND '23:59:59' GROUP BY domain ORDER BY c_count DESC LIMIT 101
>
> The table format is:
>   Column   |           Type           | Modifiers
> -----------+--------------------------+-----------
>  id        | numeric(1000,1)          | not null default function_get_next_sequence('webtraffic_archive_id_seq'::text)
>  timestamp | timestamp with time zone |
>  domain    | character varying(255)   |
>  action    | character varying(5)     |
>  profile   | character varying        |
>  accessi   | bigint                   |
>  url       | text                     |
>
> Indexes:
>    "webtraffic_archive_day_2016_04_15_action_wbidx" btree (action)
>    "webtraffic_archive_day_2016_04_15_domain_wbidx" btree (domain)
>    "webtraffic_archive_day_2016_04_15_profile_wbidx" btree (profile) CLUSTER
>    "webtraffic_archive_day_2016_04_15_timestamp_wbidx" btree ("timestamp")
>
> Last question: the table is partitioned. I need to manually create index for every sub-tables or there is a way to create on every sub-tables once?

It's usually more efficient to cast the constants you're comparing to, than to cast a field value for each record in the set. The exception to that is when you have an index on the casted field.

In your case, since you're casting to date and time separately, and whole days even, it's probably more efficient to combine that into:

… AND timestamp BETWEEN '2016-12-27 00:00:00'::timestamp with time zone AND '2016-12-30 23:59:59'::timestamp with time zone ...
But even then, you're excluding items that fall within the last second of the end date, between 23:59:59 and the next midnight. The new range types are useful there, for example:

… AND timestamp <@ '[2016-12-27 00:00:00, 2016-12-31 00:00:00)'::tsrange

The above isn't entirely correct, as tsrange uses timestamp without time zone, but you get the gist.

However, if those time ranges can have other values than '[00:00, 23:59]', then you probably need 2 indexes on that timestamp column; one cast to date and one to time. Otherwise, you end up creating timestamp range filters for each day in the range in the query (which could still be the better approach).

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
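The "2 indexes on that timestamp column" advice runs into one subtlety: the column is timestamp with time zone, and casting timestamptz to date or time depends on the session's TimeZone setting, so those casts are only STABLE and cannot be indexed directly. Pinning a time zone makes the expression immutable. A sketch, following the naming of the existing indexes (the choice of UTC here is an assumption; queries would have to use the very same expressions to benefit):

```sql
-- timestamptz::date and timestamptz::time depend on the TimeZone GUC,
-- so the expression pins a zone to become immutable and thus indexable.
CREATE INDEX webtraffic_archive_day_2016_04_15_ts_date_wbidx
    ON webtraffic_archive_day_2016_04_15 ((("timestamp" AT TIME ZONE 'UTC')::date));
CREATE INDEX webtraffic_archive_day_2016_04_15_ts_time_wbidx
    ON webtraffic_archive_day_2016_04_15 ((("timestamp" AT TIME ZONE 'UTC')::time));
```

As to the last question in the quoted mail: in these PostgreSQL versions there is no single statement that creates an index across all partitions; each child table needs its own CREATE INDEX, though that is easy to script over the partition names.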
Re: [GENERAL] LYDB: Feasible to use PG roles instead of application-level security?
> On 30 Dec 2016, at 4:19, Guyren Howe wrote: > > Further to my attempts to enlighten application developers about what they > might better do in the database: > > https://medium.com/@gisborne/love-your-database-lydb-23c69f480a1d#.4jngp2rcb > > it occurs to me to wonder whether it is practical to use PG’s own roles and > security model in lieu of using an application-level one. > > It seems that the role system in PG is sufficient for most general purposes. > One could presumably also have a table with role names and associated > metainformation (email address etc) as needed. > > If I have a system with many thousands of users, is it practical to manage > these users’ authentication and authorization using *just* Postgres? Postgres roles are global to the cluster, so you would end up with multiple thousands of roles if you have multiple databases in your cluster with different users on each. Which roles each user is allowed to have becomes quite the nightmare for the administrators, I suspect. For a web-application facing the internet, I'd say no, don't do that. You're dealing with far too many users to be maintainable. For an intranet database in a not-too-large company with a fixed set of users, it could be a good solution, especially if those roles can be linked to the company's LDAP server (assuming that's possible, I don't know). Multiple intranet applications on that same database can use the same users and roles. Someone needs to do the administration though; with volumes (of users) like that and the database knowledge level of the average system administrator, a GUI seems preferable. IMHO, pgadmin provides too many features to be practical for someone like that, you would probably prefer something that only does user administration. I don't know of anything that does that though (not a GUI user myself)... 
> It occurs to me that some client frameworks might have issues with their connection pools if those connections keep switching users, assuming they even can, but let's set that aside for now. Or perhaps every connection could immediately do a SET USER before executing its connection?
>
> This seems an attractive proposition from a security standpoint: if I use row-level security pervasively, I can have a security system that's nestled nice and close to the data and presumably tricky to work around for a hacker given direct access only to the client application.

With a few changes, that could work very well.

First, create roles for the different types of users that you expect. In a company, that could be by division, distinguishing division-heads, interns, etc.

Secondly, have a table with the users and their attributes like you describe. Include an attribute for their database role there. Only administrator users should have access to that table.

Finally, create a stored procedure that looks up a user name in that table and sets the accompanying role. If a user is not found, set the role to some default 'unprivileged' user. Make that procedure a SECURITY DEFINER with the appropriate permissions. That role stays active for the entire session, so unless you close the connection, create a new one or change the user's role, this procedure doesn't need calling again.

> Is this practical? Has anyone here done it? What might the caveats be?

It's a fairly common practice, the ML archives should contain plenty of examples.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
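One wrinkle when building that stored procedure: PostgreSQL refuses SET ROLE inside a SECURITY DEFINER function. A way around it is to split the lookup from the switch: the SECURITY DEFINER function only reads the protected users table and returns the role name, and the session then issues the SET ROLE itself. A sketch with invented table and column names:

```sql
-- Hypothetical users table: app_user(username text, db_role text, email text, ...)
-- Only administrators have direct access to it; this function may read it
-- anyway because it runs with its owner's privileges (SECURITY DEFINER).
CREATE FUNCTION lookup_app_role(p_username text) RETURNS text AS $$
    SELECT coalesce(
        (SELECT db_role FROM app_user WHERE username = p_username),
        'unprivileged');              -- default role for unknown users
$$ LANGUAGE sql STABLE SECURITY DEFINER;
```

A session would then run something like SELECT lookup_app_role('jdoe'); followed by SET ROLE with the returned name, and that role stays active until the connection is closed or the role is changed again.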
Re: [GENERAL] LYDB: What advice about stored procedures and other server side code?
> On 27 Dec 2016, at 23:03, Guyren Howe wrote:
>
> I am putting together some advice for developers about getting the most out of SQL servers in general and Postgres in particular. I have in mind the likes of most web developers, who through ignorance or a strange cultural preference that has emerged, tend to treat their database server as a dumb data bucket.
>
> I call the project Love Your Database (LYDB). It is starting as a series of blog posts:
>
> https://medium.com/@gisborne/love-your-database-lydb-23c69f480a1d#.4jngp2rcb
> https://medium.com/@gisborne/love-your-database-simple-validations-68d5d6d0bbf3#.az4o2s152
>
> I would next like to cover server-side code such as stored procedures and triggers.

I think there's an objection that most developers would come up with: namely, that stored procedures living in the central relational database can usually not be changed as easily as client-side code, without disrupting development (or worse, operation of the production system) for other users. I think that needs some thought - I don't have the answer to that one. IMO, as long as those stored procedures are part of the model (ensuring data integrity, interpreting database values, etc.), there shouldn't be a problem, but it's a terrain I haven't trodden much.

There are things you usually don't want to handle at the database side, such as firing off other applications to perform tasks or user interface related operations, things like that. I once was tasked with writing a library on a database server (Oracle 8i) to handle printing of data and sending out faxes etc. That was a nightmare and in hindsight it just doesn't belong in a database. That's a task for middleware. Where exactly the dividing line lies is probably an interesting topic.

Another thought related to seeing the database as a model server; since we're apparently in the model-view-controller paradigm, it seems to me that database views would probably be useful to the view-part of MVC.
That's just something that struck me as making more than a bit of sense… Of course, for the actual view in the MVC paradigm there should be some kind of user interface, but database views could be really useful in preparing the data required for those, to make it fit the shape of the view. So far I like what you're writing, so keep it coming! Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL]
On 21 December 2016 at 12:51, Yogesh Sharma wrote: > Dear Alban, > > In my production system, there are lot of read write operation performed > every hour. That does not explain why you're performing REINDEX on that system; rather, it explains why you should NOT be performing REINDEX. VACUUM is what you should be using. You can VACUUM either using autovacuum with a sufficiently high frequency that it doesn't fall behind too much, or by manually running VACUUM ANALYSE on specific tables after batch operations. Which choice suits you best depends on your workload. In batches it is often also useful to run (plain) ANALYSE every so often on the relevant tables to update the statistics, but only when the batch process is not insert-only. > So, i am thinking, if i can add check during REINDEX operation nothing > update and insert operation performed. > Is it possible? I have no idea what you're saying. > On Wednesday, December 21, 2016, Alban Hertroys wrote: >> >> On 21 December 2016 at 09:59, Yogesh Sharma wrote: >> > Also, every hour,i am performing VACUUM and REINDEX operation on table. >> >> Why are you running REINDEX every hour? That's a very unusual thing to >> do, you'd need a pretty good reason for that. >> >> -- >> If you can't see the forest for the trees, >> Cut the trees and you'll see there is no forest. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
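The two VACUUM approaches above translate to roughly the following (the table name is invented and the threshold is illustrative, not a recommendation):

```sql
-- Option 1: a more aggressive autovacuum for just this table, so dead rows
-- are reclaimed after ~1% churn instead of the default 20%:
ALTER TABLE busy_table SET (autovacuum_vacuum_scale_factor = 0.01);

-- Option 2: vacuum and refresh statistics manually after a batch operation:
VACUUM ANALYZE busy_table;

-- And for batches that are not insert-only, a plain statistics refresh:
ANALYZE busy_table;
```

Neither of these takes the exclusive locks that REINDEX does, which is why they are safe to run on a busy production table.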
Re: [GENERAL]
On 21 December 2016 at 09:59, Yogesh Sharma wrote: > Also, every hour,i am performing VACUUM and REINDEX operation on table. Why are you running REINDEX every hour? That's a very unusual thing to do, you'd need a pretty good reason for that. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Is there a way to Send attachments with email using pgmail postgreSQl?
> On 19 Dec 2016, at 18:55, Adrian Klaver wrote: > > On 12/15/2016 10:07 PM, nidhi raina wrote: >> Dear Sir/Mam, >> >> I am also trying to send emails with attachments.please help me out with > > If as your subject suggests you are using this: > > https://github.com/captbrando/pgMail > > then it is not possible to send attachments: > > "Essentially, pgMail is simply a stored function written in TCL > which takes 4 arguments of type 'text' (Who is it from, who is it > to, subject, and body of message), contacts the email server via > TCL sockets, and transmits your email (Now UTF-8 Compatible!). > " Not even when you create the message with the correct MIME-headers and base64-encode your attachment within the correct MIME-section? That's a fair amount of code to write, but a stored procedure in, say TCL, should be able to handle that. Or am I missing something? Whether it's a good idea to let the database encode attachments and send e-mails is a different matter, but if it isn't doing much beside that - well, why not? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] huge table occupation after updates
Please use a readable font. Your messages are using a font that's so small that my eyes start to hurt. I still try to read them, but I - and I assume others - will stop trying if you keep this up. Sorry for the top-post, but since it's not directly appropriate to the topic that's perhaps for the better. > On 10 Dec 2016, at 19:15, Tom DalPozzo wrote: > > > > 2016-12-10 18:30 GMT+01:00 Francisco Olarte : > A couple of things first. > > 1.- This list encourages inline replying, editing the text, and frowns > upon top posting. > > 2.- Your HTML formatting with so a small size makes it harder for me ( > and I can assume some others ) to properly read your messages. > > If you want to discourage people replying to you, keep doing the two above. > > On Sat, Dec 10, 2016 at 3:25 PM, Tom DalPozzo wrote: > > you're right, VACUUM FULL recovered the space, completely. > > Well, it always does. ;-) > > > So, at this point I'm worried about my needs. > > I cannot issue vacuum full as I read it locks the table. > > Well, first hint of your needs. Bear in mind vacuum fulls can be very > fast on small tables ( i.e, if you have the 1.5Mb table, and do 2000 > updates and then a vacuum full that will be very fast, time it ). > > > In my DB, I (would) need to have a table with one bigint id field+ 10 bytea > > fields, 100 bytes long each (more or less, not fixed). > > 5/1 rows maximum, but let's say 5000. > > As traffic I can suppose 1 updates per row per day (spread over groups > > of hours; each update involving two of those fields, randomly. > > Also rows are chosen randomly (in my test I used a block of 2000 just to try > > one possibility). > > So, it's a total of 50 millions updates per day, hence (50millions * 100 > > bytes *2 fields updated) 10Gbytes net per day. > > Not at all. That's the volume of updated data, you must multiply by > the ROW size, not just the changed size, in your case 50M * 1100 ( to > have some wiggle room ), 55Gbytes. 
> > But this is the UPPER BOUND you asked for. Not the real one. > > > I'm afraid it's not possible, according to my results. > > It certaninly is. You can set a very aggresive autovacuum schedule for > the table, or even better, you may vacuum AFTER each hourly update. > This will mark dead tuples for reuse. It will not be as fast, but it > can certainly be fast enough. > > And, if you only update once an hour, you may try other tricks ( like > copy to a temp table, truncate the original and insert the temp table > in the original, although I fear this will lock the table too, but it > will be a very short time, your readers may well tolerate it. ) > > Yours seem a special app with special need, try a few, measure, it is > certainly possible. > > Francisco Olarte. > > Hi, I think you're right. I was surprised by the huge size of the tables in > my tests but I had not considered the vacuum properly. > My test had a really huge activity so perhaps the autovacuum didn't have time > to make the rows reusable. > Also, issuing plain VACUUM command does nothing visibile at once, but only > after when, inserting new rows, the size doesn't increase. > I will try again as you suggest. > Thank you very much > Pupillo Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Upgrade from 9.5.4 to 9.6.1
On 16 November 2016 at 16:33, Rich Shepard wrote: > On Tue, 15 Nov 2016, Rich Shepard wrote: > If 9.6.1 is currently running after running initdb, and I can access my > databases, what does pg_upgrade do that's necessary? pg_upgrade migrates your databases from your old (9.5) cluster to the new (9.6) one. Initdb doesn't do that. If your 9.6 database does indeed contain your databases, then something must have done the pg_upgrade for you. Perhaps the slackware package script does something like that, but that would cause problems for people who do _not_ want to migrate their databases, so I doubt that's what happened. What I think happened is that you are using the new pg 9.6 psql binary to list the databases in your old 9.5 cluster. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Recover from corrupted database due to failing disk
On 4 November 2016 at 11:20, Gionatan Danti wrote: > Unfortunately I am working with incredible constraints from customer side; > even buying two SAS disks seems a problem. Moreover, as an external > consultant, I have basically no decision/buying power :| > What I can do (and I did) is to raise a very big red flag and let others > decide what to do. It seems to me that your customer doesn't realise how expensive it would be if their server would be unavailable for any length of time or if they would actually lose the data it contains. That, or the data of your customer isn't so valuable that it's worth your time. We've been fighting a somewhat similar fight internally here, where management wasn't prepared to spend € 30,000 once on a server plus software licenses, while they pay that to one of our new managers monthly. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other
On 4 November 2016 at 14:41, Merlin Moncure wrote:
> On Fri, Nov 4, 2016 at 8:08 AM, Kim Rose Carlsen wrote:
>> The nulls are generated by something like this
>> SELECT c.circuit_id,
>>        cc.customer_id
>>   FROM circuit AS c
>>   LEFT JOIN circuit_customer AS cc
>>     ON c.circuit_id = cc.circuit_id
>>
>> To make a magic '0' customer we would be required to use
>> COALESCE(cc.customer_id, '0')
>> I dont think the optimizer will do anything clever with the '0' we have computed from null.
>
> It would if you explicitly indexed it as such;
> CREATE INDEX ON circuit_customer((COALESCE(customer_id, '0')));

Merlin, it's a LEFT JOIN. There probably are no NULLs in the circuit_customer.customer_id column, so that COALESCE isn't going to achieve anything at all.

I haven't been following this particular discussion in detail, so unfortunately I can't contribute more than that remark at the moment.

--
If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] initdb createuser commands
On 31 October 2016 at 15:50, Christofer C. Bell wrote: > > I think the OP's point is that having a hodgepodge of (on their face) > unrelated commands smells kinda unorganized at best and unprofessional at > worst. Whether or not he's right is up to the reader. For me, I agree with > his sentiment. > The solution he's suggesting is to bring all of these commands under one > umbrella either by bundling them in an administrative utility or by giving > them a prefix that shows they're related to "the PostgreSQL database." > He's getting a lot of pushback that really feels it's coming from the > wrong direction. "Just learn it." "It's always been this way." "No one > agrees with you." These arguments are unconvincing. That said, there's > nothing wrong with just saying, "we're not going to change it because we > don't want to." > > There is the issue that by introducing new commands that are better organised, the new user will get introduced to more commands instead of fewer - when they run into a problem or if they bought the book, the commands they'll encounter will be the "old" commands. There's also the learning curve of having a single wrapper-command that can do anything pg-related. The purpose of a command named pg_createuser is obvious, the purpose of a command named pg or pga is not so obvious. Personally, I sometimes work with Firebird for educational purposes and I can't make heads or tails of their command-line tools (with the exception of isql, but only when I remember it was based on Interbase). To me, the pg tools are much easier to remember, even though their naming isn't always consistent. I do think however that having the pg-commands prefixed with pg_ is actually helpful to both new and experienced users. One reason is that it limits the number of commands matched for command completion after typing pg_ (which is only 3 characters to type). ISTR some argument against using underscores because they would be hard to type, but I can't understand why.
That said, renaming the commands provides a rather minor benefit at best. Having this much fuss about it is out of proportion IMHO. I remember learning those commands (when pg 7.4.7 was a big deal) and it certainly did not cost me the majority of time that I needed to learn to use pg, and once I did learn them I knew where to find at least the documentation. My few cents. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: [GENERAL] Rows are repeating by the trigger function
On 31 October 2016 at 14:41, Adrian Klaver wrote: > On 10/31/2016 02:06 AM, Kiran wrote: >> I know 94 = 1 + (3 * 31). >> I am just having a normal insert statement into cf_question table. > > Are there any other triggers on the tables? I'm fairly confident that the duplicates are from updates on the cf_question table. Since the trigger also fires on update and then inserts another record, that would explain the duplication pretty well. Nevertheless, if there are other triggers those bear investigation. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Way to quickly detect if database tables/columns/etc. were modified?
> On 30 Oct 2016, at 10:45, Evan Martin wrote: > > If I have a query that reads from system tables like pg_class, pg_namespace, > pg_attribute, pg_type, etc. and I'd like to cache the results in my > application is there any fast way to detect when any changes have been made > to these system catalogs? I don't need to know exactly what has changed. > Some kind of a global "database version" would do, just so I know that I need > to invalidate my cache (the database definition is rarely modified in > practice). I think the usual practice for such situations is to do database changes through SQL scripts[1] that are under version control. Since they are under VC, you can automatically write the version[2] into the SQL script on commit of changes to said script through a commit hook. That version in the SQL script can then be used in an UPDATE statement to some database-global settings table[3]. And there you have your database version. Ad 1. Never do changes directly in the database when you go this route! Ad 2. Those are often hashes these days. Ad 3. You could even have the UPDATE statement be automatically added by the commit hook of your VC of choice. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
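The settings table from [3] and the hook-written UPDATE could be as minimal as this sketch (table and key names are invented; the version string is a placeholder written in by the commit hook):

```sql
-- One-time setup: a database-global key/value settings table.
CREATE TABLE db_info (
    key   text PRIMARY KEY,
    value text NOT NULL
);
INSERT INTO db_info (key, value) VALUES ('schema_version', 'initial');

-- Appended to each change script by the VC commit hook:
UPDATE db_info SET value = 'a3f9c21' WHERE key = 'schema_version';
```

The application then only needs to poll SELECT value FROM db_info WHERE key = 'schema_version' and invalidate its cache whenever the value differs from the one it cached.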
Re: [GENERAL] Rows are repeating by the trigger function
> On 30 Oct 2016, at 10:31, Kiran wrote: > > Dear Folks, > > I have a table cf_question with 31 rows. > I want to insert/update another table cf_user_question_link when cf_question > table is inserted/updated with row(s). > I have written trigger function for this as follows. > > > CREATE FUNCTION user_question_link() RETURNS trigger AS > $user_question_link$ > begin > SET search_path TO monolith; >INSERT INTO >cf_user_question_link(cf_user_id,cf_question_id) >VALUES(NEW.user_id,NEW.cf_question_id); > RETURN NEW; > end; > $user_question_link$ > LANGUAGE plpgsql > COST 100; > > > /* Call the trigger function */ > > CREATE TRIGGER insert_user_question_link AFTER INSERT OR UPDATE > ON monolith.cf_question > FOR EACH ROW EXECUTE PROCEDURE user_question_link(); > > > Problem: The cf_user_question_link gets inserted with 94 rows instead of 31 > rows. The 31 rows are repeated 3 times > I tried dropping the trigger function and recreating it but > with the same 94 rows in the table. > > It would be great if any from the forum point to me where I am doing wrong. I don't think you want that same trigger to fire on UPDATE of cf_question, like you do now. On UPDATE you have two choices; - either you need to take changes to those _id columns into account and delete rows that belong to the OLD link and not to the NEW one (or do nothing if those stayed the same) - or you do nothing (no trigger needed) because in the majority of cases changing FK's is limited to a few power users at best and they're supposed to know what they're doing. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
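Following the second option above (no trigger needed on UPDATE), the smallest fix would be to recreate the trigger without the OR UPDATE clause, using the names from the quoted mail:

```sql
DROP TRIGGER insert_user_question_link ON monolith.cf_question;

CREATE TRIGGER insert_user_question_link AFTER INSERT
    ON monolith.cf_question
    FOR EACH ROW EXECUTE PROCEDURE user_question_link();
```

With that, updates to cf_question no longer insert extra rows into cf_user_question_link; the rows already duplicated would still need to be cleaned up by hand.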
Re: [GENERAL] WHERE ... IN condition and multiple columns in subquery
On 28 October 2016 at 13:03, Alexander Farber wrote: > Hello, > > is it please possible to rewrite the SQL query > > SELECT DISTINCT ON (uid) > uid, > female, > given, > photo, > place > FROM words_social > WHERE uid IN (SELECT player1 FROM games) > OR uid IN (SELECT player2 FROM games) > ORDER BY uid, stamp DESC > > where first column player1 is fetched in a subquery and then column player2 > is fetched from the same table? > > I've searched around and it seems that a JOIN should be used here, but can > not figure out exactly how. > > Thank you > Alex You mean like this?: SELECT DISTINCT ON (uid) uid, female, given, photo, place FROM words_social JOIN games ON uid IN (player1, player2) ORDER BY uid, stamp DESC -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] update records to have occurance number
> On 25 Oct 2016, at 17:06, Mark Lybarger wrote:
>
> I want to update a table to have the value of the occurrence number. For instance, I have the below table. I want to update the number column to increment the count of last name occurrences, so that it looks like this:
>
> first     last    1
> second    last    2
> third     last    3
> first     other   1
> next      other   2
>
> Here's my simple table:
>
> create table person ( fname text, lname text, number integer);
>
> insert into person (fname, lname) values ('first', 'last');
> insert into person (fname, lname) values ('second', 'last');
> insert into person (fname, lname) values ('third', 'last');
>
> insert into person (fname, lname) values ('first', 'other');
> insert into person (fname, lname) values ('next', 'other');
>
> How would I issue an update statement to update the number column?

That depends on which order you want the database to perceive those rows in. The above example suggests that alphabetical order on fname might work, in which case (aggregates aren't allowed directly in an UPDATE's SET clause, so the count goes into a scalar subquery):

update person
   set number = (select count(*) + 1
                   from person p2
                  where p2.lname = person.lname
                    and p2.fname < person.fname);

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
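An alternative to a correlated count per row is a window function; a sketch that numbers rows per lname in fname order, joining on the system column ctid because the example table has no key:

```sql
UPDATE person
   SET number = numbered.rn
  FROM (SELECT ctid,
               row_number() OVER (PARTITION BY lname
                                  ORDER BY fname) AS rn
          FROM person) AS numbered
 WHERE person.ctid = numbered.ctid;
```

The window version scans the table once instead of re-counting for every row, which matters once the table grows beyond a handful of rows.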
Re: [GENERAL] psql 9.3 automatic recovery in progress
> On 10 Oct 2016, at 21:43, Periko Support wrote: > > For the life time in odoo session, can u point me where I can manage that > setting? > > The configuration /etc/openerp-server.conf doesn't have any parameter for > that. > > That must be in a odoo file...? > > Thanks. > > On Mon, Oct 10, 2016 at 12:25 PM, Pavel Stehule > wrote: > > > 2016-10-10 21:12 GMT+02:00 Periko Support : > Andreo u got a good observation here. > > I got a script that run every hour why? > > Odoo got some issues with IDLE connections, if we don't check our current > psql connections after a while the system eat all connections and a lot of > them are IDLE and stop answering users, we create a script that runs every > hour, this is: That's all part of Odoo (formerly known as OpenERP), isn't it? Did you contact them about this behaviour yet? Might just be that they're familiar with the problem and have a solution for it. I suspect the Python script you're running was implemented as a rather rough workaround by people from allianzgrp who knew just enough to be harmful. (Kill -9 on a database process, jeez! Keyboards should have an electroshock feature for people like that…) Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] psql 9.3 automatic recovery in progress
> On 10 Oct 2016, at 21:28, Alban Hertroys wrote: > > >> On 10 Oct 2016, at 21:12, Periko Support wrote: >> >>for pid in idle_record: >>try: >> #print "process details",pid >> #os.system("kill -9 %s" % (int(pid[0]), )) >>os.kill(int(pid[0]), signal.SIGKILL) >>except OSError as ex: >>continue > > That query returns PostgreSQL backends and you're sending them SIGKILL. Not a > recommended practice far as I know. Shouldn't you rather be sending those > kill signals to the clients connecting to the db? > Worse, apparently at some time in the past (a month ago matching those logs, > perhaps?) it used to send kill -9! That's absolutely a very bad idea. > > While on the topic, there is a PG function to cancel a backend query from > within PG: https://www.postgresql.org/docs/9.5/static/functions-admin.html > I think that's the best way to go about this, and best of all, you can > combine that with your select statement. Another idea struck me; if that script is under version control, you can check when that change was committed. If it isn't, perhaps you should. My current favourite is Hg (aka Mercurial), which happens to be written in Python, just like your script. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] psql 9.3 automatic recovery in progress
> On 10 Oct 2016, at 21:12, Periko Support wrote:
>
> for pid in idle_record:
>     try:
>         #print "process details", pid
>         #os.system("kill -9 %s" % (int(pid[0]), ))
>         os.kill(int(pid[0]), signal.SIGKILL)
>     except OSError as ex:
>         continue

That query returns PostgreSQL backends and you're sending them SIGKILL. Not a recommended practice as far as I know. Shouldn't you rather be sending those kill signals to the clients connecting to the db?
Worse, apparently at some time in the past (a month ago, matching those logs, perhaps?) it used to send kill -9! That's absolutely a very bad idea.

While on the topic, there is a PG function to cancel a backend query from within PG: https://www.postgresql.org/docs/9.5/static/functions-admin.html
I think that's the best way to go about this and, best of all, you can combine that with your select statement.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
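For example, the clean-up could stay entirely inside PostgreSQL. A sketch only (it assumes the 9.2+ pg_stat_activity column names pid, state and state_change, and an arbitrary one-hour idle threshold; pg_terminate_backend() sends SIGTERM, which lets the backend exit cleanly, unlike SIGKILL):

```sql
-- Terminate backends that have been idle for over an hour,
-- skipping our own session:
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'idle'
   AND pid <> pg_backend_pid()
   AND state_change < now() - interval '1 hour';
```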
Re: [GENERAL] How to retrieve jsonb column through JDBC
> On 29 Aug 2016, at 20:23, Alexander Farber wrote: > On Mon, Aug 29, 2016 at 7:50 PM, Merlin Moncure wrote: > On Sat, Aug 27, 2016 at 5:39 AM, Alexander Farber > wrote: > > > > List last_tiles = (List) JSON.parse(rs.getString("last_tiles")); > > > > has not work for me even though the string is: > > > > [{"col": 7, "row": 8, "value": 1, "letter": "A"}, {"col": 7, "row": 7, > > "value": 2, "letter": "B"}, {"col": 7, "row": 9, "value": 2, "letter": "C"}] > > > > but it is probably the problem of the Jetty class I am using and not of > > JDBC... > > huh. what exactly is failing? are you getting a parse exception? > > http://download.eclipse.org/jetty/9.3.11.v20160721/apidocs/org/eclipse/jetty/util/ajax/JSON.html#parse-java.lang.String- > > fails with: > > java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to > java.util.List I'm not 100% sure it's the root of the ClassCastException here, but I'm pretty sure that Java will want to know what class of items that List consists of. If that doesn't ring a bell for you, spend some time reading about "Java generic classes" (not to be confused with "general Java classes"). Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] select array_remove(ARRAY[NULL,NULL,NULL],NULL); returns {} instead of {NULL,NULL,NULL}
> On 08 Aug 2016, at 20:19, Tom Lane wrote:
>
> Alexander Farber writes:
>> I wonder why the following returns NULL and not 0 in 9.5.3?
>
>> # select array_length(array_remove(ARRAY[NULL,NULL,NULL],NULL), 1);
>
> Because the result of the array_remove is an empty array, which is defined to be zero-dimensional in PG.

Reading this, I'm a bit confused about why:

select array_remove(ARRAY[NULL, NULL, NULL], NULL);

results in:

 array_remove
--------------
 {}
(1 row)

How does it know which unknown value to remove from that array of unknown values? Shouldn't the result be {NULL,NULL,NULL}?

(Sorry for sort-of hijacking this thread)

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Re: [GENERAL] RE: [GENERAL] Re: [GENERAL] A simple extension immitating pg_notify
> On 26 Jul 2016, at 2:52, Mehran Ziadloo wrote:
>
> Sorry if my terminology is not accurate. But by an instance, I mean a PostgreSQL
> installation. And I call it an instance (and not a database) not to confuse it
> with the concept of databases (as in databases / schemas). Even when I'm trying
> to clarify the terminology, it's hard due to lack of distinguishable words!
>
> And here, I'm not talking about the cluster version of PostgreSQL. Simple, old
> fashion PostgreSQL will do.

Nobody here is talking about some clustered version of PG. What you call an 'instance' (or 'installation') is called a cluster. A cluster contains databases, which in turn contain schemas, etc.

In database terms, a cluster is a single database server, a single 'installation' in your terms. If you were to install multiple PG servers in separate directories, running on separate port numbers, you would have multiple clusters. The same goes if you distributed those servers over several hosts, which is what you seem to think a cluster means. That is the difference between a cluster of databases and a cluster of servers.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Help needed structuring Postgresql correlation query
> On 19 Jun 2016, at 10:58, Tim Smith wrote:
>
> Hi,
>
> My postgresql-fu is not good enough to write a query to achieve this
> (some may well say R is a better-suited tool to achieve this!).
>
> I need to calculate what I would call a correlation window on a time
> series of data; my table looks like this:
>
> create table data(data_date date, data_measurement numeric);
> insert into data values('2016-01-01',16.23);
> 
> insert into data values('2016-06-19',30.54);
>
> My "target sample" would be the N most recent samples in the table
> (e.g. 20, the most recent 20 days).
>
> My "potential sample" would be a moving window of size N (the same
> size N as above), starting at T0 (i.e. 2016-01-01 in this example) and
> incrementing by one (i.e. 2016-01-01 + 20, then 2016-01-02 + 20, etc.),
> but the "target sample" would obviously be excluded.
>
> The output needs to display the window date range (or at least the start
> date of the "potential sample" window) and the result of
> corr(target, potential).
>
> Hope that makes sense.

Something like this could do the trick (untested):

with recursive sample (nr, start_date, total) as (
    select 1, d0.start_date,
           (select sum(data_measurement)
              from generate_series(0, 19) as r(step)
              left join data on data_date = d0.start_date + r.step)
      from (select min(data_date) as start_date from data) d0
    union all
    select nr + 1, sample.start_date + 1,
           (select sum(data_measurement)
              from generate_series(0, 19) as r(step)
              left join data on data_date = sample.start_date + 1 + r.step)
      from sample
     where sample.start_date + 1 + 19 <= (select max(data_date) from data)
)
select * from sample where start_date >= date '2016-01-01';

Not sure how best to go about parameterising sample size N; a stored function seems like a good option. 
Another approach would be to move a (cumulative) window-function with 20 items over your data set and for each row subtract the first value of the previous window from the total of the current window (that is, assuming you're calculating a SUM of data_measurement for each window of 20 records). Visually that looks something like this for sample size 4: sample 1: (A + B + C + D) sample 2: (A + B + C + D) + E - A = (B + C + D + E) sample 3: (B + C + D + E) + F - B = (C + D + E + F) etc. To accomplish this, you calculate two cumulative totals (often misnamed as running totals, but AFAIK that's something different), one from the start, and one lagging N rows behind (you can use the lag() window function for that) and subtract the two. Good luck! Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
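The lag-based variant described above can be sketched like this (untested, like the recursive version; it computes a SUM per window of 20 rows, and it assumes exactly one row per calendar day so that 20 rows equal a 20-day window):

```sql
select data_date as window_end,
       cum - coalesce(lag(cum, 20) over (order by data_date), 0) as window_total
  from (select data_date,
               sum(data_measurement) over (order by data_date) as cum
          from data) t
 order by data_date;
```

The first 19 rows produce partial windows; filter those out with a row_number() check if they are unwanted.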
Re: [GENERAL] Sequences, txids, and serial order of transactions
> On 12 Jun 2016, at 4:03, Christian Ohler wrote: > we have a use case similar to auditing packages like pgMemento or Audit > Trigger 91plus – we are looking to keep an ordered history of certain write > transactions. I'm trying to understand the trade-offs between different ways > of getting that order, i.e., assigning numbers to transactions (ideally > strictly monotonic, modulo concurrency). All of our transactions are > serializable (for now). > (2) the orders produced by txid_current and a sequence can be different > (unsurprisingly). (If it was desirable to make them match, we could probably > do so by briefly holding a lock while we call both txid_current and nextval – > seems like this shouldn't limit concurrency too much. Or would it? Is one > of them potentially slow?) I'm aware of only 2 cases that those can have a different order: 1. The txid or the sequence wraps 2. The txid of a transaction exists some time already when the sequence's nextval() gets called. A later transaction (higher txid) running in parallel could request a nextval() in between those moments. I think that situation 1 can be caught (the few times it occurs). Situation 2 is probably what bothers you? As long as the request for nextval() is early in the transaction, a wait-lock shouldn't block other waiting transactions for long. To make sure, I would run some tests comparing running enough parallel transactions calling a sequence's nextval() both with and without the lock. The first of those will also give you some insight in how bad the transaction ordering vs. sequence ordering problem actually is. That is, unless you're perhaps overcomplicating your problem (see my answer to (6)). > (5) Postgres can give us a "high watermark" ("no transactions with IDs below > this number are still in-flight") for txid_current (using > txid_snapshot_xmin(txid_current_snapshot())), but has no equivalent feature > for sequences How would it know whether a sequence number is still in use? 
For example, I have a process @work where I use a database sequence to distinguish between batches of data in a user's HTTP session. Nothing of that is in the database, but the sequence is most certainly in use, across different database sessions. > (6) neither txid_current nor a sequence give us a valid serial order of the > transactions That depends on what you consider a transaction for your application. Do you care about the order that data got manipulated in, or do you care in what order the surrounding database transactions were created? Usually, people only care about the first, for which a sequence should be just fine. The second is usually only relevant for systems that are closely tied to the database internals, such as replication systems. > (7) given that we can't get a valid serial order, what guarantees can we get > from the ordering? I'm not entirely sure what to look for, but at a minimum, > it seems like we want writes that clobber each other to be correctly ordered. > Are they, for both txid_current and for sequences? My guess was "yes" for > txids (seems intuitive but just a guess) and "no" for sequences (because > https://www.postgresql.org/docs/current/static/functions-sequence.html > mentions that sequences are non-transactional); but for sequences, I couldn't > immediately construct a counterexample and am wondering whether that's by > design. Specifically, it seems that Postgres acquires the snapshot for the > transaction (if it hasn't already) when I call nextval(), and as long as the > snapshot is acquired before the sequence is incremented, I suspect that this > guarantees ordering writes. Does it? As I understand it, sequences have to be non-transactional to be able to guarantee correct ordering. Calling nextval() will increment the sequence, but does not relate it to the transaction at that point. The select statement that does the call to nextval() receives the value from the sequence and is part of the transaction. 
That links them together, as long as you don't use that sequence value outside that transaction. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
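If matching the two orders ever becomes necessary, the "briefly holding a lock" idea from the start of this mail could be sketched with an advisory lock (an illustration only, not benchmarked; 42 is an arbitrary lock key and audit_seq a hypothetical sequence):

```sql
begin;
-- Serialise concurrent transactions over the txid/sequence pair,
-- then release the lock immediately so the rest of the
-- transaction runs unimpeded:
select pg_advisory_lock(42);
select txid_current(), nextval('audit_seq');
select pg_advisory_unlock(42);
-- ... the rest of the transaction ...
commit;
```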
Re: [GENERAL] full text search index
On 26 May 2016 at 06:04, Patrick Baker wrote:
> Hi there,
>
> I've got the following query:
>>
>> SELECT COUNT(DISTINCT j0_.id) AS sclr10
>> FROM customers j0_
>> WHERE ((LOWER(j0_.name_first) LIKE '%some%'
>> OR LOWER(j0_.name_last) LIKE '%some%')
>> AND j0_.id = 5)
>> AND j0_.id = 5
>
> The query is taking ages to run.

Your guess is as good as ours without knowing what query plan the database decided on. Post the output of explain analyze.

> I read about wildcards and it seems I have to use a function with
> to_tsvector?

In general, you have to use an expression that the query planner can see is equivalent to the expression used in the index. Otherwise the query planner has no way of knowing whether the index is suitable for the query, and it won't use the index.

>> CREATE INDEX CONCURRENTLY ON public.customers USING gin ("clientid",
>> ("full_text_universal_cast"("name_first"::"text")),
>> ("full_text_universal_cast"("name_last"::"text")));

In your case, you should query on full_text_universal_cast(your_field) instead of on LIKE '%some%'. Alternatively, if your query always uses the same wildcard expression, you could create an expression index on (your_field LIKE '%some%').

> full_text_universal_cast:
>>
>> CREATE OR REPLACE FUNCTION public.full_text_universal_cast(doc_data
>> "text")
>> RETURNS "tsvector" AS
>> $BODY$
>> SELECT to_tsvector('english', COALESCE(TRIM(CAST(doc_data AS TEXT)), ''));
>> $BODY$
>> LANGUAGE sql IMMUTABLE
>> COST 1000;

The query planner has no way of knowing what this function does internally, so it certainly won't match the function results in the index up with your LIKE expression.

Regards, Alban Hertroys
--
If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
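To make that concrete, querying through the same expression the index was built on might look like this (a sketch using the names from the thread; note that full-text search matches whole words/lexemes, not arbitrary substrings, so it is not a drop-in replacement for LIKE '%some%', and a multicolumn gin index that includes the scalar "clientid" column additionally requires the btree_gin extension):

```sql
SELECT COUNT(DISTINCT j0_.id) AS sclr10
  FROM customers j0_
 WHERE (full_text_universal_cast(j0_.name_first::text) @@ plainto_tsquery('english', 'some')
    OR full_text_universal_cast(j0_.name_last::text) @@ plainto_tsquery('english', 'some'))
   AND j0_.id = 5;
```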
Re: [GENERAL] Function PostgreSQL 9.2
> On 05 May 2016, at 8:42, drum.lu...@gmail.com wrote: > The final function code is: > > CREATE OR REPLACE FUNCTION users_code_seq() >RETURNS "trigger" AS $$ > DECLARE code character varying; > BEGIN > IF NEW.code IS NULL THEN > SELECT client_code_increment INTO STRICT NEW.code FROM > public.companies WHERE id = NEW.id ORDER BY client_code_increment DESC; ^^^ There's your problem. I'm pretty sure the keyword STRICT isn't valid there. It probably gets interpreted as a column name. > END IF; > IF (TG_OP = 'INSERT') THEN > UPDATE public.companies SET client_code_increment = > (client_code_increment + 1) WHERE id = NEW.id; > END IF; > RETURN NEW; > END; > $$ LANGUAGE plpgsql; Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Thoughts on "Love Your Database"
On 4 May 2016 at 17:08, John McKown wrote: > I had a manager, long ago, who used a Lotus 1-2-3 spreadsheet to > contain all his memos. I was glassy eyed in disbelief. He also would use his > hand calculator to add up the numbers in the spreadsheet to be sure that the > summation function in the spreadsheet didn't make a mistake. That still happens - we have a few live examples around in this company. Unfortunately they're not caged and there's no sign "Do not feed the managers". Admittedly, they're using Excel instead of Lotus-1-2-3, but that's the only difference. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] (VERY) Slow Query - PostgreSQL 9.2
> On 03 May 2016, at 11:55, drum.lu...@gmail.com wrote:
>
> Hi all,
>
> I'm trying to get the query below to perform better, but I just don't know
> what else I can do...
>
> Please have a look and let me know if you can help somehow. Also, if you
> need some extra data, just ask me please.
>
> * Note that the gorfs.inode_segments table is 1.7 TB in size.
>
> I have the following query:
>
> explain analyze
>
> SELECT split_part(full_path, '/', 4)::INT AS account_id,
>        split_part(full_path, '/', 6)::INT AS note_id,
>        split_part(full_path, '/', 9)::TEXT AS variation,
>        st_size,
>        segment_index,
>        reverse(split_part(reverse(full_path), '/', 1)) as file_name,
>        i.st_ino,
>        full_path,
>        (i.st_size / 100::FLOAT)::NUMERIC(5,2) || 'MB' AS size_mb
> FROM gorfs.inodes i
> JOIN gorfs.inode_segments s
>   ON i.st_ino = s.st_ino_target
> WHERE i.checksum_md5 IS NOT NULL
>   AND s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
>   AND i.st_size > 0;

(Stripped the one-and-a-half extra queries in there, but that incomplete one might be why you're waiting?)

> • Explain analyze link: http://explain.depesz.com/s/Oc6
> The query is taking ages, and I can't get the problem solved.
>
> These are the indexes I've already created on the inode_segments table:
> What else can I do to improve the performance of the query?

The first thing I notice in your query is that you're making use of hierarchically organised data without storing it hierarchically, namely that full_path field. The result of that is that both your table and your index contain a lot of redundant information. 
Now I'm not so sure a hierarchical table + query are going to help get you much performance out of this (probably worth an experiment or two; mind that OSes usually use inode trees for such things), but reducing the redundancy in the index would probably help. Note that an index name cannot be schema-qualified (the index is created in the same schema as its table), and that full_path lives on inode_segments:

create index inode_segments_account_idx
    on gorfs.inode_segments (substring(full_path from 20))
 where full_path like '/userfiles/account/%';

and then use similar expressions in your query of course:

where full_path like '/userfiles/account/%'
  and substring(full_path from 20) ~ '^[0-9]+/[a-z]+/[0-9]+';

Good luck!

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Slow join over three tables
On 28 April 2016 at 08:36, Tim van der Linden wrote: > On Wed, 27 Apr 2016 13:48:06 +0200 > Alban Hertroys wrote: > >> In this case, you're using the values in adverse to filter relevant rid's >> for the FK join, so you might be better off with the inverse of above index: >> create index on report_adverses (adverse, rid); >> create index on report_drugs (drug, rid); > > H, like I reported yesterday this achieved a huge performance boost. > > However, two (new) things I like to touch on while further experimenting with > this query: > > 1. Altering Drug IDs or Adverse names effects the executing time negatively. > > In this example altering the drug IDs I search for makes the query 6 times > slower again: > > SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug > FROM reports r > JOIN report_drugs d ON d.rid = r.id > JOIN report_adverses a ON a.rid = r.id > WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back > pain', 'back pain']) > AND d.drug = ANY (ARRAY[9557, 17848, 17880, 18223]) ORDER BY r.created; > > Different drug ID ([9557, 17848, 17880, 18223]), but the rest is the same. > Query plan: > > Sort (cost=31409.71..31417.48 rows=3107 width=41) (actual > time=662.707..662.819 rows=4076 loops=1) >Sort Key: r.created >Sort Method: quicksort Memory: 415kB >-> Nested Loop (cost=25693.17..31229.48 rows=3107 width=41) (actual > time=71.748..661.743 rows=4076 loops=1) > -> Merge Join (cost=25692.61..26795.53 rows=3107 width=29) (actual > time=70.841..97.094 rows=4076 loops=1) >Merge Cond: (d.rid = a.rid) >-> Sort (cost=16332.08..16671.61 rows=135810 width=8) > (actual time=48.946..58.623 rows=135370 loops=1) > Sort Key: d.rid > Sort Method: quicksort Memory: 12498kB > -> Index Scan using report_drugs_drug_idx on > report_drugs d (cost=0.44..4753.44 rows=135810 width=8) (actual > time=0.681..28.441 rows=135532 loops=1) >Index Cond: (drug = ANY > ('{9557,17848,17880,18223}'::integer[])) You're doing ~9 times as many index lookups. 
A slowdown of 6x for this part of the query seems rather reasonable.

> Planning time: 16.438 ms
> Execution time: 663.583 ms
>
> A total of 660 ms instead of the previous 120 ms. The number of rows returned
> is 4076, instead of the 448 rows of the previous query. Could this be the
> cause of the slow-down? Or is it the fact that the integers in the ANY clause
> are further apart from one another and thus make for longer index searches?

Technically probably both, but the index node distance (assuming a btree index) should be insignificant. The sorting of the found candidate rows on rid probably contributes the most; quicksort scales as O(n log n) with the number of items to sort.

That said, you didn't happen to perform these tests (assuming >1) during a busier period of the database server? I ask because later on you seem to be getting fast results again after some more index changes and I don't see cause for the difference.

> The same happens with changing the adverses: some adverses come back in ~120
> ms, others can take up to one second.

Likely the same issue.

> 2. Adding extra columns ignores indexes
>
> This one was kind of expected. Adding extra columns from both "report_drugs"
> and "report_adverses" slows the lot down again. Extra columns added:
>
> - a.recovery
> - a.severity
> - d.reason
> - d.effectiveness
> - d.duration

I have to admit that I'm a bit behind with the current state of the art of PostgreSQL, but last time I checked, the database needed to look at each row in the result set for transaction visibility information. I recall that there was (at some point) much discussion whether that visibility information could be added to indexes, and that there were strong arguments against doing so. Hence, I doubt that those new index-only scans skip that step. Unless I'm wrong there, adding non-queried fields to the index is only going to affect your performance adversely. 
Relevant fields for indices are those used in joins, those regularly used in conditions (where-clauses), and those that are regularly sorted over. Other fields are available from the candidate rows of the result set. Putting them in the index just results in storing redundant information. That's how it always has been and I'm pretty sure that's still the case.

> As I would expect, adding these columns would make the previous multi-column
> indexes useless. And they indeed were not used anymore.
Re: [GENERAL] Slow join over three tables
> On 27 Apr 2016, at 4:09, David Rowley wrote: > > On 27 April 2016 at 11:27, Tim van der Linden wrote: >> The query: >> >> SELECT r.id, r.age, r.gender, r.created, a.adverse, d.drug >> FROM reports r >> JOIN report_drugs d ON d.rid = r.id >> JOIN report_adverses a ON a.rid = r.id >> WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back >> pain', 'back pain']) >> AND d.drug = ANY (ARRAY[359, 360, 361, 362, 363]) ORDER BY r.created; >> > Likely the only way to speed this up would be to create indexes; > > create index on reports (id, age, gender, created); Since you're not using age and gender in this (particular) query until the rows are combined into a result set already, it doesn't make a whole lot of sense to add them to the index. Moreover, since your'e ordering by created, I'd at least put that directly after id: create index on reports (id, created); > create index on report_adverses (rid, adverse); In this case, you're using the values in adverse to filter relevant rid's for the FK join, so you might be better off with the inverse of above index: create index on report_adverses (adverse, rid); create index on report_drugs (drug, rid); Do these tables have a primary key and can that be used here? I hope I'm not being superfluous here, but don't forget to vacuum analyze after creating those indexes. Eventually autovacuum will do that for you, but you could otherwise be running tests to verify the impact of adding those indexes before autovacuum gets around to it. 
Finally, perhaps it's more efficient to weed out all unrelated drugs and adverses before relating them to reports, something like: SELECT r.id, r.age, r.gender, r.created, x.adverse, x.drug FROM reports r JOIN ( SELECT d.rid, a.adverse, d.drug FROM report_drugs d JOIN report_adverses a ON a.rid = d.rid WHERE a.adverse = ANY (ARRAY['back pain - nonspecific', 'nonspecific back pain', 'back pain']) AND d.drug = ANY (ARRAY[359, 360, 361, 362, 363]) ) x ON x.rid = r.id ORDER BY r.created; Looking at the cardinality of your tables that does seem a bit unlikely though. Still, worth a shot... Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is the comparison between timestamp and date so much slower then between two dates
> On 14 Apr 2016, at 15:12, Thomas Kellerer wrote:
>
> now() (and current_timestamp as well) are defined to return the
> same value throughout the entire transaction.
>
> So the optimizer _should_ be smart enough to do the conversion only
> once at the beginning of the statement and then use that converted
> value during the execution of the statement without the need
> to re-evaluate it for each row.

As I understand it, that's not how it works. If the optimizer down-converted the value of now() from a timestamp to a date, it would lose precision, possibly producing wrong results in corner cases. For that reason, it chooses to do the opposite and up-converts the dates. But because the dates are fields and not constants, it has to do so for every row.

If that's indeed what happens, then now() indeed gets evaluated only once, but the slow-down is caused by having to convert two field values for every row.

Alban Hertroys
--
If you can't see the forest for the trees, cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why is the comparison between timestamp and date so much slower then between two dates
On 13 April 2016 at 15:45, Tom Lane wrote: > Thomas Kellerer writes: >> So my question is: why is comparing a timestamp to a date so much slower? > > The date has to be up-converted to a timestamptz (not timestamp). > I think the expensive part of that is determining what timezone > applies, in particular whether DST is active. You could try it > with "localtimestamp" (no parens) instead of "now()" to see how > it performs with a non-tz timestamp. > > regards, tom lane Or... you can manually down-convert the timestamptz now() to a date ;) That's basically what you're doing when you use current_date instead of now(). The reason that the other way around is so much more expensive is that the database needs to do that conversion twice for every row in the table. When down-converting now(), the DB only needs to do that once for all rows. Regards, Alban Hertroys. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
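A minimal sketch of that manual down-conversion (some_table and date_col are hypothetical names; both right-hand sides are evaluated once per statement):

```sql
-- Preferred: compare a date column to a date value.
SELECT * FROM some_table WHERE date_col >= current_date;

-- Equivalent explicit down-conversion of now():
SELECT * FROM some_table WHERE date_col >= now()::date;
```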
Re: [GENERAL] Really unique session ID - PID + connection timestamp?
o it can repeats after server > restarts, but my coll. found a "session creation timestamp". > This is a key which unique. > With this we can check for died sessions and we can clean their records. > > It might help to explain more what it is you are trying to achieve. > > First I am not sure what you mean by 'persistent lock', especially as it > applies to Postgres? > > Second, I assume by garbaging you mean garbage collection of something? > If that is the case what exactly are you garbage collecting? > I see 'clean records', what records would these be? > In particular, on Postgres, where are you going to do this? > > > We want create same mechanism. > > If the above questions did not already cover this, what mechanism? > > > I know there are adv. locks in PG, but I want to use session id. > > This could be: > |pg_backend_pid|() > > May pid repeats. > Where I can get timestamp or some other unique data with I can create a > combined primary key? > > Thanks for your help! > > dd > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
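The combination asked about here can be read straight from pg_stat_activity (a sketch; it assumes the 9.2+ column names pid and backend_start, and the resulting key is unique in practice rather than in theory, since it would take pid reuse within the same microsecond to collide):

```sql
-- A session key combining the backend pid with its start time:
SELECT pid,
       backend_start,
       pid::text || '@' || extract(epoch from backend_start)::text AS session_key
  FROM pg_stat_activity
 WHERE pid = pg_backend_pid();
```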
Re: [GENERAL] pg_restore real file size
> On 26 Feb 2016, at 5:30, John R Pierce wrote: > > On 2/25/2016 8:26 PM, drum.lu...@gmail.com wrote: >> >> I'm doing the pg_restore now in a 1.5TB file: >> >> # ls -la >> postgres postgres 1575324616939 Feb 20 13:55 devdb_0.sql >> >> But, the restore has gone over 1.6 TB >> > > the dump file does not contain the indexes, just CREATE INDEX statements Or page alignment padding, or the effects of fill factors, to name a few more reasons. Besides, your dump-file is compressed. It's being restored using pg_restore (or can it restore using plain text sql scripts these days?) instead of psql. Lastly, how does a database in unicode fare for text size (without toasted values) against a plain text dump file in, say, utf-8 - which isn't even the worst case scenario? That's the simplistic case; in reality some values will get toasted and the dump file is compressed. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does query planner choose slower BitmapAnd ?
> On 22 Feb 2016, at 16:58, Tom Lane wrote: > > (BTW, is that index really on just a boolean column? It seems > unlikely that "phoneable" would be a sufficiently selective > condition to justify having an index on it. I'd seriously consider > dropping that index as another solution approach.) On that train of thought, I would think that a person or company would only be phoneable if they have a phone number registered somewhere. That somewhere probably being in another table that's too far away from the current table to check it straight away - so this is an optimisation, right? Where I see that going is as follows: A "contact" either has a phone number - in which case you'd probably rather get that phone number - or they don't, in which case a null value is often sufficient[1]. While a phone number certainly takes up more storage than a boolean, it wouldn't require an index (because it's available right there) nor the extra joins to look up the actual phone number. And if you'd still want to put an index on it, the null values won't be indexed, which takes a bit off the burden of the larger field size. You _could_ also take a shortcut and use a variation of your current approach by storing null instead of false for phoneable, but then your index would contain nothing but true values which rather defeats the point of having an index. Query-wise, I suspect that the number of "contacts" that have a phone number far outweighs the number that doesn't, in which case it's more efficient to query for those that don't have one (fewer index hits) and eliminate those from the results than the other way around. In my experience, both the NOT EXISTS and the LEFT JOIN + WHERE phoneable IS NULL tend to perform better. A final variation on the above would be to have a conditional index on your PK for those "contacts" that are NOT phoneable. That's probably the shortest and quickest list to query. I'd still prefer that field to contain something a bit more meaningful though... 
Well, enough of my rambling! Ad 1. It is possible that you cater for the possibility that you don't know whether a "contact" has a phone number or not, in which case null would probably be the wrong choice for "no phone number" because then you wouldn't be able to distinguish between "no phone number" and "I don't know". Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
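The conditional-index variation suggested above can be sketched like this (hypothetical table and column names, for illustration only):

```sql
-- Hypothetical layout: most contacts are phoneable, few are not
CREATE TABLE contact (
    id        serial PRIMARY KEY,
    name      text NOT NULL,
    phoneable boolean NOT NULL DEFAULT false
);

-- Partial index covering only the NOT phoneable rows;
-- it stays small because the common case (phoneable) is not indexed at all.
CREATE INDEX contact_not_phoneable_idx ON contact (id) WHERE NOT phoneable;

-- Queries for the un-phoneable contacts can then use the small index:
SELECT id, name FROM contact WHERE NOT phoneable;
```

Note that the planner will only consider a partial index when the query's WHERE clause provably implies the index predicate, so the query has to repeat the `NOT phoneable` condition.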
Re: [GENERAL] Optimize Query
> On 14 Feb 2016, at 20:40, drum.lu...@gmail.com wrote: > > Hi Alban! Sorry.. that was my mistake Okay, first advice on that query: Trim it down to something that people can wrap their minds around. You have a silly amount of code repetition in there, much of which doesn't even seem to serve a purpose. This is some kind of generated query, I gather? For example, you don't use most of the fields from your first subquery. Another example is the nested subquery in your left join, which can be reduced to a single subquery with just the fields summed that you actually need (and the customer_id, obviously). The same goes for most of the joins inside that left join, definitely the left joins - but that depends a bit on your table definitions and contents. For example, the fields you're summing come from account (but you can use customer instead, since you only use the account_id, which equals client_id anyway) and bill_item. Some fields in your where-clause come from job, some others I can't tell where they're from. Trim, trim, trim, until you're left with a more readable query that gives you the same results and then put it through explain analyze again. It wouldn't surprise me if that query is already significantly faster. If you're still having problems at that point, post that query and the analysis again. > Explain analyze link: > http://explain.depesz.com/s/5WJy Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Optimize Query
> On 13 Feb 2016, at 11:21, drum.lu...@gmail.com wrote: > > Anyone can help with that please? > > Cheers What would help is: 1. to post an actual query that you need optimised and 2. an explain analyze of that query. What you posted in your original message was some kind of query-template with enough placeholders and views that there is no way to predict how that's going to perform without at least knowing what goes into the placeholders and how the views are built up. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Test CMake build
> On 10 Feb 2016, at 19:09, Yury Zhuravlev wrote: > > Hello all. > Please test build Postgres using cmake. If you are of course interested. > Still not everything is ready but most of the work. Assembly instructions as > does the repository is on github: > https://github.com/stalkerg/postgres_cmake > > The compilation will be enough (tests even better). I need feedbacks so that > create issues on github. > Very interesting NetBSD, OpenBSD, Solaris. I was under the impression that the FreeBSD port already uses cmake? Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] execute same query only one time?
> On 08 Feb 2016, at 20:05, Johannes wrote: > > select id, col1, col2, ... from t0 where id = (select max(id) from t0 > where col1 = value1 and col2 = value2 and …); > select col1 from t1 where t0_id = (select max(id) from t0 where col1 = > value1 and col2 = value2 and …); select t0.id, t0.col1, t0.col2, t0…., t1.col1 from t0 join t1 on (t1.t0_id = t0.id) group by t0.id, t0.col1, t0.col2, t0…., t1.col1 having t0.id = max(t0.id); Low complexity and works with any number of rows from t0 (as does Adrian's solution, btw). I'm not sure what you mean by "copying of columns" in your reply to Adrian's solution, but I don't think that happens here. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
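To make the shared subquery explicit (and evaluated only once), a CTE variant might look like the following. Table and column names follow the original post; the literal values are placeholders:

```sql
-- The max(id) lookup runs once, then both joins reuse it
WITH latest AS (
    SELECT max(id) AS id
    FROM t0
    WHERE col1 = 'value1' AND col2 = 'value2'
)
SELECT t0.id, t0.col1, t0.col2, t1.col1
FROM latest
JOIN t0 USING (id)
JOIN t1 ON t1.t0_id = t0.id;
```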
Re: [GENERAL] Let's Do the CoC Right
> On 24 Jan 2016, at 1:48, Regina Obe wrote: > So the point is stop assuming who has experience and who doesn't simply by > how people look. +1 To expand on that: Don't let your prejudices get the better of you. Assuming that other people are prejudiced about you is just another prejudice. The people you're communicating with on a mailing list are individuals, not groups, so do not generalise. In most cases you do not know enough about them to make such assumptions, and that's assuming they are valid, which they generally are not (yikes! a generalisation!). > The irony is that PostgreSQL is so diverse that a Coc leads to nothing but > huge arguments about what's okay in one culture and not another. We can only > go by intent and not what was done. If we can't say that, then a Coc does > us no good. I've already violated all Cocs at least 3 times in this whole > conversation. > > Thanks, > Regina What about this for a CoC?: 1. Do not discuss a CoC. But, this side-thread has been going on long enough I think. Let's wrap this up, shall we? Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Code of Conduct: Is it time?
On 12 January 2016 at 09:25, Chris Travers wrote: > One of the dangers of a CoC is that there are many potential issues which > may or may not become real problems. I think if we try to be clear on all > of them, then we risk creating codes instead of a general expectation of > what we do expect. Another consideration. Last night I was thinking this issue over and then remembered that normally very reasonable persons (among whom I count myself) can react quite poisonously when they are tired or stressed and people start pushing their buttons. Those people probably would not be violating any CoC rules, but can cause someone else to do so. Moreover, some people are exceptionally good at pushing all the wrong buttons, whether doing that willingly (out of malice) or not. I'm a bit concerned that a CoC could give the malicious among those the ammunition they need to push the buttons of their victims. Now of course, they could do that just as well without a CoC, and I don't recall any instances of this problem on this list. To add to that, non-native speakers sometimes make mistakes that set it off. I remember an embarrassing case where I thought the word "gross" came from the German "Grosshaft", which means quite the opposite (great, fabulous), and responded to a new idea on a list with a heartily meant "Gross!". And then you suddenly get angry mails from all over the place without understanding how that happened. Oops. Where do I stand? I do not know whether a CoC for PG is a good idea or not; I can't decide. In my case it's nothing more than an opinion anyway - my contributions are pretty much limited to offering help on this ML. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] WIP: CoC V2
> On 12 Jan 2016, at 0:16, Joshua D. Drake wrote: > > 3. A safe, respectful, productive and collaborative environment is free of > non-technical or personal comments, for example ones related to gender, > sexual orientation, disability, physical appearance, body size, race or > personal attacks. I'm not debating whether there should be examples or not, they are usually useful, but perhaps examples belong in a separate section and not in the core CoC? Frankly though, this thread looks like a testament of why Postgres doesn't really need a CoC. You people are all being so polite about it that it's almost offensive! Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Definitive answer: can functions use indexes?
> On 07 Jan 2016, at 5:19, Jim Nasby wrote: > > On 1/6/16 5:41 PM, Tom Lane wrote: >> Since the question makes little sense as stated, I'm going to assume >> you mean "can a query like SELECT ... WHERE left(foo, 3) = 'bar' >> use an index on column foo?" >> >> The answer to that is no, there is no such optimization built into >> Postgres. (In principle there could be, but I've not heard enough >> requests to make me think we'd ever pursue it.) > > BTW, the case where this would be highly valuable is timestamps. Being able > to do something like date_part('month',timestamptz)='Jan' would be a big, big > deal for warehousing. Not just warehousing, for BI in general. But, as is now quite clear, for many of those cases it should be fairly trivial to work around this limitation by creating either a functional index or an operator. For the above example, say something like timestamptz % 'month' = 'Jan'. There are downsides to that approach though, such as readability and that this way of using % instead of date_part() is not according to any standard behaviour and could even behave differently or (more likely) not work at all on other PG instances. That said, it's not uncommon in BI to require a seq. scan anyway, in which case the point is rather moot. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
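The functional-index workaround mentioned above might be sketched as follows (hypothetical table `fact` with a timestamptz column `ts`). One wrinkle: `date_part()` on a timestamptz is not immutable (it depends on the session's TimeZone setting), so the index expression has to pin a zone:

```sql
-- Pinning a time zone makes the expression immutable and thus indexable
CREATE INDEX fact_month_idx
    ON fact (date_part('month', ts AT TIME ZONE 'UTC'));

-- A query can use the index as long as it repeats the exact expression:
SELECT * FROM fact
WHERE date_part('month', ts AT TIME ZONE 'UTC') = 1;
```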
Re: [GENERAL] A unique pairs version of UNNEST() ?
> On 04 Jan 2016, at 21:08, Wells Oliver wrote: > > Hey all, happy new year. > > I am trying to get unique pairs from an array of N numbered items, usually 5, > but possibly 4 or 6. > > If I just wanted unique values, I could do SELECT UNNEST(list_of_ids) AS id, > COUNT(*) FROM table GROUP BY id but in this situation I want all unique pairs > and a COUNT. > > For those familiar with python, this is the functionality found in > itertools.combinations. I'm leaning towards just doing this in python, but I > really like keeping as much in SQL as possible. > > So in an example where list_of_ids is {1,2,3,4,5} I would essentially get: > > {1, 2} > {1, 3} > {1, 4} > {1, 5} > {2, 3} > {2, 4} > {2, 5} > {3, 4} > {3, 5} > {4, 5} Your example reminds me of combinatory theory. The result you seem to be looking for is the list of possible unique combinations, as sets of elements of the total set (sets are orderless). with list_of_ids as ( select unnest(list_of_ids) as id from table ) select a.id, b.id from list_of_ids a, list_of_ids b where b.id > a.id; Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
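Since the original post mentions itertools.combinations, this is what the SQL self-join above mirrors on the Python side (a sketch, using the sample array from the post):

```python
from itertools import combinations

list_of_ids = [1, 2, 3, 4, 5]

# Unique, orderless pairs -- equivalent to the self-join with b.id > a.id
pairs = list(combinations(list_of_ids, 2))

for a, b in pairs:
    print(f"{{{a}, {b}}}")   # prints {1, 2} through {4, 5}

# 5 elements give C(5, 2) = 10 pairs
assert len(pairs) == 10
```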
Re: [GENERAL] to_timestamp alternatives
> On 01 Jan 2016, at 0:46, Jim Nasby wrote: > > BTW, my recommendation would be to store in a timestamptz field *with the > correct timezone*, and then convert on output as necessary. This is easy to > do by either > > SET timezone > > or > > SELECT timestamptz_field AT TIME ZONE '…'; This. When converting the original timestamps to UTC, you lose data. In my experience, you will end up needing that data sooner or later. Remember, the database stores timestamps in UTC internally anyway, you don't need to do the conversion yourself. > Since you're dealing with GPS data and presumably have lat/long, it shouldn't > be hard to do this dynamically either, either by just blindly dividing > longitude by 15 or using actual timezone shape polygons and @> or <@. That would be a bad idea for global data, since not all time zones are full hours apart, or have the same (if any) DST change-over dates. For example, India is currently at UTC+05:30, probably because they wrapped the entire country in the same TZ after their independence. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
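The two output-conversion options quoted above can be sketched like this (hypothetical table `readings` with a timestamptz column `taken_at`):

```sql
-- Option 1: let the session's TimeZone setting drive display conversion
SET timezone = 'Asia/Kolkata';
SELECT taken_at FROM readings;

-- Option 2: convert explicitly, per expression
SELECT taken_at AT TIME ZONE 'Asia/Kolkata' AS local_time FROM readings;
```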
Re: [GENERAL] Recurring and non recurring events.
> On 26 Dec 2015, at 13:03, Kevin Waterson wrote: > > Thanks, as I am new to postgres, I was unaware of this function. Actually, the article you referenced makes use of generate_series as well (at INSERT INTO events), but then for some reason decides to create a generate_recurrences function later on. Possibly the choice came from them using a domain (RECURRENCE) that did not translate directly (although almost) to an interval. > To go with this, I guess I will need a table with which to store intervals, > start and end dates? > > eg > CREATE table events( > id serial primary key, > start_timestamp timestamp, > end_timestamp timestamp, > interval > > with dateRange as > ( > SELECT min(start_timestamp) as first_date, max(start_timestamp) as last_date > FROM events > ) > select > generate_series(first_date, last_date, '1 hour'::interval)::timestamp as > date_hour > from dateRange; But, instead of generate_series you could also use a recursive CTE (which is more or less standard SQL - implementations differ slightly between databases): with recursive dateRange (curr_stamp, max_stamp, step) as ( select min(start_timestamp), max(start_timestamp), interval '1 week' from events union all select curr_stamp + step, max_stamp, step from dateRange where curr_stamp + step <= max_stamp ) select curr_stamp from dateRange; I suspect generate_series is faster, but since your query already almost looked like this I thought I'd offer this alternative approach. It has a little bit more flexibility too, as you can add fields and calculations to the CTE quite easily. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Transfer db from one port to another
> On 23 Dec 2015, at 12:43, Killian Driscoll wrote: > > OK - I did the dir and it shows that the dump and restore.exe are there, but > running the above gives the below errors > > 09/06/2014 08:35 381,952 pg_dump.exe > > 09/06/2014 08:35 180,224 pg_restore.exe > > C:\Users\killian>path "C:\Program Files\PostgreSQL\9.3\bin";%path% > > C:\Users\killian>pg_dump -Fc -p 5432 -U postgres irll_project | pg_restore -U > po > stgres -p 5532 > ccoulould not findd a n "pg_restore" to executeot find a "pg_dump" to execute Aren't you trying to move a database to PG 9.4? Then you need to use the pg_dump and pg_restore utilities of the 9.4 installation, not those of the 9.3 one. Those utilities are guaranteed to be backwards compatible, but they're not necessarily forwards compatible. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue
> Before I start in to implement a DELETE / AUTOVACUUM / VACUUM approach, to > recycling disk space used for a session management table, I would like to > propose, for consideration by this forum, an idea for a different approach. > > A row in a session management table, represents a significant "sunk cost" in > both computing time used to create the row, and in disk space allocated. > Postgres has to use a lot of resources to create that row in the first > place. > > When the session that originally caused that row to be allocated, eventually > expires -- why delete the associated session managent row ?? > > Instead of using a DELETE command to destroy the row (and a > resource-intensive AUTOVACUUM / VACUUM process to reclaim the storage > space), why not instead, simply mark that session management row as "free" > (with an UPDATE command) ?? An UPDATE is a combination of an INSERT and a DELETE command. However, rows marked as deleted will be reused at some point after autovacuum (or manual VACUUM) has made sure they are no longer in use by any DB session. So your approach can still work, as long as you vacuum that table frequently enough. The actual solution isn't the UPDATE instead of the DELETE though, but rather the partial index and an increased autovacuum frequency. Alban. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
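That combination — mark-as-free via UPDATE, a partial index, and more frequent autovacuum — might look like this (hypothetical table and column names):

```sql
-- Mark expired rows as reusable instead of deleting them
UPDATE session SET free = true, user_id = NULL WHERE expires_at < now();

-- Partial index: only the reusable rows are indexed, so finding one is cheap
CREATE INDEX session_free_idx ON session (id) WHERE free;

-- Vacuum this table far more aggressively than the cluster default
ALTER TABLE session SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 100
);
```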
Re: [GENERAL] ftell mismatch with expected position
Please refrain from top-posting. On 2 November 2015 at 10:48, Eelke Klein wrote: > Normally we call this from within our windows program where a lot of code is > involved for setting up the environment, and creating the pipes and > redirecting stdout, stderr and stdin. However I believe it is the same > problem that can be caused with the following commandline in cmd.exe but you > will need to have Coreutils for windows installed to get the cat command. > > pg_dump.exe -U postgres --format=c mydb | cat > backupfile Do you also get the warning if you split those commands up, and if so, in which one? pg_dump.exe -U postgres --format=c mydb > backupfile cat backupfile > someotherfile > The size of the backup doesn't matter. As long as there is atleast one table > in the database the warning occurs. When I use --verbose it looks like I get > one warning for every table in the database while it is dumping the data in > the tables. > > Eelke Klein > > > 2015-10-30 14:53 GMT+01:00 Adrian Klaver : >> >> On 10/29/2015 02:51 AM, Eelke Klein wrote: >>> >>> I am getting the following warning when our program runs pg_dump.exe and >>> the output is in custom format and send to standard out which is >>> connected to a pipe (Windows platform). >>> >>> pg_dump: [custom archiver] WARNING: ftell mismatch with expected >>> position -- ftell used >> >> >> Can you show us the exact command, with arguments, that you are using? >> >> Also what is the size of the dump file you are creating? >> >> >>> >>> The output of pg_dump is received by another process that copies it (and >>> some other data) into a zip file. When I extract the db backup from the >>> zip the backup is fine and I can restore it. >>> >>> I have both tried the pg_dump.exe from the 9.5 alpha and a stable 9.4 >>> both generate this warning. >>> When I change the format to plain the warning goes away. 
>>> When I call pg_dump from the commandline and let it send to standard out >>> and redirect it to a file the warning does not occur but it does occur >>> when I redirect to nul. >>> >>> Is there a way to prevent this warning? I would think pg_dump shouldn't >>> call ftell on a pipe. >>> >>> Regards, Eelke >> >> >> >> -- >> Adrian Klaver >> adrian.kla...@aklaver.com > > -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Recursive Arrays 101
> On 25 Oct 2015, at 19:38, Adrian Klaver wrote: > > On 10/25/2015 11:12 AM, David Blomstrom wrote: >> I'm sorry, I don't know exactly what you mean by "definitions." The >> fields Taxon and Parent are both varchar, with a 50-character limit. >> ParentID is int(1). > > By definition I meant the schema, so from the below: > > CREATE TABLE t ( > N INT(6) default None auto_increment, > Taxon varchar(50) default NULL, > Parent varchar(25) default NULL, > NameCommon varchar(50) default NULL, > Rank smallint(2) default 0 > PRIMARY KEY (N) > ) ENGINE=MyISAM That can indeed be solved using a hierarchical query (provided you have a suitable table in PG); something akin to: WITH RECURSIVE taxons AS ( -- Hierarchical root nodes SELECT N AS id, Taxon, Rank, 1 AS level, '' || N AS Path -- A useful addition explained further down FROM t WHERE ParentID IS NULL -- Child nodes UNION ALL SELECT N AS id, Taxon, Rank, taxons.level +1 AS level, taxons.Path || ':' || N AS Path FROM taxons JOIN t ON taxons.id = t.ParentID ) SELECT id, Taxon, Rank, level FROM taxons ORDER BY Path ; The Path-bit looks complicated, but basically that just appends ID's within the same hierarchy such that, when sorted on that field, you get the hierarchy in their hierarchical order. What the hierarchy would look like if it were shown as a file hierarchy with sub-directories expanded, for example. That's pretty much the only viable alternative (alternatives vary on the column used to create the hierarchy), which is why I added it to the example. The fun thing with hierarchical queries is that you can add all kinds of extra information and make it trickle down to the child nodes, such as the items that make up the root of the hierarchy (pretty useful for grouping), for example or a field that calculates a string to prepend for indentation, etc. 
Or a computation that depends on values in parent items (I used this successfully in a bill of materials to calculate absolute quantities by volume, quantities by weight and cost of components in the end product where they were given relative to 1 kg of their parent, for example). It's highly flexible and powerful (and standard SQL), but it takes a bit of time to get in the right mindset. PS. I usually write my hierarchical queries in Oracle, which isn't quite as good at them as Postgres is, but it's what we have @work. Hence, I'm not sure I got the syntax 100% correct. We're working on getting PG in for a project upgrade (replacing RDB on OpenVMS, which will go EOL in <10 years!) - fingers crossed. Cheers! Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
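As an example of those trickle-down extras, an indentation column can be derived from `level` in the outer select (a sketch against the `taxons` CTE from the message above):

```sql
-- Indent each taxon by its depth in the hierarchy
SELECT repeat('    ', level - 1) || Taxon AS indented_taxon
FROM taxons
ORDER BY Path;
```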
Re: [GENERAL] to pg
On 25 September 2015 at 13:08, Ramesh T wrote: > CREATE UNIQUE INDEX idx_load_pick ON pick (case picked when picked='y' then > load_id else null end ); > > how can i convert case expressed to postgres..above it is oracle. Assuming that your queries are written in such a way that Oracle is indeed using that index and you want your queries to use the index as well in PG: CREATE UNIQUE INDEX idx_load_pick ON pick (load_id) WHERE CASE picked WHEN 'y' THEN load_id ELSE NULL END IS NOT NULL; That's definitely written a bit redundantly, but that's Oracle's fault. If your queries aren't like that, it's as Ladislav wrote. Much simpler in PG! To make Oracle use your original index, your queries are probably of a form containing snippets like: SELECT * FROM foo WHERE CASE picked WHEN 'y' THEN load_id ELSE NULL END IS NOT NULL BTW, your CASE statement isn't exactly valid, even in Oracle. Your comparison is in fact this: picked = (picked = 'y'). -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Prepared Statements and Pooling
On 11 August 2015 at 06:44, Mister Junk wrote: > I'm using prepared statements to prevent SQL injection, but I have done some > reading and it seems like using Prepared statements COULD improve > performance. I understand the concept, but I do not know how to implement > this. They could, but they could also reduce performance. Prepared statements store and reuse an earlier generated query plan for a query, so that's the time you save. However, because the query planner can't make use of the actual query parameter values (they are yet unknown at that point), it can only generate a general query plan. If your data is equally distributed for your query parameters, that won't matter much, but if some values are far more common than others, that can actually hurt performance. In case of a normal statement, the query planner has enough information to decide which plan would be more efficient. With a prepared statement, it does not. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
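At the SQL level, the mechanism can be seen with PREPARE/EXECUTE (hypothetical table and values); the plan is built once, before any concrete parameter value is known:

```sql
-- The statement is planned here, with $1 still unknown
PREPARE items_by_color (int) AS
    SELECT * FROM items WHERE color_id = $1;

-- Executions then supply values to the already-prepared statement,
-- whether the value is rare or very common in the data
EXECUTE items_by_color(3);
EXECUTE items_by_color(42);
```

Most client drivers expose this same mechanism through their own prepared-statement API rather than via literal PREPARE statements.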
Re: [GENERAL] Question about copy from with timestamp format
> On 30 Jul 2015, at 17:59, Sherrylyn Branchaw wrote: > > From here: > > http://www.postgresql.org/docs/9.4/interactive/sql-copy.html > > "COPY can only be used with plain tables, not with views. However, you can > write COPY (SELECT * FROM viewname) TO " > > Right, so you can COPY FROM a view, but not, as far as I can tell, TO a view, > unless Alban found a workaround. Well, that's why I said it _might_ work. COPY is a bit special, I'm afraid. For starters, although it works _like_ doing a bunch of INSERTs, it doesn't perform actual INSERTs. Apparently, that also means it won't fire an INSERT rule and thus can't be used with an updatable view. There are no rules on such a view (rules rewrite the query) that would work for COPY. Now perhaps that sounds like a COPY rule is warranted for cases like these, but that doesn't help, exactly because the COPY command has no place in its syntax for expressions (such as this type conversion). INSERT does, hence we can write a rule for it… In hindsight it all makes sense. That doesn't bring you any closer to a solution, unfortunately. > On Thu, Jul 30, 2015 at 11:48 AM, Adrian Klaver > wrote: > On 07/30/2015 08:44 AM, Sherrylyn Branchaw wrote: > I was thinking that perhaps an updatable view might do the trick? > > Interesting idea! Are you able to get it to work? I keep getting 'ERROR: > cannot copy to view "view_ts_test"' even before my trigger fires. > Inserting, though, works fine. > > From here: > > http://www.postgresql.org/docs/9.4/interactive/sql-copy.html > > "COPY can only be used with plain tables, not with views. However, you can > write COPY (SELECT * FROM viewname) TO " > > > Still curious why the triggers I'm writing won't fire before my > statement errors out on copying to a view, or inserting an out-of-range > timestamp, when the trigger would resolve all the illegal operations if > it just fired first. 
> > > On Thu, Jul 30, 2015 at 5:57 AM, Alban Hertroys wrote: > > > > On 30 Jul 2015, at 2:27, Sherrylyn Branchaw wrote: > > > > Based on your PS asking about data types and commenting that you don't > want to put hour in a separate column, it sounds like this is a brand-new > table you're creating. If so, and if this is a one-time COPY operation, you > can create a text column for the initial import. Then after you're done > importing, you can execute > > > > ALTER TABLE ts_test ALTER COLUMN ts_fld TYPE TIMESTAMP USING > (to_timestamp(ts_fld, 'MMDDHH24')); > > > > to convert the format of the imported data to a timestamp. Then you're > set. > > > > If there will be ongoing imports of more files like this, though, > you'll need the intermediate table solution offered by Adrian. > > Or keep both columns and update those where the text-column is NOT > NULL and the timestamp column is NULL. > > > I was going to suggest a trigger, but it turns out that the data type > checking happens even before the BEFORE trigger fires, so you don't get a > chance to massage your data before actually inserting it. I got 'ERROR: > date/time field value out of range: "2015072913"' before the > trigger even fired. I > wonder if that's deliberate? I was able to implement a workaround by > adding a raw_ts_fld column of type text, but an extra column might > be too ugly for you relative to a temp table, I don't know. > > I was thinking that perhaps an updatable view might do the trick? > > You would need to create a view with the timestamp column converted > to text in the format in your CSV file. Next you add an INSERT rule > that does the conversion from text to timestamp and inserts the row > in the actual table. Finally, you use the view in the COPY statement > instead of the table. > Added bonus, you can now also use the view to export your table to > the same CSV format. 
> > Alban Hertroys > -- > If you can't see the forest for the trees, > cut the trees and you'll find there is no forest. > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Question about copy from with timestamp format
> On 30 Jul 2015, at 2:27, Sherrylyn Branchaw wrote: > > Based on your PS asking about data types and commenting that you don't want > to put hour in a separate column, it sounds like this is a brand-new table > you're creating. If so, and if this is a one-time COPY operation, you can > create a text column for the initial import. Then after you're done > importing, you can execute > > ALTER TABLE ts_test ALTER COLUMN ts_fld TYPE TIMESTAMP USING > (to_timestamp(ts_fld, 'MMDDHH24')); > > to convert the format of the imported data to a timestamp. Then you're set. > > If there will be ongoing imports of more files like this, though, you'll need > the intermediate table solution offered by Adrian. Or keep both columns and update those where the text-column is NOT NULL and the timestamp column is NULL. > I was going to suggest a trigger, but it turns out that the data type > checking happens even before the BEFORE trigger fires, so you don't get a > chance to massage your data before actually inserting it. I got 'ERROR: > date/time field value out of range: "2015072913"' before the trigger even > fired. I wonder if that's deliberate? I was able to implement a workaround by > adding a raw_ts_fld column of type text, but an extra column might be too > ugly for you relative to a temp table, I don't know. I was thinking that perhaps an updatable view might do the trick? You would need to create a view with the timestamp column converted to text in the format in your CSV file. Next you add an INSERT rule that does the conversion from text to timestamp and inserts the row in the actual table. Finally, you use the view in the COPY statement instead of the table. Added bonus, you can now also use the view to export your table to the same CSV format. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. 
Re: [GENERAL] database-level lockdown
On 7 July 2015 at 12:55, Filipe Pina wrote:
> On Ter, Jul 7, 2015 at 1:41 , Adrian Klaver wrote:
>> Still not sure what it is you are trying to accomplish. Is it really necessary
>> that every transaction be serialized? Or to put it another way, why are you
>> running in serializable by default? Or yet another way, what is the problem
>> you are trying to solve with serialized transactions?
>
> Exactly, that's the twist I've decided after some tests yesterday (the "lock
> all tables on last try" degraded performance in an obscene way): giving up
> on serializable by default.
>
> I wanted to use serializable so developers wouldn't have to worry about
> properly using locks, but implementing this "serialization failure" handler
> is becoming a major headache...

What Adrian was trying to get out of you is why you think you need those locks.

You're working with an RDBMS; it does the job of keeping data integrity in a multi-user environment already. You can trust it to do that well. So unless you're doing something really special, you shouldn't need to lock anything.

Among the special cases where you do need locks is needing a gapless sequence (invoice numbers come to mind).

So once again, what do you need those locks for?

P.S. Please don't top-post on this list. I had to edit your message to fix the flow of the conversation.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
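As an illustration of the gapless-sequence case mentioned above, a sketch with an invented counter table; the row lock taken by the UPDATE is what serializes the numbering step:

```sql
-- Hypothetical one-row-per-series counter table.
CREATE TABLE invoice_counter (
    series      text PRIMARY KEY,
    last_number integer NOT NULL DEFAULT 0
);

-- The UPDATE takes a row lock, so concurrent transactions queue up
-- behind it until COMMIT/ROLLBACK. If the transaction rolls back,
-- the counter rolls back with it, so no number is ever skipped.
BEGIN;
UPDATE invoice_counter
   SET last_number = last_number + 1
 WHERE series = '2015'
RETURNING last_number;
-- ... create the invoice with the returned number ...
COMMIT;
```

A plain sequence (nextval) would be faster, but sequences deliberately do not roll back, which is exactly why they can leave gaps.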
Re: [GENERAL] use null or 0 in foreign key column, to mean "no value"?
> On 27 Jun 2015, at 5:59, Robert Nikander wrote:
> In application code, prepared statements want to say: `select * from items
> where color_id = ?` and that `?` might be an int or null, so that doesn't
> work.

You could add another parameter to test which expression to 'activate' in the query, something like:

select * from items
where ('notnull' = ? and color_id = ?)
   or ('null' = ? and color_id is null);

Of course, with those _positional_ query parameters that means you need to bind the same value twice. You wouldn't need to with _named_ query parameters, if those are available to you.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
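Concretely, with PostgreSQL-style placeholders the workaround looks like this (a sketch; in PREPARE you could reuse $1, but client libraries using `?` placeholders cannot, hence the flag is bound twice):

```sql
-- $1/$3: 'null' or 'notnull' (the same flag, bound twice),
-- $2: the color id, ignored when the flag is 'null'.
PREPARE items_by_color(text, int, text) AS
    SELECT *
      FROM items
     WHERE ('notnull' = $1 AND color_id = $2)
        OR ('null' = $3 AND color_id IS NULL);

EXECUTE items_by_color('notnull', 3, 'notnull');
EXECUTE items_by_color('null', NULL, 'null');
```

An alternative not mentioned in the thread is `WHERE color_id IS NOT DISTINCT FROM $2`, which treats NULL as equal to NULL with a single parameter, though it may prevent index use.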
Re: [GENERAL] Python 3.2 XP64 and Numpy...
> On 28 May 2015, at 17:54, Rémi Cura wrote:
>
> I tried:
>
> C:\Python32>python.exe
>
> Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on
> win32

Ehm, this seems significant? ---^

It looks like you're trying to use 64-bit binaries on a 32-bit OS.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
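Regardless of what the startup banner says, a direct way to check which bitness a given interpreter actually runs at is the pointer size (a general-purpose snippet, not from the thread):

```python
import struct
import platform

# Pointer size in bits: 32 for a 32-bit build, 64 for a 64-bit build.
bits = struct.calcsize("P") * 8
print(bits, platform.architecture()[0])
```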
Re: [GENERAL] date with month and year
On 21 May 2015 at 23:42, Karsten Hilbert wrote:
> You are right in the following aspect:
>
> - client sends in "NOW at HERE"
> - server knows HERE = UTC+2

And then the tectonic plate you're on shifts and you're suddenly in UTC+1 or +3.

Thankfully, those things don't shift as fast as they sometimes do in the movies.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: [GENERAL] Allowing postgresql to accept 0xff syntax for data types that it makes sense for?
On 22 May 2015 at 04:46, Bill Moran wrote:
> I did a little research and it appears that neither Oracle nor DB2 supports
> the 0xff syntax ... so not _quite_ as common as it seemed to me.
> With all that being said, if I were to build a patch, would it be likely
> to be accepted into core?

Wouldn't you also need to support similar syntax for octal numbers for the patch to be complete? Or are those already supported (ISTR that's '077' for decimal 63)?

Not that I care at all about octal numbers, but supporting one and not the other just doesn't seem right.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
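In the meantime, PostgreSQL's existing bit-string literals already cover many of these cases (this is standard syntax, not the proposed 0xff integer literal):

```sql
-- x'ff' and b'111111' are bit strings; casting to integer gives the value.
SELECT x'ff'::int     AS from_hex,     -- 255
       b'111111'::int AS from_binary;  -- 63, the value octal 077 denotes
```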
Re: [GENERAL] plpgsql functions organisation
o with C aren't justified, because
> plpgsql has none of that complexity. There are advantages both ways.
>
> The beauty of PostgreSQL is that you have both available and you
> can choose whichever is best for your situation.
>
> Agreed, though in my case I drop into plpythonu when I want more complex
> solutions.
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
> --
> Melvin Davidson
> I reserve the right to fantasize. Whether or not you
> wish to share my fantasy is entirely up to you.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
Re: [GENERAL] schema or database
> On 13 Apr 2015, at 4:20, Ian Barwick wrote:
>
> On 13/04/15 11:08, Michael Cheung wrote:
>> hi, all;
>>
>> I am new here. And I need some suggestion.
>>
>> I have many similar databases to store data for every customer.
>> Structure of database is almost the same.
>> As I use the same application to control all these data, I can only use
>> one database user to connect to these databases.
>> And I have no need to query tables for different customers together.
>>
>> I wonder which I should use, different schemas or different databases to
>> store data?
>>
>> I'd like to know the advantages and disadvantages of using schemas or
>> databases.
>
> If as you say access to the database is via a single application database
> user, it will probably make more sense to use multiple schemas rather than
> multiple databases. Keeping everything in one database will simplify
> administration (e.g. making backups - you'll just need to dump the one
> database rather than looping through a variable number) and will make life
> easier if you ever need to do some kind of query involving multiple
> customers.

That's easier to back up, sure, but you can't restore a single customer's schema easily that way. So if one customer messes up their data big time, you'll need to restore a backup for all customers in the DB.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
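For what it's worth, per-customer restores remain possible with multiple schemas if the dumps are taken per schema; pg_dump and pg_restore both accept a `-n`/`--schema` switch (schema and database names below are invented for the example):

```shell
# Dump only one customer's schema from the shared database.
pg_dump -n customer_a -F c -f customer_a.dump app_db

# Later, restore just that schema without touching other customers.
pg_restore -d app_db --clean -n customer_a customer_a.dump
```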
Re: [GENERAL] Why doesn't `RAISE EXCEPTION` provide error context?
On 2 April 2015 at 19:15, Taylor Brown wrote:
> So, I would rather put a check like this at the top of my function:
>
> --
> important_variable = (p_request::json->>'important_variable')::integer;
> IF (important_variable is NULL) THEN RAISE EXCEPTION 'important_variable
> must not be NULL.'; END IF;
> --
>
> But I won't be able to get the context for that exception, and all I'll be
> able to return from the function or write to the logs is
> 'important_variable must not be NULL.'. If that's the only place I throw
> that error, I'll know where to look. Otherwise, I have no context, and
> won't be able to determine where my exception was thrown. So I'm actually
> better off _not_ throwing my own custom exceptions, even though I would
> prefer to be more defensive about this sort of thing in my code.
>
> I thought I might be able to "trick" postgres into throwing another
> unrelated exception that would not only include my custom error message,
> but allow me to extract the context, telling me the function where I should
> begin debugging.

Just a quick idea, but...

Perhaps you could put a constraint on those JSON messages indirectly, by mapping them to some kind of template table? The top level of a JSON message can usually be mapped to a table, and a table can have constraints and, for example, a BEFORE INSERT trigger that always returns NULL so that the data is not actually inserted into the table.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
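The template-table idea, sketched with invented names (untested). One wrinkle worth noting: a BEFORE trigger that returns NULL skips the insert before table constraints are checked, so in this sketch the validation lives in the trigger body, where a RAISE carries full context:

```sql
-- A table mirroring the expected top level of the JSON message.
CREATE TABLE request_template (
    important_variable integer,
    other_field        text
);

CREATE FUNCTION validate_and_discard() RETURNS trigger AS $$
BEGIN
    -- The check lives here, so the error reports this function's context.
    IF NEW.important_variable IS NULL THEN
        RAISE EXCEPTION 'important_variable must not be NULL.';
    END IF;
    RETURN NULL;  -- skip the actual insert; nothing is stored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER request_template_check
    BEFORE INSERT ON request_template
    FOR EACH ROW EXECUTE PROCEDURE validate_and_discard();

-- Validating a JSON value then becomes a throwaway INSERT:
-- INSERT INTO request_template
-- SELECT * FROM json_populate_record(NULL::request_template,
--                                    p_request::json);
```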
Re: [GENERAL] pgadmin3 installation on Oracle Linux 6.6 64-bit ?
On 28 March 2015 at 02:14, Yuri Budilov wrote:
> I am new to PostgreSQL and Linux (coming across from Microsoft SQL Server).
>
> I installed PostgreSQL 9.4 on Oracle Linux 6.6 and it's working ok (psql,
> etc). Now I try to install pgadmin3 on the same OS.
>
> I am having problems installing pgadmin3 rpms on Oracle Linux 6.6 64-bit.

I'm often somewhat surprised that people expect to have GUI tools on database servers. Why would you, really? You can connect from anywhere using your OS of choice (or rather, the choice of your employer in most cases) - it's perfectly fine to run pgadmin3 on Windows to connect to a PostgreSQL database on Oracle Linux. And why waste potential DB cycles on a graphical environment? You don't need it.

For most things it's plenty (and frequently a lot more powerful and reliable) to have a textual user interface through a remote terminal session over SSH or equivalent. Of course, there is a learning curve to be able to use the UNIX (or Linux) shell and its tools - start with 'man man' if you're new ;)

That said, of course I agree that it should be possible to install pgadmin3 on Oracle Linux if you'd want to.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Re: [GENERAL] Populating missing dates in postgresql data
>            5    2014-12-01    Yes
> 286        5    2014-12-02    No
> 4507       1    2014-12-01    No
> 4507       1    2014-12-02    No
> 4507       2    2014-12-01    No
> 4507       2    2014-12-02    No
>
> I have been racking my mind for the better part of a day on how to do this.
> The thing is that I can do a right join of TABLE B on TABLE A, and this will
> populate the missing object ids and channels. However, this only works for a
> single
>
> object_id  channel  check-in date  comm failure
> **
> 990        1        2014-12-01     No
> 990        1        2014-12-02     No
> 990        2        2014-12-01     No
> 990        2        2014-12-02     No
> 286        2                       Yes
> 286        2        2014-12-02     No
> 286        5                       Yes
> 286        5        2014-12-02     No
> 4507       1        2014-12-01     No
> 4507       1        2014-12-02     No
> 4507       2        2014-12-01     No
> 4507       2        2014-12-02     No
>
> I need to do a count of comm failures by day, so I need to populate the
> check-in date field. Please help!

Easiest would be to insert the missing values in your table, something like:

WITH RECURSIVE calendar (missing_date) AS (
    SELECT MAX(check_in_date)
      FROM table_a
     WHERE comm_failure = 'YES'
    UNION ALL
    SELECT missing_date + interval '1 day'
      FROM calendar
     WHERE missing_date < CURRENT_DATE
)
INSERT INTO table_a (object_id, channel, check_in_date, comm_failure)
SELECT b.object_id, b.channel, c.missing_date, 'YES'
  FROM table_b b, calendar c
 WHERE NOT EXISTS (
    SELECT 1
      FROM table_a a
     WHERE a.object_id = b.object_id
       AND a.channel = b.channel
       AND a.check_in_date = c.missing_date
);

That's off the top of my head, untested, etc, but I think I got that mostly right.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
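A non-recursive way to build that calendar, for comparison (untested against the same hypothetical tables), is generate_series, which accepts timestamp bounds and an interval step:

```sql
-- One row per day in the range, usable in place of the recursive CTE.
SELECT d::date AS missing_date
  FROM generate_series(DATE '2014-12-01',
                       CURRENT_DATE,
                       interval '1 day') AS g(d);
```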
Re: [GENERAL] Installation Size showing huge size in postgres installed on windows os
On 19 March 2015 at 13:44, Raymond O'Donnell wrote:
> On 19/03/2015 12:39, jaime soler wrote:
>> On Wed, 18-03-2015 at 23:05 -0700, Rajagopal NS wrote:
>>> I have installed Postgres 9.0 in my machine. When I look at Programs and
>>> Features under Control Panel, I see the Size for Postgres 9.0 is shown as
>>> 121GB.
>>>
>>> I feel neither the installation nor the small postgres databases I would
>>> have created use 121GB.

For what it's worth, I've seen the wildest claims about installation sizes in that panel. Mostly the installation size is severely underestimated, but overestimates happen too - I haven't seen anything this bad though. I stopped trusting those numbers quite a while ago.

The reliable method is to check the size of your data directory and the Postgres installation directory yourself.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
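From inside the database, the built-in size functions report actual on-disk usage and sidestep the Control Panel numbers entirely (these are standard PostgreSQL functions):

```sql
-- Actual on-disk size of the current database...
SELECT pg_size_pretty(pg_database_size(current_database()));

-- ...and of every database in the cluster, largest first.
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
  FROM pg_database
 ORDER BY pg_database_size(datname) DESC;
```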
Re: [GENERAL] Reg: PL/pgSQL commit and rollback
On 17 March 2015 at 15:30, Medhavi Mahansaria wrote:
> Yes. I have read this document.
>
> But my issue is that even when it throws an exception I need to roll back
> the changes made by that query and move on to the next block.
>
> Is there any way to accomplish that?

Please do not top-post on this list.

The solution is to do what Albe explained: use BEGIN ... EXCEPTION ... END sub-blocks.
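Sketched as a DO block (table names invented): each inner BEGIN ... EXCEPTION ... END block runs in its own subtransaction, so a failure rolls back only that block's changes and execution continues with the next block.

```sql
DO $$
BEGIN
    BEGIN
        -- first statement that might fail
        INSERT INTO t1 VALUES (1);
    EXCEPTION WHEN OTHERS THEN
        RAISE NOTICE 'first block failed (%), moving on', SQLERRM;
    END;

    BEGIN
        -- this block runs regardless of what happened above
        INSERT INTO t2 VALUES (2);
    EXCEPTION WHEN OTHERS THEN
        RAISE NOTICE 'second block failed (%), moving on', SQLERRM;
    END;
END;
$$;
```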
Re: [GENERAL] Update using non-existent fields does not throw an error
On 16 March 2015 at 17:02, Rob Richardson wrote:
> Greetings!
>
> An update query is apparently succeeding, even though the query refers to
> fields that do not exist. Here's the query:
>
> update inventory set
>     x_coordinate = (select x_coordinate from bases where base = '101'),
>     y_coordinate = (select y_coordinate from bases where base = '101')
> where charge = 100
>
> -- select x_coordinate, y_coordinate from bases where base = '101'
>
> When I run the update query, it tells me that the query succeeded and that
> four records were updated, which is what I expect. But when I looked at the
> inventory table, I found that the four records were unchanged. So, I tried
> to check the values of the base coordinates by running the select statement
> shown above. That statement threw an error complaining that x_coordinate
> and y_coordinate did not exist. This is correct; I should have been
> querying a view that includes those fields. But why didn't the update
> statement throw an error?

Because inventory contains those fields.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
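To spell that out: inside the subselects, the unqualified x_coordinate and y_coordinate resolve to the columns of inventory itself (an outer reference), so each row is set to its own current values - four rows "updated", nothing changed. Qualifying the column names surfaces the mistake immediately (table names from the thread):

```sql
-- Errors at once if bases really lacks the columns, instead of
-- silently referencing the update target's own columns:
update inventory set
    x_coordinate = (select bases.x_coordinate from bases where base = '101'),
    y_coordinate = (select bases.y_coordinate from bases where base = '101')
where charge = 100;
```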
Re: [GENERAL] Postgres not using GiST index in a lateral join
> On 04 Mar 2015, at 22:18, Igor Stassiy wrote:
>
> I would like to stop executing the query for a row of table "a" when a
> single row of "b" is found. This query would not stop processing but will
> filter all the rows that are found at the end of execution.
>
> Is there a way to express this without a subquery?

Does it? Because that would be somewhat surprising.

> On Wed, Mar 4, 2015 at 11:53 PM Paul Ramsey wrote:
> Stop writing so many subqueries, think in joins; the poor planner!
>
> SELECT DISTINCT ON (a.id) a.id AS a_id, b.id AS b_id
> FROM a
> JOIN b
> ON ST_Contains(b.shape, a.shape)
> WHERE b.kind != 1
>
> Also, the DISTINCT ON syntax (PgSQL custom) lets you winnow a result
> set down to just one of the inputs.

>> -- for each row in A, select exactly one row in B (if there is one)
>> -- such that B contains geometry of A
>> EXPLAIN SELECT * FROM (SELECT A.id as aid, (SELECT B.id FROM B WHERE
>> ST_Contains(B.shape, A.shape) AND B.kind != 1 LIMIT 1) as bid FROM A) AS
>> TMP;
>>
>> which gives me { "Plan": { "Node Type": "Seq Scan", "Relation Name": "A", "Startup Cost": 0.00, "Total Cost": 2606592.33, "Plan Rows": 549745, "Plan Width": 1646, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "SubPlan", "Subplan Name": "SubPlan 1", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Relation Name": "B", "Startup Cost": 0.00, "Total Cost": 4.68, "Plan Rows": 1, "Plan Width": 8, "Filter": "((shape && A.shape) AND _st_contains(shape, A.shape))" } ] } ] } }

How did your query plan end up in JSON notation? It's quite difficult to read like this.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
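For the record, the "stop at the first matching row of b per row of a" semantics can also be written directly with a LATERAL subquery (available since 9.3); a sketch against the thread's tables:

```sql
-- For each row of a, scan b only until one qualifying row is found.
SELECT a.id AS a_id, s.id AS b_id
  FROM a
  LEFT JOIN LATERAL (
        SELECT b.id
          FROM b
         WHERE b.kind != 1
           AND ST_Contains(b.shape, a.shape)
         LIMIT 1
  ) AS s ON true;
```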