[GENERAL] trigger Before or After

2014-11-10 Thread avpro avpro
Hi,

In the pgsql documentation
(http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html)

I haven't seen anything explaining how the data inserted into the table is
affected by a BEFORE INSERT trigger compared with an AFTER INSERT trigger,
or anything related to performance.

For example:

tables: actuals (summarizes the total running hours), log (the functional
hours are inserted into LOG as time)
function: sum
view: timeview (where running hours are calculated as a difference)

-- Function: sum()

-- DROP FUNCTION sum();

CREATE OR REPLACE FUNCTION sum()
  RETURNS trigger AS
$BODY$
begin
  update actuals
  set hours = hours + (select time from timeview
                       where idlog = (select max(idlog) from timeview))
  where actuals.idmac = (SELECT idmac FROM selectedmac);
  return new;
end
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION sum()
  OWNER TO user;




--trigger
CREATE TRIGGER update_actuals_tg01
  AFTER INSERT
  ON log
  FOR EACH ROW
  EXECUTE PROCEDURE sum();


I read somewhere (I can't find the link anymore) that if the trigger is
AFTER INSERT, the data in the table LOG might not be available anymore to
the trigger. Is that correct? Or did I misunderstand?
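
For what it's worth, here is a minimal sketch of the semantic difference
(hypothetical table and trigger names, not the schema above): an AFTER
INSERT trigger already sees the row stored in the table, while a BEFORE
INSERT trigger can still modify it before it is stored.

  CREATE TABLE demo (id int, note text);

  CREATE OR REPLACE FUNCTION demo_before_fn() RETURNS trigger AS $$
  begin
    NEW.note := 'set in BEFORE';  -- a BEFORE trigger may still modify NEW
    return NEW;
  end $$ LANGUAGE plpgsql;

  CREATE OR REPLACE FUNCTION demo_after_fn() RETURNS trigger AS $$
  declare n int;
  begin
    -- in an AFTER trigger the new row is already inserted and visible
    select count(*) into n from demo where id = NEW.id;
    raise notice 'rows with id %: %', NEW.id, n;
    return NULL;  -- the return value of a row-level AFTER trigger is ignored
  end $$ LANGUAGE plpgsql;

  CREATE TRIGGER demo_tg_before BEFORE INSERT ON demo
    FOR EACH ROW EXECUTE PROCEDURE demo_before_fn();
  CREATE TRIGGER demo_tg_after AFTER INSERT ON demo
    FOR EACH ROW EXECUTE PROCEDURE demo_after_fn();

  INSERT INTO demo VALUES (1, 'original');  -- NOTICE: rows with id 1: 1
  SELECT note FROM demo;                    -- returns 'set in BEFORE'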

What is the difference in performance between a BEFORE INSERT trigger and
an AFTER INSERT trigger?

thank you
have a sunny day


[GENERAL] Integrating PostgreSQL with Elasticsearch and Kibana

2014-11-10 Thread rahulk
Hi All, 

I have a source that stores data in a PostgreSQL database in JSON format.
I want to perform analytical operations on the data in PostgreSQL and
visualize the results of those operations.

I have found an open-source analytics tool called Kibana (integrated with
Elasticsearch) that can visualize the data.
http://www.elasticsearch.org/overview/kibana/ 

We can feed data from PostgreSQL to Elasticsearch through the jdbc-river
plugin:
https://github.com/jprante/elasticsearch-river-jdbc/wiki/Step-by-step-recipe-for-setting-up-the-river-with-PostgreSQL
 

I am trying to integrate PostgreSQL, Elasticsearch and Kibana to perform
analytical operations and visualize data.

Has anybody tried something like this? Your views and suggestions would be
useful. Thanks in advance.

Regards,
Rahul Khengare






Re: [GENERAL] justify_interval: days in year

2014-11-10 Thread Ilya Ashchepkov
On Mon, 10 Nov 2014 14:14:15 +0100
Francisco Olarte  wrote:

> On Mon, Nov 10, 2014 at 10:07 AM, Ilya Ashchepkov 
> wrote:
> 
> > Is this correct?
> > $ select
> >   justify_interval(t-f-i)
> >   ,t-f-i
> > from
> >   (
> > select
> >   timestamp '2014-1-1' as f,
> >   timestamp '2015-1-1' as t,
> >   interval '06:49:00' as i
> >   ) S;
> >
> >     justify_interval    |      ?column?
> > ------------------------+--------------------
> >  1 year 4 days 17:11:00 | 365 days -06:49:00
> >
> > Why do we have 360 days in a year?
> >
> > Seems to be; the docs say:
> 
> justify_days(interval) returns interval: adjusts the interval so 30-day
> time periods are represented as months; justify_days(interval '35 days')
> gives 1 mon 5 days. justify_hours(interval) returns interval: adjusts the
> interval so 24-hour time periods are represented as days;
> justify_hours(interval '27 hours') gives 1 day 03:00:00.
> justify_interval(interval) returns interval: adjusts the interval using
> justify_days and justify_hours, with additional sign adjustments;
> justify_interval(interval '1 mon -1 hour') gives 29 days 23:00:00.
> And, IIRC, an interval stores 3 parts: months, days and
> seconds. The 365 days are justified to 12 months 5 days by justify_days,
> then the 5 days -6:49:00 are justified to 4 days 17:11:00, and then
> the 12 months are printed as 1 year (because months are printed
> as MM/12 years, MM%12 months).
> 
> What I'm not sure about is why the right column is stated as 365 days minus
> six hours and change instead of 1 year minus six hours; maybe testing
> the parts in sequence may shed light on it. But I wouldn't rely on
> justify_xxx for anything precise on big intervals given its
> definition (unless you are working only with intervals and using 30-day
> months and 24-hour days exclusively).
> 
> Francisco Olarte.
> 

Thanks. I thought of justify_interval as a function that formats an
interval into a human-readable string. The justify_hours result is more
obvious; I will use it.





Re: [GENERAL] Concat bigint Array Values in a Function

2014-11-10 Thread Vinay Gupta
Table A : 220 MB
Table B : 120 KB

Yeah, I'm figuring out how to write it in pure SQL.

Thanks

On Tue, Nov 11, 2014 at 6:15 AM, David G Johnston <
david.g.johns...@gmail.com> wrote:

> Vinay Gupta wrote
> > Please let me know if there is an optimal way to get this function to
> > finish in ~2-3 secs
>
> If you can figure out how to write that in pure SQL you might have a
> chance.
>
> David J.
>
>
>
>


Re: [GENERAL] Updating timezone setting

2014-11-10 Thread Adrian Klaver

On 11/10/2014 02:04 PM, Keith Fiske wrote:

Discovered that a client of ours had assumed their database was running
in UTC, but turned out to be running in 'US/Eastern'. They had changed
all their systems a while ago to run in UTC but didn't realize the
database was independent of that. The postgresql.conf has

timezone = 'US/Eastern'

set in it. It also looks like they have all their timestamp columns set
to "timestamp without time zone". But their application has been running
in UTC, so the times being inserted are the time in UTC. So when queries
are run locally in psql, they appear to be ahead of now(), which would be
expected.

They'd like to get this fixed, at least so that the database is running in
UTC. Is there any issue with just changing the postgresql.conf timezone
to GMT and restarting? All of their applications are inserting time in
UTC anyway, and have been for as long as they can remember.




A quick test:

psql (9.3.5)
Type "help" for help.


test=# \d timestamp_test
      Table "public.timestamp_test"
 Column |            Type             | Modifiers 
--------+-----------------------------+-----------
 id     | integer                     | 
 ts     | timestamp without time zone | 
 ts_z   | timestamp with time zone    | 

test=# show timezone;
  TimeZone  
------------
 US/Pacific
(1 row)

test=# insert into timestamp_test values (1, now(), now());
INSERT 0 1

test=# insert into timestamp_test values (2, now() at time zone 'utc', now());
INSERT 0 1

test=# set timezone='utc';
SET

test=# show timezone;
 TimeZone 
----------
 UTC
(1 row)

test=# insert into timestamp_test values (3, now() at time zone 'utc', now());
INSERT 0 1

test=# select * from timestamp_test ;
 id |             ts             |             ts_z              
----+----------------------------+-------------------------------
  1 | 2014-11-10 17:01:20.675443 | 2014-11-11 01:01:20.675443+00
  2 | 2014-11-11 01:01:33.484436 | 2014-11-11 01:01:33.484436+00
  3 | 2014-11-11 01:01:48.074499 | 2014-11-11 01:01:48.074499+00
(3 rows)

test=# set timezone='US/Pacific';
SET

test=# show timezone;
  TimeZone  
------------
 US/Pacific
(1 row)

test=# select * from timestamp_test ;
 id |             ts             |             ts_z              
----+----------------------------+-------------------------------
  1 | 2014-11-10 17:01:20.675443 | 2014-11-10 17:01:20.675443-08
  2 | 2014-11-11 01:01:33.484436 | 2014-11-10 17:01:33.484436-08
  3 | 2014-11-11 01:01:48.074499 | 2014-11-10 17:01:48.074499-08
(3 rows)



Would there be any issues then just leaving the columns as "timestamp
without time zone"? I know that's not ideal, but that would be a big
project to try and convert every single one of those columns.


The plus for converting is that timestamp with time zone anchors the time,
so you do not have to worry as much about what the application is
doing. That being said, I could see rolling that out over time :)
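
Should they convert a column later, a minimal sketch (using the test table
above; assumes the stored values really are UTC wall-clock times):

  ALTER TABLE timestamp_test
    ALTER COLUMN ts TYPE timestamp with time zone
    USING ts AT TIME ZONE 'UTC';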




Thanks,

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Concat bigint Array Values in a Function

2014-11-10 Thread David G Johnston
Vinay Gupta wrote
> Please let me know if there is an optimal way to get this function to
> finish in ~2-3 secs

If you can figure out how to write that in pure SQL you might have a chance.
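
For example, a rough sketch of the pure-SQL shape (only one array column
shown, names taken from the posted function, the branching logic omitted;
assumes 9.3+ for LATERAL):

  SELECT a.g_id,
         array_agg(s.elem) AS state_id_list
  FROM tableA a,
       LATERAL unnest(a.state_id_list) AS s(elem)
  WHERE a.is_active
  GROUP BY a.g_id;

The real query would repeat that pattern (or use CASE expressions inside
the aggregates) for each of the list columns in fun_type.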

David J.






Re: [GENERAL] repmgr

2014-11-10 Thread John R Pierce

On 11/10/2014 4:10 PM, Robin Ranjit Singh Chauhan wrote:

I haven't been able to find much about repmgr on Postgres 9.3.

Is repmgr still a significant value-add given the newer built-in
replication features?


repmgr is a management tool for setting up and controlling the built-in
replication features.




--
john r pierce  37N 122W
somewhere on the middle of the left coast





[GENERAL] repmgr

2014-11-10 Thread Robin Ranjit Singh Chauhan
I haven't been able to find much about repmgr on Postgres 9.3.

Is repmgr still a significant value-add given the newer built-in
replication features?

Thanks.
=R
-- 
Robin R Chauhan
CEO, Pathway Intelligence Inc
ro...@pathwayi.com
Office: 778-588-6217 Ext. 201
Cell: 604-865-0517
http://pathwayi.com/


[GENERAL] Concat bigint Array Values in a Function

2014-11-10 Thread Vinay Gupta
Hi,

I need to optimize the aggregation of integer array values in a function,
and I am doing this with the stored function below:

-- type def
CREATE TYPE fun_type AS (
g_id integer,
zip_id_list integer[],
city_id_list integer[],
state_id_list integer[],
lat_long_id_list bigint[],
country_id_list bigint[],
ccid_list bigint[],
cr_id_list bigint[],
is_active boolean
);

-- fun def

CREATE OR REPLACE FUNCTION fun_multicountry()
  RETURNS SETOF fun_type AS
$BODY$
DECLARE
    row1 fun_type%ROWTYPE;
    dt1 record;
    dt1outer record;
BEGIN
    FOR dt1outer IN SELECT DISTINCT g_id FROM tableA
    LOOP
        row1.country_id_list = '{}';
        row1.state_id_list = '{}';
        row1.city_id_list = '{}';
        row1.lat_long_id_list = '{}';
        row1.zip_id_list = '{}';
        row1.ccid_list = '{}';
        row1.cr_id_list = '{}';
        row1.g_id = dt1outer.g_id;
        row1.is_active = false;
        FOR dt1 IN SELECT * FROM tableA LEFT OUTER JOIN tableB
                   ON (pr_id = tableB.id)
                   WHERE g_id = dt1outer.g_id AND tableA.is_active = true
        LOOP
            row1.is_active = true;
            IF (dt1.geot_type_id = 1 AND dt1.pr_id IS NOT NULL)
            THEN
                row1.cr_id_list = row1.cr_id_list || dt1.targeting_ids;
            ELSIF ((dt1.state_id_list IS NULL OR dt1.state_id_list = '{}') AND
                   (dt1.city_id_list IS NULL OR dt1.city_id_list = '{}') AND
                   (dt1.lat_long_id_list IS NULL OR dt1.lat_long_id_list = '{}') AND
                   (dt1.zip_id_list IS NULL OR dt1.zip_id_list = '{}'))
            THEN
                row1.country_id_list = row1.country_id_list || dt1.country_id;
            ELSE
                row1.state_id_list = row1.state_id_list || dt1.state_id_list;
                row1.city_id_list = row1.city_id_list || dt1.city_id_list;
                row1.lat_long_id_list = row1.lat_long_id_list || dt1.lat_long_id_list;
                row1.zip_id_list = row1.zip_id_list || dt1.zip_id_list;
            END IF;
            IF (dt1.ccid_list IS NULL OR dt1.ccid_list = '{}')
            THEN
                row1.ccid_list = row1.ccid_list || -dt1.country_id;
            ELSE
                row1.ccid_list = row1.ccid_list || dt1.ccid_list;
            END IF;
        END LOOP;
        RETURN NEXT row1;
    END LOOP;
END
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100
  ROWS 1000;

The above function takes ~30-40 sec to finish; the table row counts are below:

explain analyze select fun_multicountry();

                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Result  (cost=0.00..5.25 rows=1000 width=0) (actual time=21855.881..21959.683 rows=420286 loops=1)
 Total runtime: 31977.712 ms
(2 rows)

Time: 21978.103 ms

select count(*) from tableA;
 count
--------
 629439
(1 row)

Time: 135.858 ms

select count(*) from tableB;
 count
-------
   841

select count(distinct g_id) from tableA;
 count
--------
 420287

I need to aggregate all the different array values in the table depending on
the above IF conditions and return them. Is there any optimal approach to
doing this? I am calling this function from many other functions and it just
adds to the slowness. I am trying to rewrite it using CASE statements. Will
this help?

Please let me know if there is an optimal way to get this function to finish
in ~2-3 secs.

Thanks


[GENERAL] Updating timezone setting

2014-11-10 Thread Keith Fiske
Discovered that a client of ours had assumed their database was running in
UTC, but turned out to be running in 'US/Eastern'. They had changed all
their systems a while ago to run in UTC but didn't realize the database was
independent of that. The postgresql.conf has

timezone = 'US/Eastern'

set in it. It also looks like they have all their timestamp columns set to
"timestamp without time zone". But their application has been running in
UTC, so the times being inserted are the time in UTC. So when queries are
run locally in psql, they appear to be ahead of now(), which would be
expected.

They'd like to get this fixed, at least so that the database is running in
UTC. Is there any issue with just changing the postgresql.conf timezone to
GMT and restarting? All of their applications are inserting time in UTC
anyway, and have been for as long as they can remember.

Would there be any issues then just leaving the columns as "timestamp
without time zone"? I know that's not ideal, but that would be a big
project to try and convert every single one of those columns.

Thanks,

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com


Re: [GENERAL] SSL Certificates in Postgres 9.3 and Windows 7

2014-11-10 Thread David G Johnston
Since we are testing...test using Nabble.com and the more->raw option...

"begin quote"
I also get this error message in my log file:

"begin raw"

"end raw"
"end quote"

Following the sentence above should be 4 lines of log file content...

Sent using the Nabble.com interface

Dave






Re: [GENERAL] SSL Certificates in Postgres 9.3 and Windows 7

2014-11-10 Thread Magnus Hagander
On Mon, Nov 10, 2014 at 8:53 PM, John R Pierce  wrote:
> On 11/10/2014 11:13 AM, Magnus Hagander wrote:
>>
>> Then it's more logical. Nabble simply did not include that when the
>> email was posted. I would call that either a bug or at least a
>> limitation in Nabble - I think only they can figure out what's wrong
>> and if it's fixable. Until then, we'll have to recommend people just
>> post plaintext from there (which seems to work) or use a more capable
>> MUA.
>
>
> I suspect it's the centos mail list server that's stripping attachments.  I
> know most of my personal email lists do so.

The post was made to the pgsql-general list AFAICT, and I'm pretty
sure it doesn't strip any attachments. It would be hard to deal with
patches if it did...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [GENERAL] SSL Certificates in Postgres 9.3 and Windows 7

2014-11-10 Thread John R Pierce

On 11/10/2014 11:13 AM, Magnus Hagander wrote:

Then it's more logical. Nabble simply did not include that when the
email was posted. I would call that either a bug or at least a
limitation in Nabble - I think only they can figure out what's wrong
and if it's fixable. Until then, we'll have to recommend people just
post plaintext from there (which seems to work) or use a more capable
MUA.


I suspect it's the centos mail list server that's stripping attachments.
I know most of my personal email lists do so.




--
john r pierce  37N 122W
somewhere on the middle of the left coast





Re: [GENERAL] Centos 7 - anyone?

2014-11-10 Thread Adrian Klaver

On 11/10/2014 10:45 AM, Edson Richter wrote:

Hi!

I've just received a new server, and I intend to install
CentOS 7 and PostgreSQL 9.3.5.
After installing the yum repo (as described in
http://www.postgresql.org/download/linux/redhat/), I've installed
everything I need with the command

yum install postgresql93 postgresql93-server postgresql93-contrib

Which eventually worked very well.
After that, I've created a /etc/sysconfig/pgsql/postgresql-9.3
configuration file (as I used to do in CentOS 6.5), and then ran "service
postgresql-9.3 initdb"... crash, bang, boom.
It won't work, failing with:

"[root@localhost pgsql]# service postgresql-9.3 initdb
The service command supports only basic LSB actions (start, stop,
restart, try-restart, reload, force-reload, status). For other actions,
please try to use systemctl.
[root@localhost pgsql]# systemctl --help
systemctl [OPTIONS...] {COMMAND} ..."

So, I've learned that there is a new way of doing things... Reading the yum
repository documentation, I've learned:

"postgresql-setup initdb
systemctl enable postgresql.service"

Which works partially: it has not picked up my
/etc/sysconfig/postgresql-9.3 configuration file for paths and other things.

Can anyone point out what I've done wrong?


I would say look in the service file and see where it is looking for
the configuration file and what that file is named. My best guess is that it
is not the same one you created.





Thanks,

--
Atenciosamente,

Edson Carlos Ericksson Richter




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] SSL Certificates in Postgres 9.3 and Windows 7

2014-11-10 Thread Magnus Hagander
On Mon, Nov 10, 2014 at 1:45 AM, David Johnston
 wrote:
> Yes, that is what I was referring to.  The Nabble.com website showed them.
>
> http://postgresql.1045698.n5.nabble.com/SSL-Certificates-in-Postgres-9-3-and-Windows-7-td5826230.html

Then it's more logical. Nabble simply did not include that when the
email was posted. I would call that either a bug or at least a
limitation in Nabble - I think only they can figure out what's wrong
and if it's fixable. Until then, we'll have to recommend people just
post plaintext from there (which seems to work) or use a more capable
MUA.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [GENERAL] Centos 7 - anyone?

2014-11-10 Thread Edson Richter

I've found this very useful:

http://people.planetpostgresql.org/devrim/index.php?/archives/82-Running-more-than-one-9.4-9.3-instance-in-parallel-on-RHEL-7.html



Atenciosamente,

Edson Carlos Ericksson Richter

On 10-11-2014 16:45, Edson Richter wrote:

Hi!

I've just received a new server, and I intend to install
CentOS 7 and PostgreSQL 9.3.5.
After installing the yum repo (as described in 
http://www.postgresql.org/download/linux/redhat/), I've installed 
everything I need with the command


yum install postgresql93 postgresql93-server postgresql93-contrib

Which eventually worked very well.
After that, I've created a /etc/sysconfig/pgsql/postgresql-9.3
configuration file (as I used to do in CentOS 6.5), and then ran "service
postgresql-9.3 initdb"... crash, bang, boom.

It won't work, failing with:

"[root@localhost pgsql]# service postgresql-9.3 initdb
The service command supports only basic LSB actions (start, stop, 
restart, try-restart, reload, force-reload, status). For other 
actions, please try to use systemctl.

[root@localhost pgsql]# systemctl --help
systemctl [OPTIONS...] {COMMAND} ..."

So, I've learned that there is a new way of doing things... Reading the yum
repository documentation, I've learned:


"postgresql-setup initdb
systemctl enable postgresql.service"

Which works partially: it has not picked up my
/etc/sysconfig/postgresql-9.3 configuration file for paths and other
things.


Can anyone point out what I've done wrong?

Thanks,
--
Atenciosamente,

Edson Carlos Ericksson Richter




[GENERAL] Centos 7 - anyone?

2014-11-10 Thread Edson Richter

Hi!

I've just received a new server, and I intend to install
CentOS 7 and PostgreSQL 9.3.5.
After installing the yum repo (as described in 
http://www.postgresql.org/download/linux/redhat/), I've installed 
everything I need with the command


yum install postgresql93 postgresql93-server postgresql93-contrib

Which eventually worked very well.
After that, I've created a /etc/sysconfig/pgsql/postgresql-9.3
configuration file (as I used to do in CentOS 6.5), and then ran "service
postgresql-9.3 initdb"... crash, bang, boom.

It won't work, failing with:

"[root@localhost pgsql]# service postgresql-9.3 initdb
The service command supports only basic LSB actions (start, stop, 
restart, try-restart, reload, force-reload, status). For other actions, 
please try to use systemctl.

[root@localhost pgsql]# systemctl --help
systemctl [OPTIONS...] {COMMAND} ..."

So, I've learned that there is a new way of doing things... Reading the yum
repository documentation, I've learned:


"postgresql-setup initdb
systemctl enable postgresql.service"

Which works partially: it has not picked up my
/etc/sysconfig/postgresql-9.3 configuration file for paths and other things.


Can anyone point out what I've done wrong?

Thanks,

--
Atenciosamente,

Edson Carlos Ericksson Richter



Re: [GENERAL] Strange avg value size for index on expression in pg_stats

2014-11-10 Thread Jehan-Guillaume de Rorthais
On Mon, 10 Nov 2014 09:39:23 -0500
Tom Lane  wrote:

> Jehan-Guillaume de Rorthais  writes:
> > While investigating a wrong result with my btree bloat estimation
> > query, I found a strange stat deviation between the average size of a value
> > in its table and its average size in an index on the "lower" expression.
> 
> ANALYZE doesn't look at the physical index at all.  For an expression
> index, it will recompute the index expression at each sampled row, and
> then take statistics on those values.  In the case you've got here, it's
> re-running the lower() function and looking at the uncompressed result.
> So that accounts for the discrepancy in average width stats.

Ok, understood.

But in my demo scenario, I used *only* md5 to populate the test table, so the
data length is always 32 and the average width is exact. No possible
deviation, even with lower(). To demonstrate:

  postgres@test=# select 
length(lower(md5('a'))) = length(md5('b')), 
length(md5('c')) = length(md5('d'));
  -[ RECORD 1 ]
  ?column? | t
  ?column? | t

And here is another test with a static string for all rows:

  postgres@test=# create table test as
  select '1234567890123456789012'::text as t
  from generate_series(1,100);
  SELECT 100
  
  postgres@test=# create index ON test (lower(t));
  CREATE INDEX
  
  postgres@test=# analyze test;
  ANALYZE

  postgres@test=# select tablename, attname, avg_width
  from pg_stats
  where schemaname = 'public';
     tablename    | attname | avg_width 
  ----------------+---------+-----------
   test           | t       |        23
   test_lower_idx | lower   |        26


AFAIU, we should not have a discrepancy here.

> > This tiny difference is the source of a very bad estimation with the
> > Btree bloat estimation query when values are around an alignment
> > boundary.
> 
> TBH, if that query is relying on ANALYZE width estimates to be accurate
> to the last byte, its reliability is going to be horrid anyway.

Well, I'm aware of that. I don't need accuracy to the last byte. This query
doesn't even pay attention to data alignment padding anyway (I measured some
10% deviation in one case because of this).

This query only helps in guessing the bloat evolution in Btrees or in quickly
discovering *big* deviations. In many situations, we cannot afford a call to
pgstattuple.avg_leaf_density().

But this statistic difference between two values with the exact same size is
bugging me. So far, I couldn't find a logical explanation and it just looks
like a wrong statistic.

Regards,
-- 
Jehan-Guillaume (ioguix) de Rorthais




Re: [GENERAL] Strange avg value size for index on expression in pg_stats

2014-11-10 Thread Tom Lane
Jehan-Guillaume de Rorthais  writes:
> While investigating a wrong result with my btree bloat estimation
> query, I found a strange stat deviation between the average size of a value
> in its table and its average size in an index on the "lower" expression.

ANALYZE doesn't look at the physical index at all.  For an expression
index, it will recompute the index expression at each sampled row, and
then take statistics on those values.  In the case you've got here, it's
re-running the lower() function and looking at the uncompressed result.
So that accounts for the discrepancy in average width stats.
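
For what it's worth, something like this should show the same effect from
plain SQL (an untested sketch against the test table from the original
scenario; pg_column_size() reports the size of the datum it is handed, the
stored short-header form for t versus the freshly computed full-header form
for lower(t)):

  SELECT avg(pg_column_size(t))        AS stored_width,   -- ~33: 1-byte varlena header
         avg(pg_column_size(lower(t))) AS computed_width  -- ~36: 4-byte header on the fresh datum
  FROM test;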

> This tiny difference is the source of a very bad estimation with the
> Btree bloat estimation query when values are around an alignment
> boundary.

TBH, if that query is relying on ANALYZE width estimates to be accurate
to the last byte, its reliability is going to be horrid anyway.

regards, tom lane




Re: [GENERAL] Strange result using pg_dump gzip or split.

2014-11-10 Thread Adrian Klaver

On 11/10/2014 03:34 AM, Condor wrote:


Hello,

I found a strange result when using pg_dump as described on the postgresql
site: http://www.postgresql.org/docs/9.3/static/backup-dump.html

I have a database with 30 GB of data and decided to archive it; postgresql
is 9.3.5 x86_64, ext4 file system, kernel 3.14.18, Slackware 14.2 (current)


How did you determine there is 30GB of data?




First I use gzip with : pg_dump logdb | gzip > log.sql.gz

After a few minutes I have log.sql.gz with size 2 170 016 226.
Well, that seemed strange, so I dumped the database again with:

pg_dump logdb | split -b 1024m - log.sql

20 files are generated and I zip them with:

zip -r log.sql.zip logdir (because I move them in logdir)

file size is : 2 170 020 867

Almost the same, but if I check the sizes inside the archives there is a
huge difference.


Any reason for not using pg_dump -Fc and getting the built-in compression?




$ gzip -l log.sql.gz
  compressed  uncompressed  ratio  uncompressed_name
  2170016226    3060688725  29.1%  log_to.sql

and


$ unzip -v log.sql.zip
*** snip ***
   --------   ----------   ---    --------
 20240557909  2170020867    89%   20 files


Here is the difference: with gzip I have a 29.1% compression ratio and an
uncompressed size of 3 060 688 725, which means 3 GB, while with zip I have
an 89% compression ratio and an uncompressed size of 20 240 557 909, which
means 20 GB. That is 7 times bigger.

My question is: are there some special config params that are not
described in the documentation here:
http://www.postgresql.org/docs/9.3/static/backup-dump.html
Or does something need to be configured on my Linux?

And the most important question for me is: is the database dump corrupt
or not?



Regards,

Hristo Simeonov





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Strange result using pg_dump gzip or split.

2014-11-10 Thread Francisco Olarte
Followup, second try.

First of all, I'd like to apologize to the list for my previous message: I
borked some finger gymnastics when switching tabs and sent an incomplete
one. My fault. Sorry.

Now what I tried to say was:

I did not spot it at first; looking at
http://www.gzip.org/zlib/rfc-gzip.html#file-format I see:

ISIZE (Input SIZE): this contains the size of the original (uncompressed)
input data modulo 2^32. And given that gzip -l is usually much faster than
-tv, I suspect it's just reporting this field.
Francisco Olarte.


Re: [GENERAL] Strange result using pg_dump gzip or split.

2014-11-10 Thread Francisco Olarte
Hi Condor.

Followup: I did not spot it at first; looking at
http://www.gzip.org/zlib/rfc-gzip.html#file-format I see:

On Mon, Nov 10, 2014 at 12:34 PM, Condor  wrote:

>
> Hello,
>
> I found strange result when I use pg_dump described on postgresql site:
> http://www.postgresql.org/docs/9.3/static/backup-dump.html
>
> I have a database with 30 gb data and decide to archive it, postgresql is
> 9.3.5 x64_86, ext4 file system, kernel 3.14.18 Slackware 14.2 (current)
>
>
> First I use gzip with : pg_dump logdb | gzip > log.sql.gz
>
> After a few minute I have log.sql.gz with size 2 170 016 226
> Well, that is strange and I dump database again with:
>
> pg_dump logdb | split -b 1024m - log.sql
>
> 20 files are generated and I zip them with:
>
> zip -r log.sql.zip logdir (because I move them in logdir)
>
> file size is : 2 170 020 867
>
> Almost the same, but if I check size in archives there is a huge
> difference.
>
>
> $ gzip -l log.sql.gz
>  compresseduncompressed  ratio uncompressed_name
>  2170016226  3060688725  29.1% log_to.sql
>
> and
>
>
> $ unzip -v log.sql.zip
> *** snip ***
>   ---  ------
> 20240557909 2170020867  89%20 files
>
>
> Here is difference: with gzip I have 29.1% compress ratio and uncompressed
> size is 3 060 688 725 which means 3 GB
> and with zip I have 89% compress ratio and uncompressed size is 20 240 557
> 909, which means 20 GB. That is 7 times bigger.
>
> My question is: Is there some special config params that is not described
> in documentation here: http://www.postgresql.org/
> docs/9.3/static/backup-dump.html
> Or something need to be configured on my linux.
>
> And the most important question for me is: is the database dump corrupt or
> not?
>
>
>
> Regards,
>
> Hristo Simeonov


Re: [GENERAL] Strange result using pg_dump gzip or split.

2014-11-10 Thread Francisco Olarte
Hi Condor.

On Mon, Nov 10, 2014 at 12:34 PM, Condor  wrote:

> I have a database with 30 gb data and decide to archive it, postgresql is
> 9.3.5 x64_86, ext4 file system, kernel 3.14.18 Slackware 14.2 (current)
>

You should have a look at your tools; it seems you have a file size
problem.


> First I use gzip with : pg_dump logdb | gzip > log.sql.gz
> ...
>
> $ gzip -l log.sql.gz
>   compressed  uncompressed  ratio  uncompressed_name
>   2170016226    3060688725  29.1%  log_to.sql
>
> $ unzip -v log.sql.zip
> *** snip ***
>    --------   ----------   ---    --------
>  20240557909  2170020867    89%   20 files
>

When you have this kind of problem, your first thing should be to pass
everything to hex:

2170016226=0x8157D1E2

2170020867=0x8157E403


Not a great difference there, this is normal, but on the uncompressed side:


20240557909=0x4B66E6755

3060688725=0xB66E6755

Mmmm, something fishy here: it seems gzip is using 32 bits only, so it gets
things wrong. You can investigate more from there. If you can spare the
disk space (which it seems you can, since you had it for the split/zip), you
should try to gunzip it and see how big it comes out (I would recommend
'gzip -tv' once to see what it prints, and then 'gunzip -cv > xxx' to
preserve the input AND get verbose output). The problem seems to be with
gzip.
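
Indeed, 20240557909 wrapped at 32 bits gives exactly the 3060688725 that
gzip -l reported. A quick check (runnable from psql, though any calculator
works):

  SELECT 20240557909 % 4294967296 AS isize_mod_2_32;  -- 3060688725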

Francisco Olarte.


Re: [GENERAL] justify_interval: days in year

2014-11-10 Thread Francisco Olarte
On Mon, Nov 10, 2014 at 10:07 AM, Ilya Ashchepkov  wrote:

> Is this correct?
> $ select
>   justify_interval(t-f-i)
>   ,t-f-i
> from
>   (
> select
>   timestamp '2014-1-1' as f,
>   timestamp '2015-1-1' as t,
>   interval '06:49:00' as i
>   ) S;
>
>     justify_interval    |      ?column?
> ------------------------+--------------------
>  1 year 4 days 17:11:00 | 365 days -06:49:00
>
> Why do we have 360 days in a year?
>
> Seems to be; the docs say:

justify_days(interval) returns interval: adjusts the interval so 30-day time
periods are represented as months; justify_days(interval '35 days') gives
1 mon 5 days. justify_hours(interval) returns interval: adjusts the interval
so 24-hour time periods are represented as days; justify_hours(interval
'27 hours') gives 1 day 03:00:00. justify_interval(interval) returns
interval: adjusts the interval using justify_days and justify_hours, with
additional sign adjustments; justify_interval(interval '1 mon -1 hour')
gives 29 days 23:00:00.

And, IIRC, an interval stores 3 parts: months, days and seconds. The 365
days are justified to 12 months 5 days by justify_days, then the 5 days
-6:49:00 are justified to 4 days 17:11:00, and then the 12 months are
printed as 1 year (because months are printed as MM/12 years, MM%12 months).

What I'm not sure about is why the right column is stated as 365 days minus
six hours and change instead of 1 year minus six hours; maybe testing the
parts in sequence may shed light on it. But I wouldn't rely on justify_xxx
for anything precise on big intervals given its definition (unless you are
working only with intervals and using 30-day months and 24-hour days
exclusively).
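
A quick sketch of those stages (expected results shown as comments; a
reconstruction, not output captured from this thread):

  SELECT justify_days(interval '365 days');
  -- 1 year 5 days  (365 days -> 12 mons 5 days, printed as 1 year 5 days)
  SELECT justify_interval(interval '5 days -06:49:00');
  -- 4 days 17:11:00  (the sign adjustment borrows one day)
  SELECT justify_interval(interval '365 days -06:49:00');
  -- 1 year 4 days 17:11:00  (both steps combined, as in the output above)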

Francisco Olarte.


Re: [GENERAL] Strange result using pg_dump gzip or split.

2014-11-10 Thread John R Pierce

On 11/10/2014 3:34 AM, Condor wrote:

Is the database dump corrupt or not?


try restoring them to a new database



--
john r pierce  37N 122W
somewhere on the middle of the left coast





[GENERAL] Strange result using pg_dump gzip or split.

2014-11-10 Thread Condor


Hello,

I found a strange result when using pg_dump as described on the postgresql
site: http://www.postgresql.org/docs/9.3/static/backup-dump.html


I have a database with 30 GB of data and decided to archive it; postgresql
is 9.3.5 x86_64, ext4 file system, kernel 3.14.18, Slackware 14.2
(current)



First I use gzip with : pg_dump logdb | gzip > log.sql.gz

After a few minutes I have log.sql.gz with size 2 170 016 226.
Well, that seemed strange, so I dumped the database again with:

pg_dump logdb | split -b 1024m - log.sql

20 files are generated and I zip them with:

zip -r log.sql.zip logdir (because I move them in logdir)

file size is : 2 170 020 867

Almost the same, but if I check the sizes inside the archives there is a
huge difference.



$ gzip -l log.sql.gz
  compressed  uncompressed  ratio  uncompressed_name
  2170016226    3060688725  29.1%  log_to.sql

and


$ unzip -v log.sql.zip
*** snip ***
   --------   ----------   ---    --------
 20240557909  2170020867    89%   20 files


Here is the difference: with gzip I have a 29.1% compression ratio and an
uncompressed size of 3 060 688 725, which means 3 GB, while with zip I have
an 89% compression ratio and an uncompressed size of 20 240 557 909, which
means 20 GB. That is 7 times bigger.


My question is: are there some special config params that are not
described in the documentation here:
http://www.postgresql.org/docs/9.3/static/backup-dump.html

Or does something need to be configured on my Linux?

And the most important question for me is: is the database dump corrupt
or not?




Regards,

Hristo Simeonov




Re: [GENERAL] Strange avg value size for index on expression in pg_stats

2014-11-10 Thread Jehan-Guillaume de Rorthais
On Mon, 10 Nov 2014 11:52:54 +0100
Jehan-Guillaume de Rorthais  wrote:

> Hello,
> 
> I'm not sure if I should post here or on pgsql-hackers.
> 
> While investigating a wrong result with my btree bloat estimation query,
> I found a strange stat deviation between the average size of a value in its
> table and its average size in an index on the "lower" expression.

Oh, I forgot to add links to this query, for testing purposes:

  * https://gist.github.com/ioguix/c29d5790b8b93bf81c27
  * explanations, see http://blog.ioguix.net/tag/bloat/

Cheers,




[GENERAL] Strange avg value size for index on expression in pg_stats

2014-11-10 Thread Jehan-Guillaume de Rorthais
Hello,

I'm not sure if I should post here or on pgsql-hackers.

While investigating a wrong result with my btree bloat estimation query,
I found a strange stat deviation between the average size of a value in its
table and its average size in an index on the "lower" expression.

Take the following scenario:

  postgres@test=# create table test as
  test-# select md5(t::text) AS t 
  test-# from generate_series(1, 100) t;
  SELECT 100
  
  postgres@test=# create index ON test (lower(t));
  CREATE INDEX
  
  postgres@test=# analyze test;
  ANALYZE
  
  postgres@test=# select tablename, attname, avg_width
  test-# from pg_stats
  test-# where schemaname = 'public';
     tablename    | attname | avg_width 
  ----------------+---------+-----------
   test           | t       |        33
   test_lower_idx | lower   |        36

Md5 values are always 32 bytes long. In the table, pg_stats reports 33 because
of the text header. In the index, the reported value is 36!

Looking at the page layout documentation and at the index using hexdump, I
cannot find any answer about these 3 bytes. PFA the "hexdump -C" output from
the index. For each row, we clearly see an 8-byte row header followed by a
ONE-byte value header (43 'C'), the 32 bytes of the md5 and 7 bytes of
padding (00).

A wild guess would be that ANALYZE considers that a text field computed from
an expression always has a 4-byte header whatever its actual size (the text
header is 1 byte if the value is < 127 bytes long, 4 in other situations).
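
That guess fits the numbers of both this test and the customer use case
below; a quick sanity check (a sketch, assuming a 1-byte short header for
stored values and a 4-byte header for computed ones):

  SELECT 32 + 1 AS md5_table_width,    -- 33, avg_width of test.t
         32 + 4 AS md5_index_width,    -- 36, avg_width of test_lower_idx.lower
         22 + 1 AS email_table_width,  -- 23, avg_width of customer.email
         22 + 4 AS email_index_width;  -- 26, avg_width of customer_lower_idx.lower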


This tiny difference is the source of a very bad estimation with the Btree
bloat estimation query when values are around an alignment boundary. For
instance, here is the use case that led me to this:

       tablename      | attname | avg_width 
  --------------------+---------+-----------
   customer           | email   |        23
   customer_lower_idx | lower   |        26

We have an index on email, and another one on lower(email). The first one is
aligned on 24, the second one on 32, leading to a bloat estimation of 17% for
the first one and -1% for the second one (the estimated index size is bigger
than the real one).

Any information about this from a hacker? Does anyone have an explanation for
this? Is it something that's worth posting on pgsql-hackers?

Regards,

00 00 00 00 00 00 00 00  00 00 00 00 30 00 f0 1f  |............0...|
0010  f0 1f 04 20 00 00 00 00  62 31 05 00 02 00 00 00  |... b1..|
0020  01 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  ||
0030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
1ff0  00 00 00 00 00 00 00 00  00 00 00 00 08 00 00 00  ||
2000  00 00 00 00 00 00 00 00  00 00 00 00 a8 01 30 0d  |..0.|
2010  f0 1f 04 20 00 00 00 00  c0 9f 60 00 90 9f 60 00  |... ..`...`.|
2020  60 9f 60 00 30 9f 60 00  00 9f 60 00 d0 9e 60 00  |`.`.0.`...`...`.|
2030  a0 9e 60 00 70 9e 60 00  40 9e 60 00 10 9e 60 00  |..`.p.`.@.`...`.|
2040  e0 9d 60 00 b0 9d 60 00  80 9d 60 00 50 9d 60 00  |..`...`...`.P.`.|
2050  20 9d 60 00 f0 9c 60 00  c0 9c 60 00 90 9c 60 00  | .`...`...`...`.|
2060  60 9c 60 00 30 9c 60 00  00 9c 60 00 d0 9b 60 00  |`.`.0.`...`...`.|
2070  a0 9b 60 00 70 9b 60 00  40 9b 60 00 10 9b 60 00  |..`.p.`.@.`...`.|
2080  e0 9a 60 00 b0 9a 60 00  80 9a 60 00 50 9a 60 00  |..`...`...`.P.`.|
2090  20 9a 60 00 f0 99 60 00  c0 99 60 00 90 99 60 00  | .`...`...`...`.|
20a0  60 99 60 00 30 99 60 00  00 99 60 00 d0 98 60 00  |`.`.0.`...`...`.|
20b0  a0 98 60 00 70 98 60 00  40 98 60 00 10 98 60 00  |..`.p.`.@.`...`.|
20c0  e0 97 60 00 b0 97 60 00  80 97 60 00 50 97 60 00  |..`...`...`.P.`.|
20d0  20 97 60 00 f0 96 60 00  c0 96 60 00 90 96 60 00  | .`...`...`...`.|
20e0  60 96 60 00 30 96 60 00  00 96 60 00 d0 95 60 00  |`.`.0.`...`...`.|
20f0  a0 95 60 00 70 95 60 00  40 95 60 00 10 95 60 00  |..`.p.`.@.`...`.|
2100  e0 94 60 00 b0 94 60 00  80 94 60 00 50 94 60 00  |..`...`...`.P.`.|
2110  20 94 60 00 f0 93 60 00  c0 93 60 00 90 93 60 00  | .`...`...`...`.|
2120  60 93 60 00 30 93 60 00  00 93 60 00 d0 92 60 00  |`.`.0.`...`...`.|
2130  a0 92 60 00 70 92 60 00  40 92 60 00 10 92 60 00  |..`.p.`.@.`...`.|
2140  e0 91 60 00 b0 91 60 00  80 91 60 00 50 91 60 00  |..`...`...`.P.`.|
2150  20 91 60 00 f0 90 60 00  c0 90 60 00 90 90 60 00  | .`...`...`...`.|
2160  60 90 60 00 30 90 60 00  00 90 60 00 d0 8f 60 00  |`.`.0.`...`...`.|
2170  a0 8f 60 00 70 8f 60 00  40 8f 60 00 10 8f 60 00  |..`.p.`.@.`...`.|
2180  e0 8e 60 00 b0 8e 60 00  80 8e 60 00 50 8e 60 00  |..`...`...`.P.`.|
2190  20 8e 60 00 f0 8d 60 00  c0 8d 60 00 90 8d 60 00  | .`...`...`...`.|
21a0  60 8d 60 00 30 8d 60 00  30 8d 60 00 00 00 00 00  |`.`.0.`.0.`.|
21b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
2d30  00 00 00 00 53 00 30 40  43 66 65 39 66 63 32 38  |S.0@Cfe9fc28|

[GENERAL] justify_interval: days in year

2014-11-10 Thread Ilya Ashchepkov
Hello!

Is this correct?
$ select
  justify_interval(t-f-i)
  ,t-f-i
from
  (
select
  timestamp '2014-1-1' as f,
  timestamp '2015-1-1' as t,
  interval '06:49:00' as i
  ) S;

    justify_interval    |      ?column?
------------------------+--------------------
 1 year 4 days 17:11:00 | 365 days -06:49:00

Why do we have 360 days in a year?


