Re: [GENERAL] Unique constraint on field inside composite type.

2016-08-22 Thread Adrian Klaver

On 08/22/2016 06:23 PM, Tom Lane wrote:

Adrian Klaver  writes:

On 08/17/2016 11:02 PM, Silk Parrot wrote:

CREATE TABLE user (
uuid UUID PRIMARY KEY DEFAULT public.uuid_generate_v4(),
google_user system.google_user,
facebook_user system.facebook_user,
UNIQUE (google_user.email)
);
ERROR:  syntax error at or near "."
LINE 10: UNIQUE (google_user.email)

Is there a way to create unique constraint on a field inside composite type?



I tried David's suggestion:
(google_user).email
and that did not work, but it got me to thinking, so:


You'd need additional parens around the whole thing, like

create unique index on "user"(((google_user).email));


Aah, I did not bury it deep enough, my attempt:

create unique index g_u on test_user ((google_user).email);



The UNIQUE-constraint syntax will never work, because per SQL standard
such constraints can only name simple columns.  But you can make
a unique index separately.

regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Permissions pg_dump / import

2016-08-22 Thread Tom Lane
Patrick B  writes:
> I'm doing a pg_dump and a pg_restore in the same command, using different
> usernames and database names:
> ...
> But I'm getting some permissions errors:
> could not execute query: ERROR:  role "devel" does not exist

If that's from

> REVOKE ALL ON SCHEMA public FROM devel;

it's not a permissions error, it's complaining there's no such role
to grant/revoke from in the destination DB.  You may want to use
--no-privileges along with --no-owner if the destination doesn't
have the same set of users as the source.  Or just ignore these errors.
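If you'd rather make the destination match the source, you can also just
create the missing role there before restoring; a one-line sketch, with the
role name taken from the error message above:

CREATE ROLE devel;   -- run once on the destination cluster, before the restore

Any other role named in GRANT/REVOKE statements in the dump would need the
same treatment.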

regards, tom lane




Re: [GENERAL] Permissions pg_dump / import

2016-08-22 Thread Patrick B
Hi guys,

I'm doing a pg_dump and a pg_restore in the same command, using different
usernames and database names:


pg_dump --format=custom -v --no-password --no-owner --username=teste1
> --dbname=test1 --host=11.11.11.11 | pg_restore -v --schema=public
> --no-password --no-owner --username=master --host=11.11.11.12
> --dbname=new_test1


But I'm getting some permissions errors:

could not execute query: ERROR:  role "devel" does not exist

To fix that, I ran the following on the server; also, I'm using --no-owner and
thought this kind of problem wouldn't be happening:

REVOKE ALL ON SCHEMA public FROM devel;



But it doesn't seem to be working, as I'm still getting the errors.



Do you guys have any tips to solve this one?

Cheers

Patrick


Re: [GENERAL] Unique constraint on field inside composite type.

2016-08-22 Thread Tom Lane
Adrian Klaver  writes:
> On 08/17/2016 11:02 PM, Silk Parrot wrote:
>> CREATE TABLE user (
>> uuid UUID PRIMARY KEY DEFAULT public.uuid_generate_v4(),
>> google_user system.google_user,
>> facebook_user system.facebook_user,
>> UNIQUE (google_user.email)
>> );
>> ERROR:  syntax error at or near "."
>> LINE 10: UNIQUE (google_user.email)
>> 
>> Is there a way to create unique constraint on a field inside composite type?

> I tried David's suggestion:
> (google_user).email
> and that did not work, but it got me to thinking, so:

You'd need additional parens around the whole thing, like

create unique index on "user"(((google_user).email));

The UNIQUE-constraint syntax will never work, because per SQL standard
such constraints can only name simple columns.  But you can make
a unique index separately.
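Putting the two together, an untested sketch, assuming the composite types
from your original post already exist (the index name is arbitrary):

CREATE TABLE "user" (
    uuid UUID PRIMARY KEY DEFAULT public.uuid_generate_v4(),
    google_user   system.google_user,
    facebook_user system.facebook_user
);

-- the expression needs its own set of parentheses inside the index's column list
CREATE UNIQUE INDEX user_google_email_idx ON "user" (((google_user).email));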

regards, tom lane




Re: [GENERAL] Why insertion throughput can be reduced with an increase of batch size?

2016-08-22 Thread Adrian Klaver

On 08/21/2016 11:53 PM, Павел Филонов wrote:

My greetings to everybody!

I recently came across an observation which I cannot explain: why can
insertion throughput drop when the batch size increases?

Brief description of the experiment.

  * PostgreSQL 9.5.4 as server
  * https://github.com/sfackler/rust-postgres library as client driver
  * one relation with two indices (scheme in attach)

Experiment steps:

  * populate DB with 25920 random records
  * start insertion for 60 seconds with one client thread and batch size = m
  * record insertions per second (ips) in clients code

Plot of median ips versus m, for m in [2^0, 2^1, ..., 2^15] (in attachment).


On the figure we can see that from m = 128 to m = 256 the throughput drops
from 13,000 ips to 5,000.

I hope someone can help me understand the reason for this behavior.


To add to Jeff's questions:

You say you are measuring the IPS in the client's code.

Where is the client: on the same machine, the same network, or a remote network?



--
Best regards
Filonov Pavel






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Unique constraint on field inside composite type.

2016-08-22 Thread Adrian Klaver

On 08/17/2016 11:02 PM, Silk Parrot wrote:

Hi,

I am trying to model a social login application. The application can
support multiple login providers. I am thinking of creating a custom
type for each provider. e.g.

CREATE TYPE system.google_user AS (
   email TEXT
);

CREATE TYPE system.facebook_user AS (
   id TEXT
);

   And having user table like:

CREATE TABLE user (
uuid UUID PRIMARY KEY DEFAULT public.uuid_generate_v4(),
google_user system.google_user,
facebook_user system.facebook_user,
UNIQUE (google_user.email)
);

However, the above create table query reports syntax error:

ERROR:  syntax error at or near "."
LINE 10: UNIQUE (google_user.email)

Is there a way to create unique constraint on a field inside composite type?


I tried David's suggestion:

(google_user).email

and that did not work, but it got me to thinking, so:

CREATE OR REPLACE FUNCTION public.comp_type_idx(google_user)
 RETURNS character varying
 LANGUAGE sql IMMUTABLE
AS $function$ SELECT $1.email $function$

(The function has to be marked IMMUTABLE to be usable in an index expression.)

CREATE TABLE test_user (
google_user google_user,
facebook_user facebook_user
);

create unique index g_u on  test_user (comp_type_idx(google_user));

test=# insert into test_user values (ROW('email'), ROW(1));
INSERT 0 1

test=# insert into test_user values (ROW('email'), ROW(1));
ERROR:  duplicate key value violates unique constraint "g_u"
DETAIL:  Key (comp_type_idx(google_user))=(email) already exists.

test=# insert into test_user values (ROW('email2'), ROW(1));
INSERT 0 1

test=# select * from test_user ;
 google_user | facebook_user
-+---
 (email) | (1)
 (email2)| (1)
(2 rows)







--
Regards
Ryan



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Unique constraint on field inside composite type.

2016-08-22 Thread Bruno Wolff III

On Wed, Aug 17, 2016 at 23:02:53 -0700,
 Silk Parrot  wrote:

Hi,

    I am trying to model a social login application. The application can 
support multiple login providers. I am thinking of creating a custom type for 
each provider. e.g.

CREATE TABLE user (
    uuid UUID PRIMARY KEY DEFAULT public.uuid_generate_v4(),
    google_user system.google_user,
    facebook_user system.facebook_user,
    UNIQUE (google_user.email)
);


Wouldn't it make more sense to have a table you join to your user table, one
that is more flexible and allows multiple entries per person? You would need
user, domain, foreign_user and auth_method columns. This would make it a lot
easier to add other systems later, or to let users pick their own systems
that you don't need to know about in advance.
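A rough sketch of that layout, with all names invented just for illustration:

CREATE TABLE user_identity (
    user_uuid    UUID NOT NULL REFERENCES "user" (uuid),
    domain       TEXT NOT NULL,    -- e.g. 'google.com', 'facebook.com'
    foreign_user TEXT NOT NULL,    -- the provider-side identifier (email, id, ...)
    auth_method  TEXT NOT NULL,
    UNIQUE (domain, foreign_user)  -- each external identity maps to one local user
);

A user can then have any number of rows here, one per provider, without
touching the user table's definition.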





Re: [GENERAL] Why insertion throughput can be reduced with an increase of batch size?

2016-08-22 Thread Jeff Janes
On Sun, Aug 21, 2016 at 11:53 PM, Павел Филонов  wrote:

> My greetings to everybody!
>
> I recently came across an observation which I cannot explain: why can
> insertion throughput drop when the batch size increases?
>
> Brief description of the experiment.
>
>- PostgreSQL 9.5.4 as server
>- https://github.com/sfackler/rust-postgres library as client driver
>- one relation with two indices (scheme in attach)
>
> Experiment steps:
>
>- populate DB with 25920 random records
>
> How is populating the database you do in this step different from the
insertions you do in the next step?  Is it just that the goal is to measure
insertions into an already-very-large table?


>
>- start insertion for 60 seconds with one client thread and batch size
>= m
>- record insertions per second (ips) in clients code
>
> Plot median ips from m for m in [2^0, 2^1, ..., 2^15] (in attachment).
>

The median of how many points?  Please plot all points, as well as the
median.  In what order did you cycle through the list of m?  Are you
logging checkpoints?  How many checkpoints occur during the run for each
batch size?  Have you tuned your database to be targeted at mass insertions,
e.g. what are max_wal_size, archive_mode, wal_level, wal_buffers,
shared_buffers, and checkpoint_completion_target?  Are you issuing manual
checkpoints between runs?

60 seconds is usually not nearly enough time to benchmark a write-heavy
workload.  The chances are pretty good that what you are seeing is nothing
but statistical artefacts, caused by checkpoints happening to line up with
certain values of batch size.
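For reference, those settings can be pulled in one plain-SQL query (nothing
here is specific to this workload):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_wal_size', 'archive_mode', 'wal_level', 'wal_buffers',
               'shared_buffers', 'checkpoint_completion_target');

and issuing a manual CHECKPOINT; between runs would at least let each batch
size start from a comparable state.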

Cheers,

Jeff


Re: [GENERAL] Unique constraint on field inside composite type.

2016-08-22 Thread David G. Johnston
On Thu, Aug 18, 2016 at 2:02 AM, Silk Parrot  wrote:

>
> However, the above create table query reports syntax error:
>
> ERROR:  syntax error at or near "."
> LINE 10: UNIQUE (google_user.email)
>
> Is there a way to create unique constraint on a field inside composite
> type?
>

​Not tested here but in most (all?) cases when attempting to de-reference a
component of a composite typed column you must place the column name within
parentheses.

(google_user).email

Otherwise the system is thinking that "google_user" is a schema and email
is a column.
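The same rule applies in an ordinary query; a quick sketch against the table
from the original post:

SELECT (google_user).email FROM "user";   -- field of the composite column
SELECT google_user.email FROM "user";     -- fails: google_user is taken for a table or schema name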

David J.​


[GENERAL] Re: Easiest way to compare the results of two queries row by row and column by column

2016-08-22 Thread gilad905
Didn't mean to give the impression I'm 'shouting' :) just wanted to make sure
other users would notice my reply.
But you're right, it does! It even disregards the order of the rows. For
some reason, in a previous test it looked to me like it didn't.







[GENERAL] Why insertion throughput can be reduced with an increase of batch size?

2016-08-22 Thread Павел Филонов
My greetings to everybody!

I recently came across an observation which I cannot explain: why can
insertion throughput drop when the batch size increases?

Brief description of the experiment.

   - PostgreSQL 9.5.4 as server
   - https://github.com/sfackler/rust-postgres library as client driver
   - one relation with two indices (scheme in attach)

Experiment steps:

   - populate DB with 25920 random records
   - start insertion for 60 seconds with one client thread and batch size =
   m
   - record insertions per second (ips) in clients code

Plot of median ips versus m, for m in [2^0, 2^1, ..., 2^15] (in attachment).

On the figure we can see that from m = 128 to m = 256 the throughput drops
from 13,000 ips to 5,000.

I hope someone can help me understand the reason for this behavior.

-- 
Best regards
Filonov Pavel


postgres.sql
Description: application/sql



[GENERAL] Unique constraint on field inside composite type.

2016-08-22 Thread Silk Parrot
Hi,

    I am trying to model a social login application. The application can 
support multiple login providers. I am thinking of creating a custom type for 
each provider. e.g.

CREATE TYPE system.google_user AS (
   email TEXT
);

CREATE TYPE system.facebook_user AS (
   id TEXT
);

   And having a user table like:

CREATE TABLE user (
    uuid UUID PRIMARY KEY DEFAULT public.uuid_generate_v4(),
    google_user system.google_user,
    facebook_user system.facebook_user,
    UNIQUE (google_user.email)
);

However, the above create table query reports a syntax error:

ERROR:  syntax error at or near "."
LINE 10:     UNIQUE (google_user.email)

Is there a way to create a unique constraint on a field inside a composite type?

-- 
Regards
Ryan


Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

2016-08-22 Thread Kevin Grittner
On Mon, Aug 22, 2016 at 3:02 AM, Michael Paquier
 wrote:
> On Mon, Aug 22, 2016 at 4:45 PM, Tatsuki Kadomoto
>  wrote:
>> Thanks for suggestion for upgrade. I know that's the way to go, but it's not
>> so easy due to circumstances on my side.
>
> Well, I guess it depends on how much you care about your data.

Right.  Make sure that whoever is responsible for this decision
knows that until they upgrade they are running with known bugs
which could render the database unusable without warning.  It
should at least be an informed decision so that the decision-maker
can stand behind it and feel as good as possible about
circumstances should that happen.

You might want to keep a copy of the email or memo in which you
point this out, in case anyone's memory gets foggy during such a
crisis.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-22 Thread Andy Colson

On 8/22/2016 2:39 AM, Thomas Güttler wrote:



On 19.08.2016 at 19:59, Andy Colson wrote:

On 8/19/2016 2:32 AM, Thomas Güttler wrote:

I want to store logs in a simple table.

Here my columns:

  Primary-key (auto generated)
  timestamp
  host
  service-on-host
  loglevel
  msg
  json (optional)

I am unsure which DB to choose: Postgres, ElasticSearch or ...?

We don't have high traffic. About 200k rows per day.

My heart beats for postgres. We have used it for several years.

On the other hand, the sentence "Don't store logs in a DB" is
somewhere in my head.

What do you think?





I played with ElasticSearch a little, mostly because I wanted to use
Kibana which looks really pretty.  I dumped a ton
of logs into it, and made a pretty dashboard ... but in the end it
didn't really help me, and wasn't that useful.  My
problem is, I don't want to have to go look at it.  If something goes
bad, then I want an email alert, at which point
I'm going to go run top, and tail the logs.

Another problem I had with kibana/ES is the syntax to search stuff is
different than I'm used to.  It made it hard to
find stuff in kibana.

Right now, I have a perl script that reads apache logs and fires off
updates into PG to keep stats.  But its an hourly
summary, which the website turns around and queries the stats to show
pretty usage graphs.


You use Perl to read apache logs. Does this work?

Forwarding logs reliably is not easy. Logs are streams; files in unix
are not streams. Sooner or later the files get rotated. RELP exists,
but AFAIK its usage is not widespread:

  https://en.wikipedia.org/wiki/Reliable_Event_Logging_Protocol

Let's see how to get the logs into postgres 


In the end, PG or ES, all depends on what you want.


Most of my logs start from an http request. I want a unique id per request
in every log line which gets created. This way I can trace the request,
even if its impact spans several hosts and systems which do not
receive http requests.

Regards,
  Thomas Güttler




I don't read the file.  In apache.conf:

# v, countyia, ip, sess, ts, url, query, status
LogFormat 
"3,%{countyName}e,%a,%{VCSID}C,%{%Y-%m-%dT%H:%M:%S%z}t,\"%U\",\"%q\",%>s" 
csv3


CustomLog "|/usr/local/bin/statSender.pl -r 127.0.0.1" csv3

I think I read somewhere that if you pipe to a script (like above) and
you don't read fast enough, it could slow apache down.  That's why the
script above dumps to redis first.  That way I can move processes
around, restart the database, etc., and not break apache in any way.


The important part of the script:

while (my $x = <>)            # read one log line from apache's pipe
{
    chomp($x);
    next unless ($x);         # skip empty lines
  try_again:
    if ($redis)
    {
        # push the raw line onto the queue; reconnect and retry on failure
        eval {
            $redis->lpush($qname, $x);
        };
        if ($@)
        {
            $redis = redis_connect();
            goto try_again;
        }
        # just silence this one: cap the queue at ~1000 entries
        eval {
            $redis->ltrim($qname, 0, 1000);
        };
    }
}

Any other machine, or even multiple, then reads from redis and inserts 
into PG.
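On the PG side, that consumer can be as simple as an hourly upsert. A rough
sketch with a made-up stats table (not my actual schema, just to illustrate;
ON CONFLICT needs 9.5+):

CREATE TABLE IF NOT EXISTS hit_stats (
    hour   timestamptz NOT NULL,
    county text        NOT NULL,
    hits   bigint      NOT NULL DEFAULT 0,
    PRIMARY KEY (hour, county)
);

-- one parsed log line popped from redis becomes one upsert
INSERT INTO hit_stats (hour, county, hits)
VALUES (date_trunc('hour', now()), 'some-county', 1)
ON CONFLICT (hour, county) DO UPDATE SET hits = hit_stats.hits + 1;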


You can see, in my script, I trim the queue to 1000 items, but that's
because I'm not as worried about losing results.  Your setup would
probably be different.  I also set up redis to not save anything to disk,
again, because I don't mind if I lose a few hits here or there.  But
you get the idea.


-Andy




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-22 Thread Sameer Kumar
On Mon, 22 Aug 2016, 3:40 p.m. Thomas Güttler, 
wrote:

>
>
> > On 19.08.2016 at 19:59, Andy Colson wrote:
> > On 8/19/2016 2:32 AM, Thomas Güttler wrote:
> >> I want to store logs in a simple table.
> >>
> >> Here my columns:
> >>
> >>   Primary-key (auto generated)
> >>   timestamp
> >>   host
> >>   service-on-host
> >>   loglevel
> >>   msg
> >>   json (optional)
> >>
> >> I am unsure which DB to choose: Postgres, ElasticSearch or ...?
> >>
> >> We don't have high traffic. About 200k rows per day.
> >>
> >> My heart beats for postgres. We use it since several years.
> >>
> >> On the other hand, the sentence "Don't store logs in a DB" is
> >> somewhere in my head.
> >>
> >> What do you think?
> >>
> >>
> >>
> >
> > I played with ElasticSearch a little, mostly because I wanted to use
> Kibana which looks really pretty.  I dumped a ton
> > of logs into it, and made a pretty dashboard ... but in the end it
> didn't really help me, and wasn't that useful.  My
> > problem is, I don't want to have to go look at it.  If something goes
> bad, then I want an email alert, at which point
> > I'm going to go run top, and tail the logs.
> >
> > Another problem I had with kibana/ES is the syntax to search stuff is
> different than I'm used to.  It made it hard to
> > find stuff in kibana.
> >
> > Right now, I have a perl script that reads apache logs and fires off
> updates into PG to keep stats.  But its an hourly
> > summary, which the website turns around and queries the stats to show
> pretty usage graphs.
>
> You use Perl to read apache logs. Does this work?
>
> Forwarding logs reliably is not easy. Logs are streams, files in unix are
> not streams. Sooner or later
> the files get rotated. RELP exists, but AFAIK it's usage is not wide
> spread:
>
>https://en.wikipedia.org/wiki/Reliable_Event_Logging_Protocol
>
> Let's see how to get the logs into postgres 
>
> > In the end, PG or ES, all depends on what you want.
>
> Most of my logs start from a http request. I want a unique id per request
> in every log line which gets created. This way I can trace the request,
> even if its impact spans to several hosts and systems which do not receive
> http requests.
>

You may decide not to use Elasticsearch, but take a look at the other
components of the Elastic Stack, like Logstash and Beats. They can be helpful
even when you use Postgres as the end point. Otherwise (IMHO) you would spend
a lot of time writing scripts and jobs to capture and stream logs. If I were
you, I would not want to do that.




> Regards,
>Thomas Güttler
>
>
> --
> Thomas Guettler http://www.thomas-guettler.de/
>
>
>
-- 
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350

Skype: sameer.ashnik | www.ashnik.com


Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

2016-08-22 Thread Tatsuki Kadomoto
Michael,


Thank you very much.


Regards,

Tatsuki


From: Michael Paquier 
Sent: Monday, August 22, 2016 5:02:40 PM
To: Tatsuki Kadomoto
Cc: John R Pierce; PostgreSQL mailing lists
Subject: Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" 
when VACUUM FULL is executed

On Mon, Aug 22, 2016 at 4:45 PM, Tatsuki Kadomoto
 wrote:
> Thanks for suggestion for upgrade. I know that's the way to go, but it's not
> so easy due to circumstances on my side.

Well, I guess it depends on how much you care about your data.

> Meanwhile, could you tell me what is this "global/pg_filenode.map" for?

It is a relation map file holding a list of OID -> relfilenode mappings. You
can look at the comments at the top of src/backend/utils/cache/relmapper.c
for more details.
--
Michael


Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

2016-08-22 Thread Michael Paquier
On Mon, Aug 22, 2016 at 4:45 PM, Tatsuki Kadomoto
 wrote:
> Thanks for suggestion for upgrade. I know that's the way to go, but it's not
> so easy due to circumstances on my side.

Well, I guess it depends on how much you care about your data.

> Meanwhile, could you tell me what is this "global/pg_filenode.map" for?

It is a relation map file holding a list of OID -> relfilenode mappings. You
can look at the comments at the top of src/backend/utils/cache/relmapper.c
for more details.
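You can also see the effect from SQL: relations that go through the map have
relfilenode stored as 0 in pg_class, and pg_relation_filenode() resolves the
real value through the map, e.g.:

SELECT relname, relfilenode, pg_relation_filenode(oid) AS filenode_on_disk
FROM pg_class
WHERE relname IN ('pg_class', 'pg_database', 'pg_proc');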
-- 
Michael




Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" when VACUUM FULL is executed

2016-08-22 Thread Tatsuki Kadomoto
John, Michael,


Thanks for the suggestion to upgrade. I know that's the way to go, but it's not
so easy due to circumstances on my side.


Meanwhile, could you tell me what is this "global/pg_filenode.map" for?


Regards,

Tatsuki


From: pgsql-general-ow...@postgresql.org  
on behalf of Michael Paquier 
Sent: Monday, August 22, 2016 1:34:30 PM
To: John R Pierce
Cc: PostgreSQL mailing lists
Subject: Re: [GENERAL] incorrect checksum detected on "global/pg_filenode.map" 
when VACUUM FULL is executed

On Mon, Aug 22, 2016 at 1:31 PM, John R Pierce  wrote:
> On 8/21/2016 9:13 PM, Tatsuki Kadomoto wrote:
>>
>> Can we point out a specific bug that can lead to this?
>
>
> 9.2.6 fixed several data corruption bugs,
> https://www.postgresql.org/docs/current/static/release-9-2-6.html
>
> 9.2.9 fixed a GiST index corruption problem...
> https://www.postgresql.org/docs/current/static/release-9-2-9.html
>
> I would upgrade to 9.2.18, the latest 9.2 version,
> https://www.postgresql.org/docs/current/static/release-9-2-18.html

Yep, this is mandatory. You are taking a lot of risks here by only using 9.2.4.
--
Michael




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-22 Thread Thomas Güttler



On 19.08.2016 at 19:59, Andy Colson wrote:

On 8/19/2016 2:32 AM, Thomas Güttler wrote:

I want to store logs in a simple table.

Here my columns:

  Primary-key (auto generated)
  timestamp
  host
  service-on-host
  loglevel
  msg
  json (optional)

I am unsure which DB to choose: Postgres, ElasticSearch or ...?

We don't have high traffic. About 200k rows per day.

My heart beats for postgres. We have used it for several years.

On the other hand, the sentence "Don't store logs in a DB" is
somewhere in my head.

What do you think?





I played with ElasticSearch a little, mostly because I wanted to use Kibana 
which looks really pretty.  I dumped a ton
of logs into it, and made a pretty dashboard ... but in the end it didn't 
really help me, and wasn't that useful.  My
problem is, I don't want to have to go look at it.  If something goes bad, then 
I want an email alert, at which point
I'm going to go run top, and tail the logs.

Another problem I had with kibana/ES is the syntax to search stuff is different 
than I'm used to.  It made it hard to
find stuff in kibana.

Right now, I have a perl script that reads apache logs and fires off updates 
into PG to keep stats.  But its an hourly
summary, which the website turns around and queries the stats to show pretty 
usage graphs.


You use Perl to read apache logs. Does this work?

Forwarding logs reliably is not easy. Logs are streams; files in unix are not
streams. Sooner or later the files get rotated. RELP exists, but AFAIK its
usage is not widespread:

  https://en.wikipedia.org/wiki/Reliable_Event_Logging_Protocol

Let's see how to get the logs into postgres 


In the end, PG or ES, all depends on what you want.


Most of my logs start from an http request. I want a unique id per request
in every log line which gets created. This way I can trace the request,
even if its impact spans several hosts and systems which do not receive http
requests.
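If I go the Postgres route, a first sketch of the table (column names are just
a guess, with a request_id column added for the tracing requirement) could be:

CREATE TABLE logs (
    id              bigserial PRIMARY KEY,     -- auto-generated primary key
    request_id      uuid,                      -- same value on every host/system for one http request
    ts              timestamptz NOT NULL DEFAULT now(),
    host            text        NOT NULL,
    service_on_host text        NOT NULL,
    loglevel        smallint    NOT NULL,
    msg             text        NOT NULL,
    details         jsonb                      -- the optional json payload
);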

Regards,
  Thomas Güttler


--
Thomas Guettler http://www.thomas-guettler.de/




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-22 Thread Thomas Güttler

Thank you Chris for looking at my issue in such detail.

Yes, the parallel feature rocks.

Regards,
  Thomas Güttler

On 19.08.2016 at 22:40, Chris Mair wrote:

On 19/08/16 10:57, Thomas Güttler wrote:



What do you think?


I store most of my logs in flat textfiles syslog style, and use grep for adhoc 
querying.

 200K rows/day, thats 1.4 million/week, 6 million/month, pretty soon you're 
talking big tables.

in fact thats several rows/second on a 24/7 basis


There is no need to store them more than 6 weeks in my current use case.



Hi,

to me this kind of data looks like something Postgres can handle with ease.

We're talking about 8.4M rows here.

Coincidentally, I was trying out the new parallel query feature in the
9.6 beta just now and decided to use your numbers as a test case :)

I can create 8.4M records having a timestamp and a random ~ 250 character string
in ~ 31 seconds:

pg96=# select now() + (sec / 200000.0 * 86400.0 || ' seconds')::interval as ts,
pg96-#        repeat(random()::text, 15) as msg
pg96-# into t1
pg96-# from generate_series(1, 6 * 7 * 200000) as sec;
SELECT 8400000
Time: 30858.274 ms

Table size is 2.4 GB.

This gives about 6 weeks. A query to scan the whole thing on the narrow column
takes ~ 400 msec, like this:

pg96=# select min(ts), max(ts) from t1;
              min              |              max
-------------------------------+-------------------------------
 2016-08-19 20:17:24.921333+00 | 2016-09-30 20:17:24.489333+00
(1 row)

Time: 409.468 ms

Even running an unanchored regular expression (!) on the wider column is doable:

pg96=# select count(*) from t1 where msg ~ '12345';
 count
---
   955
(1 row)

Time: 3146.838 ms

If you have some filter, not everything needs to be regexped and this gets 
pretty fast:

pg96=# select count(*) from t1 where ts between '2016-08-25' and '2016-08-26' 
and msg ~ '12345';
 count
---
24
(1 row)

Time: 391.577 ms

All this is without indices. Your data is more structured than my test, so
undoubtedly you will get some gain from indices...
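For instance (untested on this data set), a plain btree on the timestamp helps
the range filter, and a trigram index can even help the unanchored pattern
match:

CREATE INDEX t1_ts_idx ON t1 (ts);

-- optional: a trigram GIN index can speed up LIKE/regexp searches on msg
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX t1_msg_trgm_idx ON t1 USING gin (msg gin_trgm_ops);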

Here is something more analytical - basically same as the count(*) above:

pg96=# select ts::date, count(*) from t1 where msg ~ '12345' group by ts::date 
order by ts::date;
     ts     | count
------------+-------
 2016-08-19 |    26
 2016-08-20 |    28
 [...]
 2016-09-28 |    21
 2016-09-29 |    33
(42 rows)

Time: 3157.010 ms

Note, however, that I'm using the 9.6 beta with the parallel query feature: the
sequential scans with the regexp are run in parallel on 6 workers... this gives
me a speed-up of a factor of 4-5 (the machine has 8 logical CPUs) and the whole
table fits in cache. For a use case like this, the parallel query feature in
9.6 is so good it's almost like cheating ;)

Bye,
Chris.







--
Thomas Guettler http://www.thomas-guettler.de/

