Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-23 Thread Terry Schmitt
Certainly Postgres is capable of handling this volume just fine. Throw in
some partition rotation handling and you have a solution.
If you want to play with something different, check out Graylog, which is
backed by Elasticsearch. A bit more work to set up than a single Postgres
table, but it has been a success for us storing syslog, app logs, and
Postgres logs from several hundred network devices and Windows and Linux
servers. Rotation is handled based on your requirements and drilling down
to the details is trivial. Alerting is baked in as well. It could well be
overkill for your needs, but I don't know what your environment looks like.

T

On Mon, Aug 22, 2016 at 7:03 AM, Andy Colson  wrote:

> On 8/22/2016 2:39 AM, Thomas Güttler wrote:
>
>>
>>
>> On 19.08.2016 at 19:59, Andy Colson wrote:
>>
>>> On 8/19/2016 2:32 AM, Thomas Güttler wrote:
>>>
 I want to store logs in a simple table.

 Here my columns:

   Primary-key (auto generated)
   timestamp
   host
   service-on-host
   loglevel
   msg
   json (optional)

 I am unsure which DB to choose: Postgres, ElasticSearch or ...?

 We don't have high traffic. About 200k rows per day.

 My heart beats for postgres. We have been using it for several years.

 On the other hand, the sentence "Don't store logs in a DB" is
 somewhere in my head.

 What do you think?




>>> I played with ElasticSearch a little, mostly because I wanted to use
>>> Kibana which looks really pretty.  I dumped a ton
>>> of logs into it, and made a pretty dashboard ... but in the end it
>>> didn't really help me, and wasn't that useful.  My
>>> problem is, I don't want to have to go look at it.  If something goes
>>> bad, then I want an email alert, at which point
>>> I'm going to go run top, and tail the logs.
>>>
>>> Another problem I had with kibana/ES is the syntax to search stuff is
>>> different than I'm used to.  It made it hard to
>>> find stuff in kibana.
>>>
>>> Right now, I have a perl script that reads apache logs and fires off
>>> updates into PG to keep stats.  But it's an hourly
>>> summary, which the website turns around and queries the stats to show
>>> pretty usage graphs.
>>>
>>
>> You use Perl to read apache logs. Does this work?
>>
>> Forwarding logs reliably is not easy. Logs are streams, files in unix
>> are not streams. Sooner or later
>> the files get rotated. RELP exists, but AFAIK its usage is not widespread:
>>
>>   https://en.wikipedia.org/wiki/Reliable_Event_Logging_Protocol
>>
>> Let's see how to get the logs into postgres 
>>
>> In the end, PG or ES, all depends on what you want.
>>>
>>
>> Most of my logs start from an HTTP request. I want a unique id per request
>> in every log line which gets created. This way I can trace the request,
>> even if its impact spans several hosts and systems which do not
>> receive HTTP requests.
>>
>> Regards,
>>   Thomas Güttler
>>
>>
>>
> I don't read the file.  In apache.conf:
>
> # v, countyia, ip, sess, ts, url, query, status
> LogFormat "3,%{countyName}e,%a,%{VCSID}C,%{%Y-%m-%dT%H:%M:%S%z}t,\"%U\",\"%q\",%>s" csv3
>
> CustomLog "|/usr/local/bin/statSender.pl -r 127.0.0.1" csv3
>
> I think I read somewhere that if you pipe to a script (like above) and you
> don't read fast enough, it could slow apache down.  That's why the script
> above dumps to redis first.  That way I can move processes around, restart
> the database, etc, etc, and not break apache in any way.
>
> The important part of the script:
>
> while (my $x = <>)
> {
>     chomp($x);
>     next unless ($x);
>   try_again:
>     if ($redis)
>     {
>         eval {
>             $redis->lpush($qname, $x);
>         };
>         if ($@)
>         {
>             $redis = redis_connect();
>             goto try_again;
>         }
>         # just silence this one
>         eval {
>             $redis->ltrim($qname, 0, 1000);
>         };
>     }
> }
>
> Any other machine, or even multiple, then reads from redis and inserts
> into PG.
>
> You can see, in my script, I trim the queue to 1000 items, but that's
> because I'm not as worried about losing results.  Your setup would
> probably be different.  I also set up redis to not save anything to disk,
> again, because I don't mind if I lose a few hits here or there.  But you
> get the idea.
>
> -Andy
>
>
>
>


Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-22 Thread Andy Colson

On 8/22/2016 2:39 AM, Thomas Güttler wrote:



On 19.08.2016 at 19:59, Andy Colson wrote:

On 8/19/2016 2:32 AM, Thomas Güttler wrote:

I want to store logs in a simple table.

Here my columns:

  Primary-key (auto generated)
  timestamp
  host
  service-on-host
  loglevel
  msg
  json (optional)

I am unsure which DB to choose: Postgres, ElasticSearch or ...?

We don't have high traffic. About 200k rows per day.

My heart beats for postgres. We have been using it for several years.

On the other hand, the sentence "Don't store logs in a DB" is
somewhere in my head.

What do you think?





I played with ElasticSearch a little, mostly because I wanted to use
Kibana which looks really pretty.  I dumped a ton
of logs into it, and made a pretty dashboard ... but in the end it
didn't really help me, and wasn't that useful.  My
problem is, I don't want to have to go look at it.  If something goes
bad, then I want an email alert, at which point
I'm going to go run top, and tail the logs.

Another problem I had with kibana/ES is the syntax to search stuff is
different than I'm used to.  It made it hard to
find stuff in kibana.

Right now, I have a perl script that reads apache logs and fires off
updates into PG to keep stats.  But it's an hourly
summary, which the website turns around and queries the stats to show
pretty usage graphs.


You use Perl to read apache logs. Does this work?

Forwarding logs reliably is not easy. Logs are streams, files in unix
are not streams. Sooner or later
the files get rotated. RELP exists, but AFAIK its usage is not widespread:

  https://en.wikipedia.org/wiki/Reliable_Event_Logging_Protocol

Let's see how to get the logs into postgres 


In the end, PG or ES, all depends on what you want.


Most of my logs start from an HTTP request. I want a unique id per request
in every log line which gets created. This way I can trace the request,
even if its impact spans several hosts and systems which do not
receive HTTP requests.

Regards,
  Thomas Güttler




I don't read the file.  In apache.conf:

# v, countyia, ip, sess, ts, url, query, status
LogFormat "3,%{countyName}e,%a,%{VCSID}C,%{%Y-%m-%dT%H:%M:%S%z}t,\"%U\",\"%q\",%>s" csv3


CustomLog "|/usr/local/bin/statSender.pl -r 127.0.0.1" csv3

I think I read somewhere that if you pipe to a script (like above) and 
you don't read fast enough, it could slow apache down.  That's why the
script above dumps to redis first.  That way I can move processes
around, restart the database, etc, etc, and not break apache in any way.


The important part of the script:

# Read log lines piped from apache on stdin and push each onto a redis
# list; on a failed push, reconnect and retry the same line.
while (my $x = <>)
{
    chomp($x);
    next unless ($x);
  try_again:
    if ($redis)
    {
        eval {
            $redis->lpush($qname, $x);
        };
        if ($@)
        {
            $redis = redis_connect();
            goto try_again;
        }
        # just silence this one (trim the queue to the newest ~1000 entries)
        eval {
            $redis->ltrim($qname, 0, 1000);
        };
    }
}

Any other machine, or even multiple, then reads from redis and inserts 
into PG.


You can see, in my script, I trim the queue to 1000 items, but that's 
because I'm not as worried about losing results.  Your setup would
probably be different.  I also set up redis to not save anything to disk,
again, because I don't mind if I lose a few hits here or there.  But
you get the idea.
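
Purely as a sketch of the Postgres end that isn't shown here (the consumer that drains redis): the hourly stats could be kept with an upsert. The table and column names below are invented for illustration, not Andy's actual schema, and ON CONFLICT needs PostgreSQL 9.5 or later.

-- Hypothetical hourly stats table maintained by the consumer.
CREATE TABLE IF NOT EXISTS hit_stats (
    hour_start timestamptz NOT NULL,
    county     text        NOT NULL,
    url        text        NOT NULL,
    hits       bigint      NOT NULL DEFAULT 0,
    PRIMARY KEY (hour_start, county, url)
);

-- For every line popped off the redis list, bump the matching counter.
INSERT INTO hit_stats (hour_start, county, url, hits)
VALUES (date_trunc('hour', '2016-08-22T14:03:17+00'::timestamptz), 'Polk', '/search', 1)
ON CONFLICT (hour_start, county, url)
DO UPDATE SET hits = hit_stats.hits + 1;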


-Andy




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-22 Thread Sameer Kumar
On Mon, 22 Aug 2016, 3:40 p.m. Thomas Güttler, 
wrote:

>
>
> On 19.08.2016 at 19:59, Andy Colson wrote:
> > On 8/19/2016 2:32 AM, Thomas Güttler wrote:
> >> I want to store logs in a simple table.
> >>
> >> Here my columns:
> >>
> >>   Primary-key (auto generated)
> >>   timestamp
> >>   host
> >>   service-on-host
> >>   loglevel
> >>   msg
> >>   json (optional)
> >>
> >> I am unsure which DB to choose: Postgres, ElasticSearch or ...?
> >>
> >> We don't have high traffic. About 200k rows per day.
> >>
> >> My heart beats for postgres. We have been using it for several years.
> >>
> >> On the other hand, the sentence "Don't store logs in a DB" is
> >> somewhere in my head.
> >>
> >> What do you think?
> >>
> >>
> >>
> >
> > I played with ElasticSearch a little, mostly because I wanted to use
> Kibana which looks really pretty.  I dumped a ton
> > of logs into it, and made a pretty dashboard ... but in the end it
> didn't really help me, and wasn't that useful.  My
> > problem is, I don't want to have to go look at it.  If something goes
> bad, then I want an email alert, at which point
> > I'm going to go run top, and tail the logs.
> >
> > Another problem I had with kibana/ES is the syntax to search stuff is
> different than I'm used to.  It made it hard to
> > find stuff in kibana.
> >
> > Right now, I have a perl script that reads apache logs and fires off
> updates into PG to keep stats.  But it's an hourly
> > summary, which the website turns around and queries the stats to show
> pretty usage graphs.
>
> You use Perl to read apache logs. Does this work?
>
> Forwarding logs reliably is not easy. Logs are streams, files in unix are
> not streams. Sooner or later
> the files get rotated. RELP exists, but AFAIK its usage is not widespread:
>
>https://en.wikipedia.org/wiki/Reliable_Event_Logging_Protocol
>
> Let's see how to get the logs into postgres 
>
> > In the end, PG or ES, all depends on what you want.
>
> Most of my logs start from an HTTP request. I want a unique id per request
> in every log line which gets created. This way I can trace the request,
> even if its impact spans several hosts and systems which do not receive
> HTTP requests.
>

You may decide not to use Elasticsearch but take a look at other components
of the Elastic Stack like Logstash and Beats. They can be helpful even when you
use Postgres as the end point. Otherwise (IMHO), you would spend a lot of
time writing scripts and jobs to capture and stream logs. If I were you, I
would not want to do that.




> Regards,
>Thomas Güttler
>
>
> --
> Thomas Guettler http://www.thomas-guettler.de/
>
>
>
-- 
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350

Skype: sameer.ashnik | www.ashnik.com


Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-22 Thread Thomas Güttler



On 19.08.2016 at 19:59, Andy Colson wrote:

On 8/19/2016 2:32 AM, Thomas Güttler wrote:

I want to store logs in a simple table.

Here my columns:

  Primary-key (auto generated)
  timestamp
  host
  service-on-host
  loglevel
  msg
  json (optional)

I am unsure which DB to choose: Postgres, ElasticSearch or ...?

We don't have high traffic. About 200k rows per day.

My heart beats for postgres. We have been using it for several years.

On the other hand, the sentence "Don't store logs in a DB" is
somewhere in my head.

What do you think?





I played with ElasticSearch a little, mostly because I wanted to use Kibana 
which looks really pretty.  I dumped a ton
of logs into it, and made a pretty dashboard ... but in the end it didn't 
really help me, and wasn't that useful.  My
problem is, I don't want to have to go look at it.  If something goes bad, then 
I want an email alert, at which point
I'm going to go run top, and tail the logs.

Another problem I had with kibana/ES is the syntax to search stuff is different 
than I'm used to.  It made it hard to
find stuff in kibana.

Right now, I have a perl script that reads apache logs and fires off updates 
into PG to keep stats.  But it's an hourly
summary, which the website turns around and queries the stats to show pretty 
usage graphs.


You use Perl to read apache logs. Does this work?

Forwarding logs reliably is not easy. Logs are streams, files in unix are not 
streams. Sooner or later
the files get rotated. RELP exists, but AFAIK its usage is not widespread:

  https://en.wikipedia.org/wiki/Reliable_Event_Logging_Protocol

Let's see how to get the logs into postgres 


In the end, PG or ES, all depends on what you want.


Most of my logs start from an HTTP request. I want a unique id per request
in every log line which gets created. This way I can trace the request,
even if its impact spans several hosts and systems which do not receive HTTP
requests.
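
As an illustration only (the table and column names below are mine, assuming the optional json column is jsonb and carries the request id), the trace query could look like this:

SELECT ts, host, service_on_host, loglevel, msg
FROM logs
WHERE data ->> 'request_id' = 'd6f3a2c4-0b1e-4c77-9a55-7e2f3c9d1a42'
ORDER BY ts;

-- An expression index keeps the per-request lookup cheap:
CREATE INDEX logs_request_id_idx ON logs ((data ->> 'request_id'));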

Regards,
  Thomas Güttler


--
Thomas Guettler http://www.thomas-guettler.de/




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-22 Thread Thomas Güttler

Thank you Chris for looking at my issue in such detail.

Yes, the parallel feature rocks.

Regards,
  Thomas Güttler

On 19.08.2016 at 22:40, Chris Mair wrote:

On 19/08/16 10:57, Thomas Güttler wrote:



What do you think?


I store most of my logs in flat textfiles syslog style, and use grep for adhoc 
querying.

 200K rows/day, that's 1.4 million/week, 6 million/month, pretty soon you're
talking big tables.

in fact that's several rows/second on a 24/7 basis


There is no need to store them more than 6 weeks in my current use case.



Hi,

to me this kind of data looks like something Postgres can handle with ease.

We're talking about 8.4M rows here.

Coincidentally, I was trying out the new parallel query feature in the
9.6 beta just now and decided to use your numbers as a test case :)

I can create 8.4M records having a timestamp and a random ~ 250 character string
in ~ 31 seconds:

pg96=# select now() + (sec / 200000.0 * 86400.0 || ' seconds')::interval as ts,
pg96-#        repeat(random()::text, 15) as msg
pg96-# into t1
pg96-# from generate_series(1, 6 * 7 * 200000) as sec;
SELECT 8400000
Time: 30858.274 ms

Table size is 2.4 GB.

This gives about 6 weeks. A query to scan the whole thing on the narrow column
takes ~ 400 msec, like this:

pg96=# select min(ts), max(ts) from t1;
              min               |              max
--------------------------------+-------------------------------
 2016-08-19 20:17:24.921333+00 | 2016-09-30 20:17:24.489333+00
(1 row)

Time: 409.468 ms

Even running an unanchored regular expression (!) on the wider column is doable:

pg96=# select count(*) from t1 where msg ~ '12345';
 count
-------
   955
(1 row)

Time: 3146.838 ms

If you have some filter, not everything needs to be regexped and this gets 
pretty fast:

pg96=# select count(*) from t1 where ts between '2016-08-25' and '2016-08-26' 
and msg ~ '12345';
 count
-------
    24
(1 row)

Time: 391.577 ms

All this is without indices. Your data is more structured than my test, so 
undoubtedly you will
get some gain from indices...

Here is something more analytical - basically same as the count(*) above:

pg96=# select ts::date, count(*) from t1 where msg ~ '12345' group by ts::date 
order by ts::date;
     ts     | count
------------+-------
 2016-08-19 |    26
 2016-08-20 |    28
 [...]
 2016-09-28 |    21
 2016-09-29 |    33
(42 rows)

Time: 3157.010 ms

Note, however, that I'm using 9.6 beta with the parallel query feature: the 
sequential scans with the regexp are
run in parallel on 6 workers... this gives me a speed-up of a factor of 4-5
(machine has 8 logical CPUs) and
the whole table fits in cache. For a use case as this, the parallel query 
feature in 9.6 is so good it's almost
like cheating ;)

Bye,
Chris.







--
Thomas Guettler http://www.thomas-guettler.de/




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Sameer Kumar
On Sat, 20 Aug 2016, 2:00 a.m. Andy Colson,  wrote:

> On 8/19/2016 2:32 AM, Thomas Güttler wrote:
> > I want to store logs in a simple table.
> >
> > Here my columns:
> >
> >   Primary-key (auto generated)
> >   timestamp
> >   host
> >   service-on-host
> >   loglevel
> >   msg
> >   json (optional)
> >
> > I am unsure which DB to choose: Postgres, ElasticSearch or ...?
> >
> > We don't have high traffic. About 200k rows per day.
> >
> > My heart beats for postgres. We have been using it for several years.
> >
> > On the other hand, the sentence "Don't store logs in a DB" is
> > somewhere in my head.
> >
> > What do you think?
> >
> >
> >
>
> I played with ElasticSearch a little, mostly because I wanted to use
> Kibana which looks really pretty.  I dumped a ton of logs into it, and
> made a pretty dashboard ... but in the end it didn't really help me, and
> wasn't that useful.  My problem is, I don't want to have to go look at
> it.  If something goes bad, then I want an email alert, at which point
> I'm going to go run top, and tail the logs.
>

There are tools in the Elastic Stack which could have helped you achieve
email alerts and gather top output or tail the logfiles.


> Another problem I had with kibana/ES is the syntax to search stuff is
> different than I'm used to.  It made it hard to find stuff in kibana.
>
> Right now, I have a perl script that reads apache logs and fires off
> updates into PG to keep stats.  But it's an hourly summary, which the
> website turns around and queries the stats to show pretty usage graphs.
>
> In the end, PG or ES, all depends on what you want.
>
> -Andy
>
>
>
-- 
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350

Skype: sameer.ashnik | www.ashnik.com


Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Chris Mair

On 19/08/16 10:57, Thomas Güttler wrote:



What do you think?


I store most of my logs in flat textfiles syslog style, and use grep for adhoc 
querying.

 200K rows/day, that's 1.4 million/week, 6 million/month, pretty soon you're
talking big tables.

in fact that's several rows/second on a 24/7 basis


There is no need to store them more than 6 weeks in my current use case.



Hi,

to me this kind of data looks like something Postgres can handle with ease.

We're talking about 8.4M rows here.

Coincidentally, I was trying out the new parallel query feature in the
9.6 beta just now and decided to use your numbers as a test case :)

I can create 8.4M records having a timestamp and a random ~ 250 character string
in ~ 31 seconds:

pg96=# select now() + (sec / 200000.0 * 86400.0 || ' seconds')::interval as ts,
pg96-#        repeat(random()::text, 15) as msg
pg96-# into t1
pg96-# from generate_series(1, 6 * 7 * 200000) as sec;
SELECT 8400000
Time: 30858.274 ms

Table size is 2.4 GB.

This gives about 6 weeks. A query to scan the whole thing on the narrow column
takes ~ 400 msec, like this:

pg96=# select min(ts), max(ts) from t1;
              min               |              max
--------------------------------+-------------------------------
 2016-08-19 20:17:24.921333+00 | 2016-09-30 20:17:24.489333+00
(1 row)

Time: 409.468 ms

Even running an unanchored regular expression (!) on the wider column is doable:

pg96=# select count(*) from t1 where msg ~ '12345';
 count
-------
   955
(1 row)

Time: 3146.838 ms

If you have some filter, not everything needs to be regexped and this gets 
pretty fast:

pg96=# select count(*) from t1 where ts between '2016-08-25' and '2016-08-26' 
and msg ~ '12345';
 count
-------
    24
(1 row)

Time: 391.577 ms

All this is without indices. Your data is more structured than my test, so 
undoubtedly you will
get some gain from indices...
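
One possible follow-up to that note, sketched against the t1 test table above: pg_trgm can index the unanchored pattern matches (trigram GIN supports regular-expression searches since 9.3), and a plain btree covers the timestamp ranges.

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram index for the msg ~ '12345' style searches:
CREATE INDEX t1_msg_trgm ON t1 USING gin (msg gin_trgm_ops);

-- Ordinary btree for the ts range filters:
CREATE INDEX t1_ts_idx ON t1 (ts);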

Here is something more analytical - basically same as the count(*) above:

pg96=# select ts::date, count(*) from t1 where msg ~ '12345' group by ts::date 
order by ts::date;
     ts     | count
------------+-------
 2016-08-19 |    26
 2016-08-20 |    28
 [...]
 2016-09-28 |    21
 2016-09-29 |    33
(42 rows)

Time: 3157.010 ms

Note, however, that I'm using 9.6 beta with the parallel query feature: the 
sequential scans with the regexp are
run in parallel on 6 workers... this gives me a speed-up of a factor of 4-5
(machine has 8 logical CPUs) and
the whole table fits in cache. For a use case as this, the parallel query 
feature in 9.6 is so good it's almost
like cheating ;)

Bye,
Chris.







Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Andy Colson

On 8/19/2016 2:32 AM, Thomas Güttler wrote:

I want to store logs in a simple table.

Here my columns:

  Primary-key (auto generated)
  timestamp
  host
  service-on-host
  loglevel
  msg
  json (optional)

I am unsure which DB to choose: Postgres, ElasticSearch or ...?

We don't have high traffic. About 200k rows per day.

My heart beats for postgres. We have been using it for several years.

On the other hand, the sentence "Don't store logs in a DB" is
somewhere in my head.

What do you think?





I played with ElasticSearch a little, mostly because I wanted to use 
Kibana which looks really pretty.  I dumped a ton of logs into it, and 
made a pretty dashboard ... but in the end it didn't really help me, and 
wasn't that useful.  My problem is, I don't want to have to go look at 
it.  If something goes bad, then I want an email alert, at which point 
I'm going to go run top, and tail the logs.


Another problem I had with kibana/ES is the syntax to search stuff is 
different than I'm used to.  It made it hard to find stuff in kibana.


Right now, I have a perl script that reads apache logs and fires off 
updates into PG to keep stats.  But it's an hourly summary, which the
website turns around and queries the stats to show pretty usage graphs.


In the end, PG or ES, all depends on what you want.

-Andy




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Merlin Moncure
On Fri, Aug 19, 2016 at 2:32 AM, Thomas Güttler
 wrote:
> I want to store logs in a simple table.
>
> Here my columns:
>
>   Primary-key (auto generated)
>   timestamp
>   host
>   service-on-host
>   loglevel
>   msg
>   json (optional)
>
> I am unsure which DB to choose: Postgres, ElasticSearch or ...?

We use SOLR (which is similar to ElasticSearch) here for json document
retrieval.  Agreeing to do this was one of the biggest mistakes in my
professional career.  This choice was somewhat forced because at the
time jsonb was not baked.  In my opinion, jsonb outclasses these types
of services particularly if you are already invested in postgres.  The
specifics of your requirements also play into this decision,
naturally.  The bottom line, though, is that these kinds of systems are
not nearly as fast or robust as they claim to be, particularly if you
wander off the use cases they are engineered for (like needing
transactions or joins for example).  They also tend to be fairly
opaque in how they operate and the supporting tooling is laughable
relative to established database systems.

Postgres OTOH can be made to do pretty much anything given sufficient
expertise and a progressive attitude.
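
A minimal sketch of the jsonb route, with table and field names made up for the example (not Merlin's actual setup):

CREATE TABLE docs (
    id   bigserial PRIMARY KEY,
    body jsonb NOT NULL
);
CREATE INDEX docs_body_gin ON docs USING gin (body jsonb_path_ops);

-- Containment queries use the GIN index, and unlike a separate document
-- store they combine freely with joins and transactions:
SELECT id
FROM docs
WHERE body @> '{"loglevel": "ERROR", "host": "web-3"}';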

merlin




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread John R Pierce

On 8/19/2016 3:44 AM, Andreas Kretschmer wrote:

So, in your case, consider partitioning, maybe per month. So you can
also avoid mess with table and index bloat.


With his 6-week retention, I'd partition by week.



--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Thomas Güttler



On 19.08.2016 at 12:44, Andreas Kretschmer wrote:

Thomas Güttler  wrote:


How will you be using the logs? What kind of queries? What kind of searches?
Correlating events and logs from various sources could be really easy with 
joins, count and summary operations.


Wishes grow with possibilities. First I want to do simple queries about
hosts and timestamps. Then some simple substring matches.


for append-only tables like this consider 9.5 and BRIN-Indexes for
timestamp-searches. But if you delete after N weeks, BRIN shouldn't work
properly because of vacuum and re-use of space within the table.
Do you know BRIN?

So, in your case, consider partitioning, maybe per month. So you can
also avoid mess with table and index bloat.


Thank you very much for these hints. I did not know BRIN before.



Greetings from Dresden to Chemnitz (is this still valid?)


Yes, I am in Chemnitz/Germany. Everything (kids, wife, friends, sports, job) is 
fine.

I hope the same with you?

Regards,
  Thomas Güttler



--
Thomas Guettler http://www.thomas-guettler.de/




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Francisco Olarte
On Fri, Aug 19, 2016 at 12:44 PM, Andreas Kretschmer
 wrote:
> for append-only tables like this consider 9.5 and BRIN-Indexes for
> timestamp-searches. But if you delete after N weeks, BRIN shouldn't work
> properly because of vacuum and re-use of space within the table.
> Do you know BRIN?
>
> So, in your case, consider partitioning, maybe per month. So you can
> also avoid mess with table and index bloat.

If done properly he can use both. For 6 weeks I would use seven
partitions, the current one plus the 6 previous, dropping the oldest
partition weekly, so effectively they become append-only and he can use
BRIN too.

Even better, if he normally inserts in batches (it happens to me with
some log-like data: I rotate the file and insert all rotated data
periodically) he can use a staging table (one master, and inheriting
from it seven constrained week partitions plus one unconstrained
staging partition). Insertions go into staging and are moved with a
small delay to the corresponding partition, using an ordered select
so they go in perfect order into their final resting place, and staging
can be vacuumed just after that (if they are log lines and the maximum
delay is X, you just move every row older than that from staging to the
partition, with whatever period works best). The staging partition is
normally small and cached and can be processed quite fast (with 200k/day
an hourly move will leave staging with fewer than about 10k rows if the
distribution is roughly uniform).
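
A rough sketch of that layout, assuming the 9.5/9.6-era inheritance approach (declarative partitioning only arrived in PostgreSQL 10); table and week names are illustrative, and routing rows to the correct week is simplified to a single target child:

CREATE TABLE logs (
    id   bigserial,
    ts   timestamptz NOT NULL,
    host text,
    msg  text
    -- ... remaining columns as in Thomas' list
);

-- One constrained child per week (current + 6 previous), each with a BRIN index:
CREATE TABLE logs_2016w34 (
    CHECK (ts >= '2016-08-22' AND ts < '2016-08-29')
) INHERITS (logs);
CREATE INDEX ON logs_2016w34 USING brin (ts);

-- Unconstrained staging child that receives all fresh inserts:
CREATE TABLE logs_staging () INHERITS (logs);

-- Periodic job: move rows older than the allowed delay into the week's
-- partition, ordered by ts so they land in timestamp order (BRIN-friendly);
-- the small staging table can be vacuumed cheaply right afterwards.
WITH moved AS (
    DELETE FROM logs_staging
     WHERE ts < now() - interval '1 hour'
     RETURNING *
)
INSERT INTO logs_2016w34
SELECT * FROM moved
ORDER BY ts;

-- Weekly rotation: create next week's child and drop the partition that
-- fell out of the 6-week window (a hypothetical oldest one here).
DROP TABLE logs_2016w28;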

Francisco Olarte.




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Andreas Kretschmer
Thomas Güttler  wrote:

>> How will you be using the logs? What kind of queries? What kind of searches?
>> Correlating events and logs from various sources could be really easy with 
>> joins, count and summary operations.
>
> Wishes grow with possibilities. First I want to do simple queries about
> hosts and timestamps. Then some simple substring matches.

for append-only tables like this consider 9.5 and BRIN indexes for
timestamp searches. But if you delete after N weeks, BRIN shouldn't work
properly because of vacuum and re-use of space within the table.
Do you know BRIN?

So, in your case, consider partitioning, maybe per month. That way you can
also avoid the mess of table and index bloat.
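
A hedged one-liner version of the BRIN idea, against an illustrative log table named logs with a timestamptz column ts (names not from the thread): the index stays a handful of pages even over millions of rows, as long as ts correlates with the physical row order.

CREATE INDEX logs_ts_brin ON logs USING brin (ts);

-- Typical timestamp-range scan that can then skip most blocks:
SELECT count(*)
FROM logs
WHERE ts >= '2016-08-01' AND ts < '2016-08-02';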


Greetings from Dresden to Chemnitz (is this still valid?)



Regards, Andreas Kretschmer
-- 
Andreas Kretschmer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Rafal Pietrak


On 19.08.2016 at 10:57, Thomas Güttler wrote:
> 
> 
> On 19.08.2016 at 09:42, John R Pierce wrote:
[-]
>> in fact that's several rows/second on a 24/7 basis
> 
> There is no need to store them more than 6 weeks in my current use case.
> 
> I think indexing in postgres is much faster than grep.

Not so much IMHE(xperience).

1. if you know what you are looking for: grep's the best

2. if you don't: grep (or more/less/vi) is the best.

only when you routinely update/join/etc. does an RDBMS really shine. But that's
not what you normally do with your logs. Right?

but then again, there is an additional benefit of "having everything
under one hood" - so standardising on a single repository (like an rdbms)
has its benefits.

regards,

-R




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Thomas Güttler



On 19.08.2016 at 11:21, Sameer Kumar wrote:



On Fri, Aug 19, 2016 at 4:58 PM Thomas Güttler wrote:



On 19.08.2016 at 09:42, John R Pierce wrote:
> On 8/19/2016 12:32 AM, Thomas Güttler wrote:
>> What do you think?
>
> I store most of my logs in flat textfiles syslog style, and use grep for 
adhoc querying.
>
>  200K rows/day, that's 1.4 million/week, 6 million/month, pretty soon
you're talking big tables.
>
> in fact that's several rows/second on a 24/7 basis

There is no need to store them more than 6 weeks in my current use case.

I think indexing in postgres is much faster than grep.

And queries including json data are not possible with grep (or at least 
very hard to type)

My concern is which DB (or indexing) to use ...


How will you be using the logs? What kind of queries? What kind of searches?
Correlating events and logs from various sources could be really easy with 
joins, count and summary operations.


Wishes grow with possibilities. First I want to do simple queries about hosts and timestamps. Then some simple
substring matches.


Up to now, no structured logging (the json column) gets created. But once it gets
filled, it will cover use cases where
we currently use ssh+grep.

Up to now we need no stemming or language support.


The kind of volume you are anticipating should be fine with Postgres but before 
you really decide which one, you need to
figure out what you would want to do with this data once it is in Postgres.


The goal is a bit fuzzy up to now: Better overview.

Thank you for your feedback ("The kind of volume you are anticipating should be fine 
with Postgres").

I guess I will use postgres, especially since the Django ORM supports JSON in
Postgres:

  https://docs.djangoproject.com/en/1.10/ref/contrib/postgres/fields/#jsonfield
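
For what it's worth, one guess at how the listed columns could map onto a table (names and types are mine; in practice the Django ORM would generate its own DDL):

CREATE TABLE logs (
    id               bigserial PRIMARY KEY,   -- auto-generated primary key
    ts               timestamptz NOT NULL,    -- timestamp
    host             text NOT NULL,
    service_on_host  text NOT NULL,
    loglevel         text NOT NULL,
    msg              text NOT NULL,
    data             jsonb                    -- the optional json column
);

CREATE INDEX logs_ts_idx ON logs (ts);
CREATE INDEX logs_host_ts_idx ON logs (host, ts);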


Regards,
  Thomas Güttler


--
Thomas Guettler http://www.thomas-guettler.de/




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Sameer Kumar
On Fri, Aug 19, 2016 at 4:58 PM Thomas Güttler 
wrote:

>
>
> > On 19.08.2016 at 09:42, John R Pierce wrote:
> > On 8/19/2016 12:32 AM, Thomas Güttler wrote:
> >> What do you think?
> >
> > I store most of my logs in flat textfiles syslog style, and use grep for
> adhoc querying.
> >
> >  200K rows/day, that's 1.4 million/week, 6 million/month, pretty soon
> you're talking big tables.
> >
> > in fact that's several rows/second on a 24/7 basis
>
> There is no need to store them more than 6 weeks in my current use case.
>
> I think indexing in postgres is much faster than grep.
>
> And queries including json data are not possible with grep (or at least
> very hard to type)
>
> My concern is which DB (or indexing) to use ...
>

How will you be using the logs? What kind of queries? What kind of
searches?
Correlating events and logs from various sources could be really easy with
joins, count and summary operations.

The kind of volume you are anticipating should be fine with Postgres but
before you really decide which one, you need to figure out what you would
want to do with this data once it is in Postgres.


> Regards,
>   Thomas
>
>
> --
> Thomas Guettler http://www.thomas-guettler.de/
>
>
>
-- 
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350

Skype: sameer.ashnik | www.ashnik.com


Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread Thomas Güttler



On 19.08.2016 at 09:42, John R Pierce wrote:

On 8/19/2016 12:32 AM, Thomas Güttler wrote:

What do you think?


I store most of my logs in flat textfiles syslog style, and use grep for adhoc 
querying.

 200K rows/day, that's 1.4 million/week, 6 million/month, pretty soon you're
talking big tables.

in fact that's several rows/second on a 24/7 basis


There is no need to store them more than 6 weeks in my current use case.

I think indexing in postgres is much faster than grep.

And queries including json data are not possible with grep (or at least very 
hard to type)

My concern is which DB (or indexing) to use ...

Regards,
 Thomas


--
Thomas Guettler http://www.thomas-guettler.de/




Re: [GENERAL] PG vs ElasticSearch for Logs

2016-08-19 Thread John R Pierce

On 8/19/2016 12:32 AM, Thomas Güttler wrote:
What do you think? 


I store most of my logs in flat textfiles syslog style, and use grep for 
adhoc querying.


 200K rows/day, that's 1.4 million/week, 6 million/month, pretty soon
you're talking big tables.


in fact that's several rows/second on a 24/7 basis


--
john r pierce, recycling bits in santa cruz


