[PERFORM] Obtaining the exact size of the database.

2010-06-20 Thread venu madhav
Hi All,
  I am using Postgres 8.1.9 for my application. The application has a
clean-up module which deletes a specified percentage of the total database
size at regular intervals. I use *pg_database_size* to obtain the size of
the database, and after deleting the records we run *Vacuum Analyze* to
reclaim the space. The problem is that even though some records have been
deleted, *pg_database_size* still reports the original size. Is there any
way to find out the actual database size? It would be even more useful if I
could get the size of each table.
 I can't run *Vacuum Full* because the application has to run 24*7
without downtime.
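
A sketch of the size functions available in 8.1 (both `pg_size_pretty` and
`pg_total_relation_size` exist in that release). Note that plain VACUUM only
marks dead space as reusable inside the existing files, so these numbers
usually stay flat until new inserts reuse that space:

```sql
-- Size of the current database.
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Per-table sizes, largest first; pg_total_relation_size
-- includes indexes and TOAST data.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 20;
```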

Can someone please help me solve this?

Please let me know if you need any clarifications.

Thank you,
Venu.


Re: [PERFORM] Autovacuum in postgres.

2010-06-02 Thread venu madhav
Thanks for the reply.
   I am using Postgres 8.1 and, since it runs on a client box, I
can't upgrade it. I've set the autovacuum naptime to 3600 seconds.

On Thu, May 27, 2010 at 8:03 PM, Bruce Momjian  wrote:

> venu madhav wrote:
> > Hi All,
> >    In my application we are using postgres which runs on an embedded
> > box. I have configured autovacuum to run once for every one hour. It
> > has 5 different databases in it. When I saw the log messages, I found
> > that it is running autovacuum on one database every hour. As a result,
> > on my database autovacuum is run once in 5 hours. Is there any way to
> > make it run every hour.
>
> What settings did you change to make it run every hour?  Also, it will
> only vacuum tables that need vacuuming.  What version of Postgres are
> you using?
>
> --
>  Bruce Momjian  http://momjian.us
>  EnterpriseDB http://enterprisedb.com
>


Re: [PERFORM] Autovacuum in postgres.

2010-05-27 Thread venu madhav
One more question: is this expected?

On Fri, May 21, 2010 at 3:08 PM, venu madhav wrote:

> Hi All,
>In my application we are using postgres which runs on an embedded
> box. I have configured autovacuum to run once for every one hour. It has 5
> different databases in it. When I saw the log messages, I found that it is
> running autovacuum on one database every hour. As a result, on my database
> autovacuum is run once in 5 hours. Is there any way to make it run it every
> hour.
>
>
> Thank you,
> Venu
>


[PERFORM] Autovacuum in postgres.

2010-05-27 Thread venu madhav
Hi All,
   In my application we are using Postgres, which runs on an embedded
box. I have configured autovacuum to run once every hour. The box has 5
different databases in it. When I looked at the log messages, I found that
autovacuum runs on only one database each hour. As a result, each of my
databases is vacuumed only once in 5 hours. Is there any way to make it run
every hour on every database?
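
In 8.1 the autovacuum daemon wakes up every autovacuum_naptime seconds and
processes a single database per wake-up, so with 5 databases each one is
visited only every 5 naptimes. A hedged workaround, assuming hourly
coverage per database is the goal, is to divide the naptime by the number
of databases in postgresql.conf:

```
# 8.1 autovacuum handles one database per naptime cycle.
# With 5 databases, a 720 s naptime visits each database
# roughly once per hour (5 * 720 s = 3600 s).
autovacuum = on
autovacuum_naptime = 720
```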


Thank you,
Venu


Re: [PERFORM] Performance issues when the number of records are around 10 Million

2010-05-14 Thread venu madhav
On Wed, May 12, 2010 at 7:26 PM, Kevin Grittner  wrote:

> venu madhav  wrote:
>
> >> > If the records are more in the interval,
> >>
> >> How do you know that before you run your query?
> >>
> > I calculate the count first.
>
> This and other comments suggest that the data is totally static
> while this application is running.  Is that correct?
>
[Venu] No, data is added while the application is running. As I've
mentioned before, it can arrive as fast as 100-400 records per second, and
this is an important application which runs 24/7.

>
> > If generate all the pages at once, to retrieve all the 10 M
> > records at once, it would take much longer time
>
> Are you sure of that?  It seems to me that it's going to read all
> ten million rows once for the count and again for the offset.  It
> might actually be faster to pass them just once and build the pages.
>
[Venu] Even if the retrieval is faster, the client viewing the database and
the server where the data is logged can be anywhere on the globe, so it is
not feasible to send all 1 or 10 M records from the server to the client at
once.


>
> Also, you didn't address the issue of storing enough information on
> the page to read off either edge in the desired sequence with just a
> LIMIT and no offset.  "Last page" or "page up" would need to reverse
> the direction on the ORDER BY.  This would be very fast if you have
> appropriate indexes.  Your current technique can never be made very
> fast.
>
[Venu] I didn't quite understand what you meant by "storing enough
information on the page to read off either edge in the desired sequence
with just a LIMIT and no offset". What kind of information can we store to
improve the performance? Reversing the ORDER BY is one thing; I am trying
to figure out how fast that is. Thanks a lot for this suggestion.
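
A sketch of the keyset ("no offset") paging Kevin describes, assuming the
client remembers the cid at the edge of the page it is currently showing
(column and table names taken from the query in this thread; the literal
cid value is illustrative only):

```sql
-- "Next page": the 20 rows immediately older than the last cid shown.
-- No OFFSET, so only ~20 rows are read via an index on cid.
SELECT e.cid, e.timestamp, s.sig_name
FROM event e JOIN signature s ON s.sig_id = e.signature
WHERE e.timestamp >= 1270449180 AND e.timestamp < 1273473180
  AND e.cid < 12345          -- cid at the bottom edge of the current page
ORDER BY e.cid DESC
LIMIT 20;

-- "Last page": reverse the ORDER BY, take the first 20,
-- then re-sort those 20 rows back into descending order.
SELECT * FROM (
  SELECT e.cid, e.timestamp, s.sig_name
  FROM event e JOIN signature s ON s.sig_id = e.signature
  WHERE e.timestamp >= 1270449180 AND e.timestamp < 1273473180
  ORDER BY e.cid ASC
  LIMIT 20
) AS last_page
ORDER BY cid DESC;
```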

Thank you,
Venu.

>
> -Kevin
>


Re: [PERFORM] Performance issues when the number of records are around 10 Million

2010-05-14 Thread venu madhav
On Wed, May 12, 2010 at 5:25 PM, Kevin Grittner  wrote:

> venu madhav  wrote:
>
> >>> AND e.timestamp >= '1270449180'
> >>> AND e.timestamp < '1273473180'
> >>> ORDER BY
> >>> e.cid DESC,
> >>> e.cid DESC
> >>> limit 21
> >>> offset 10539780
>
> > The second column acts as a secondary key for sorting if the
> > primary sorting key is a different column. For this query both of
> > them are same.
>
> Any chance you could just leave the second one off in that case?
>
[Venu] Yes, that can be ignored, but I am not sure that removing it would
reduce the time drastically.

>
> > This query is part of an application which allows user to select
> > time ranges and retrieve the data in that interval. Hence the time
> > stamp.
>
> Which, of course, is going to affect the number of rows.  Which
> leaves me wondering how you know that once you select and sequence
> the result set you need to read past and ignore exactly 10539780
> rows to get to the last page.
>
[Venu] For example: my database has 10539793 records. My application first
calculates the count of records in that interval, and then, based on the
user's choice to display 10/20/30/40 records per page, it calculates how
many records to skip when the "last" link is clicked.
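
If an exact count is not essential, a hedged alternative is the planner's
row estimate, which avoids scanning all 10 M rows (it is only as accurate
as the last ANALYZE, and covers the whole table rather than one interval):

```sql
-- Approximate row count for the event table, from planner statistics.
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'event';
```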

>
> > To have it in some particular order we're doing order by.
>
> Which will affect which rows are at any particular offset.
>
[Venu] Yes, by default it orders by the primary key.

>
> > If the records are more in the interval,
>
> How do you know that before you run your query?
>
 [Venu] I calculate the count first.

>
> > we display in sets of 20/30 etc. The user also has the option to
> > browse through any of those records hence the limit and offset.
>
> Have you considered alternative techniques for paging?  You might
> use values at the edges of the page to run a small query (limit, no
> offset) when they page.  You might generate all the pages on the
> first pass and cache them for a while.
>
[Venu] If I generate all the pages at once, retrieving all 10 M records in
one go would take much longer, and since the request comes from a browser,
there is a chance of the browser timing out.

> > When the user asks for the last set of 20 records, this query gets
> > executed.
>
> The DESC on the ORDER BY makes it look like you're trying to use the
> ORDER BY to get to the end, but then your offset tells PostgreSQL to
> skip the 10.5 million result rows with the highest keys.  Is the
> "last page" the one with the highest or lowest values for cid?
>
[Venu] The last page contains the lowest values of cid. By default we get
the records in decreasing order of cid and then take the last 10/20.

Thank you,
Venu.

> -Kevin
>
>
>


Re: [PERFORM] Performance issues when the number of records are around 10 Million

2010-05-14 Thread venu madhav
On Wed, May 12, 2010 at 3:20 AM, Kevin Grittner  wrote:

> venu madhav  wrote:
>
> > When I try to get the last twenty records from the database, it
> > takes around 10-15  mins to complete the operation.
>
> Making this a little easier to read (for me, at least) I get this:
>
> select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name,
>e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
>e.wifi_addr_2, e.view_status, bssid
>  FROM event e,
>   signature s
>  WHERE s.sig_id = e.signature
>AND e.timestamp >= '1270449180'
>AND e.timestamp <  '1273473180'
>  ORDER BY
>e.cid DESC,
>e.cid DESC
>  limit 21
>  offset 10539780
> ;
>
> Why the timestamp range, the order by, the limit, *and* the offset?
> On the face of it, that seems a bit confused.  Not to mention that
> your ORDER BY has the same column twice.
>
[Venu] The second column acts as a secondary sort key when the primary sort
key is a different column; for this query both of them happen to be the
same. This query is part of an application which allows the user to select
a time range and retrieve the data in that interval, hence the timestamp
filter. To present the records in a particular order we use ORDER BY. If
there are many records in the interval, we display them in sets of 20/30
etc. The user also has the option to browse through any of those records,
hence the LIMIT and OFFSET.

>
> Perhaps that OFFSET is not needed?  It is telling PostgreSQL that
> whatever results are generated based on the rest of the query, read
> through and ignore the first ten and a half million.  Since you said
> you had about ten million rows, you wanted the last 20, and the
> ORDER by is DESCending, you're probably not going to get what you
> want.
>
> What, exactly, *is* it you want again?
>
[Venu] As explained above, this query is part of the application where the
user wishes to see the records in the database between any start and end
time. They are rendered as an HTML page with pagination links to traverse
the data, and the user can jump to any set of records. When the user asks
for the last set of 20 records, this query gets executed.
Hope it is clear now. Please let me know if you need any further info.

Thank you,
Venu

> -Kevin
>


Re: [PERFORM] Performance issues when the number of records are around 10 Million

2010-05-14 Thread venu madhav
On Wed, May 12, 2010 at 3:22 AM, Shrirang Chitnis <
shrirang.chit...@hovservices.com> wrote:

> Venu,
>
> For starters,
>
> 1) You have used the e.cid twice in ORDER BY clause.
>
[Venu] The second cid acts as a secondary sort key when some other column
in the table is used for the primary sort. Since the primary sort key here
is also cid, it appears twice. I can remove it.

> 2) If you want last twenty records in the table matching the criteria of
> timestamp, why do you need the offset?
>
[Venu] It is part of a UI application where a user can ask for data between
any two dates, with options to browse through the data retrieved in that
interval.

> 3) Do you have indexes on sig_id, signature and timestamp fields?
>
[Venu] Yes, I do have indexes on those three.


> If you do not get a good response after that, please post the EXPLAIN
> ANALYZE for the query.
>
snort=# EXPLAIN ANALYZE select e.cid, timestamp, s.sig_class,
s.sig_priority, s.sig_name, e.sniff_ip, e.sniff_channel, s.sig_config,
e.wifi_addr_1, e.wifi_addr_2, e.view_status, bssid
FROM event e, signature s
WHERE s.sig_id = e.signature
  AND e.timestamp >= '1270449180' AND e.timestamp < '1273473180'
ORDER BY e.cid DESC, e.cid DESC
LIMIT 21 OFFSET 10539780;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=7885743.98..7885743.98 rows=1 width=287) (actual time=1462193.060..1462193.083 rows=14 loops=1)
   ->  Sort  (cost=7859399.66..7885743.98 rows=10537727 width=287) (actual time=1349648.207..1456496.334 rows=10539794 loops=1)
         Sort Key: e.cid
         ->  Hash Join  (cost=2.44..645448.31 rows=10537727 width=287) (actual time=0.182..139745.001 rows=10539794 loops=1)
               Hash Cond: ("outer".signature = "inner".sig_id)
               ->  Seq Scan on event e  (cost=0.00..487379.97 rows=10537727 width=104) (actual time=0.012..121595.257 rows=10539794 loops=1)
                     Filter: (("timestamp" >= 1270449180::bigint) AND ("timestamp" < 1273473180::bigint))
               ->  Hash  (cost=2.35..2.35 rows=35 width=191) (actual time=0.097..0.097 rows=36 loops=1)
                     ->  Seq Scan on signature s  (cost=0.00..2.35 rows=35 width=191) (actual time=0.005..0.045 rows=36 loops=1)
 Total runtime: 1463829.145 ms
(10 rows)
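
The plan sorts all 10.5 M matching rows because nothing supports the ORDER
BY directly. A hedged sketch of the supporting index (if cid is already the
primary key this index effectively exists, and the remaining cost is the
OFFSET itself, which keyset paging avoids):

```sql
-- A plain btree on cid lets ORDER BY e.cid DESC run as a backward
-- index scan instead of sorting 10.5 M rows; 8.1 can scan any
-- btree in either direction, so no DESC option is needed.
-- Combined with LIMIT-only (no OFFSET) paging, only ~20 heap
-- rows are read per page.
CREATE INDEX event_cid_idx ON event (cid);
ANALYZE event;
```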
Thank you,
Venu Madhav.

>
> Thanks,
>
> Shrirang Chitnis
> Sr. Manager, Applications Development
> HOV Services
> Office: (866) 808-0935 Ext: 39210
> shrirang.chit...@hovservices.com
> www.hovservices.com
>
>
>
> From: pgsql-performance-ow...@postgresql.org [mailto:
> pgsql-performance-ow...@postgresql.org] On Behalf Of venu madhav
> Sent: Tuesday, May 11, 2010 2:18 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Performance issues when the number of records are around
> 10 Million
>
> Hi all,
>  In my database application, I've a table whose records can reach 10M
> and insertions can happen at a faster rate like 100 insertions per second in
> the peak times. I configured postgres to do auto vacuum on hourly basis. I
> have frontend GUI application in CGI which displays the data from the
> database. When I try to get the last twenty records from the database, it
> takes around 10-15  mins to complete the operation.This is the query which
> is used:
>
> select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name,
> e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
> e.wifi_addr_2, e.view_status, bssid  FROM event e, signature s WHERE
> s.sig_id = e.signature   AND e.timestamp >= '1270449180' AND e.timestamp <
> '1273473180'  ORDER BY e.cid DESC,  e.cid DESC limit 21 offset 10539780;
>
> Can any one suggest me a better solution to improve the performance.
>
> Please let me know if you've any further queries.
>
>
> Thank you,
> Venu
>


Re: [PERFORM] Performance issues when the number of records are around 10 Million

2010-05-14 Thread venu madhav
On Wed, May 12, 2010 at 3:17 AM, Jorge Montero <
jorge_mont...@homedecorators.com> wrote:

>  First, are you sure you are getting autovacuum to run hourly? Autovacuum
> will only vacuum when certain configuration thresholds are reached. You can
> set it to only check for those thresholds every so often, but no vacuuming
> or analyzing will be done unless they are hit, regardless of how often
> autovacuum checks the tables. Whenever you are dealing with time series, the
> default thresholds are often insufficient, especially when you are
> especially interested in the last few records on a large table.
>
>
[Venu] Yes, autovacuum is running every hour; I can see that in the log
messages. All the autovacuum configuration settings are left at their
defaults except the one that makes it run every hour. This application runs
on an embedded box, so I can't change the parameters, as they affect the
other applications running on it. Can you please explain what you mean by
the default parameters?


> What are your autovacuum configuration parameters?
>
[Venu] Except these, all others are disabled.

#---------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#---------------------------------------------------------------------------

autovacuum = on             # enable autovacuum subprocess?
autovacuum_naptime = 3600   # time between autovacuum runs, in secs

> When were the two tables last autovacuumed and analyzed, according to
> pg_stat_user_tables?
>
[Venu] This is the content of pg_stat_user_tables for the two tables used
in that query.

 relid | schemaname | relname   | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
-------+------------+-----------+----------+--------------+----------+---------------+-----------+-----------+-----------
 41188 | public     | event     |      117 |   1201705723 |      998 |          2824 |        28 |         0 |         0
 41209 | public     | signature |      153 |         5365 |        2 |            72 |         1 |         0 |         0

> Could you post the output of explain analyze of your query?
>
snort=# EXPLAIN ANALYZE select e.cid, timestamp, s.sig_class,
s.sig_priority, s.sig_name, e.sniff_ip, e.sniff_channel, s.sig_config,
e.wifi_addr_1, e.wifi_addr_2, e.view_status, bssid
FROM event e, signature s
WHERE s.sig_id = e.signature
  AND e.timestamp >= '1270449180' AND e.timestamp < '1273473180'
ORDER BY e.cid DESC, e.cid DESC
LIMIT 21 OFFSET 10539780;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=7885743.98..7885743.98 rows=1 width=287) (actual time=1462193.060..1462193.083 rows=14 loops=1)
   ->  Sort  (cost=7859399.66..7885743.98 rows=10537727 width=287) (actual time=1349648.207..1456496.334 rows=10539794 loops=1)
         Sort Key: e.cid
         ->  Hash Join  (cost=2.44..645448.31 rows=10537727 width=287) (actual time=0.182..139745.001 rows=10539794 loops=1)
               Hash Cond: ("outer".signature = "inner".sig_id)
               ->  Seq Scan on event e  (cost=0.00..487379.97 rows=10537727 width=104) (actual time=0.012..121595.257 rows=10539794 loops=1)
                     Filter: (("timestamp" >= 1270449180::bigint) AND ("timestamp" < 1273473180::bigint))
               ->  Hash  (cost=2.35..2.35 rows=35 width=191) (actual time=0.097..0.097 rows=36 loops=1)
                     ->  Seq Scan on signature s  (cost=0.00..2.35 rows=35 width=191) (actual time=0.005..0.045 rows=36 loops=1)
 Total runtime: 1463829.145 ms
(10 rows)

> Which default statistic collection parameters do you use? Have you changed
> them specifically for the tables you are using?
>
[Venu] These are the statistics collection parameters:

# - Query/Index Statistics Collector -

stats_start_collector = on
stats_command_string = on
#stats_block_level = off
stats_row_level = on
#stats_reset_on_server_start = off

Please let me know if you are referring to something else.

> Which version of Postgres are you running? Which OS?
>
[Venu] Postgres version 8.1, and the operating system is CentOS 5.1.

Thank you,
Venu

>
>
>
> >>> venu madhav  05/11/10 3:47 AM >>>
>
> Hi all,
> In my database application, I've a table whose records can reach 10M and
> insertions can happen at a faster rate like 100 insertions per second in the
> peak times. I configured postgres to do auto vacuum on hourly basis. I have
> frontend GUI application in CGI which displays the data from the database.
> When I try to get the last twenty records from the database, it takes
> around 10-15 mins to complete the operation.

[PERFORM] Performance issues when the number of records are around 10 Million

2010-05-11 Thread venu madhav
Hi all,
  In my database application, I've a table whose records can reach 10 M,
and insertions can happen at a fast rate, around 100 insertions per second
at peak times. I configured Postgres to run autovacuum on an hourly basis.
I have a frontend GUI application in CGI which displays the data from the
database. When I try to get the last twenty records from the database, it
takes around 10-15 mins to complete the operation. This is the query which
is used:

select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name,
e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
e.wifi_addr_2, e.view_status, bssid
FROM event e, signature s
WHERE s.sig_id = e.signature
  AND e.timestamp >= '1270449180' AND e.timestamp < '1273473180'
ORDER BY e.cid DESC, e.cid DESC limit 21 offset 10539780;
Can anyone suggest a better solution to improve the performance?

Please let me know if you have any further questions.


Thank you,
Venu