date:20100722


Hi.
I'm trying to index the contents of word-documents, extracted text, 
which leads to quite large documents sometimes. This resutls in the 
following Exception:
Caused by: org.postgresql.util.PSQLException: ERROR: index row requires 
10376 bytes, maximum size is 8191


I have the following schema:
andreak=# \d origo_search_index
   Table "public.origo_search_index"
  Column  |   Type
|Modifiers

--+---+-
 id   | integer   | not null default 
nextval('origo_search_index_id_seq'::regclass)

 entity_id| integer   | not null
 entity_type  | character varying | not null
 field| character varying | not null
 search_value | character varying | not null
 textsearchable_index_col | tsvector  |

"origo_search_index_fts_idx" gin (textsearchable_index_col)

Triggers:
update_search_index_tsvector_t BEFORE INSERT OR UPDATE ON 
origo_search_index FOR EACH ROW EXECUTE PROCEDURE 
tsvector_update_trigger('textsearchable_index_col', 
'pg_catalog.english', 'search_value')


I store all the text extracted from the documents in "search_value" and 
have the built-in trigger tsvector_update_trigger update the 
tsvector-column.


Any hints on how to get around this issue to allow indexing large 
documents? I don't see how "only index the first N bytes of the 
document" would be of interest to anyone...


BTW: I'm using PG-9.0beta3

--
Andreas Joseph Krogh
Senior Software Developer / CTO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Finding last checkpoint time

2010-07-22 Thread Greg Sabino Mullane


-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


> Or you can use pg_controldata /path/to/pgdata and look 
> at "Time of latest checkpoint".

Assuming your system is using English. Otherwise, you'll 
have to build a collection of .po strings as we did for 
check_postgres.pl. Needless to say, I'd greatly prefer 
some other way to grab the information!

- -- 
Greg Sabino Mullane g...@turnstep.com
PGP Key: 0x14964AC8 201007220933
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAkxISMwACgkQvJuQZxSWSsirbACfa3ujzyTLyzlPbG0QrDUC/0AB
BCYAnRfP0E2CJQM+V0qNzgdsi47OjWKB
=+XW4
-END PGP SIGNATURE-



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] locating cities within a radius of another


Oliver Kohll - Mailing Lists wrote:


On 21 Jul 2010, at 23:14, Joe Conway > wrote:



If you want something simple, and not requiring PostGIS, but plpgsql
instead, see:

http://archives.postgresql.org/pgsql-sql/2003-12/msg00193.php


For completeness, the earthdistance module also provides the distance 
between two lat/longs, the point<@>point syntax is simple to use:

http://www.postgresql.org/docs/8.3/static/earthdistance.html


I did look at earthdistance before.  Revisiting it now, thanks.

So, I'm trying to figure out this syntax.  The docs say:

point <@> point - float8 - gives the distance in statue miles between 
two points on the Earth's surface.


How does longitude and latitude fit into this picture?  I can't find any 
other documentation or examples?


I've got the contrib mods installed as 'select earth()' works fine.



Regards
Oliver Kohll

oli...@agilebase.co.uk  / +44(0)7814 
828608 / skype:okohll

www.agilebase.co.uk  - software




--
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] locating cities within a radius of another


Oliver Kohll - Mailing Lists wrote:


On 21 Jul 2010, at 23:14, Joe Conway > wrote:



If you want something simple, and not requiring PostGIS, but plpgsql
instead, see:

http://archives.postgresql.org/pgsql-sql/2003-12/msg00193.php


For completeness, the earthdistance module also provides the distance 
between two lat/longs, the point<@>point syntax is simple to use:

http://www.postgresql.org/docs/8.3/static/earthdistance.html


Trying to figure out the proper usage.  My assumptions:

use ll_to_earth() to get point values to pass to 'point <@> point'

First issue, ll_to_earth() returns three values, not one.

Second issue, I tried something like:

select (ll_to_earth(46,67)<@>ll_to_earth(57,87));

I get:

ERROR:  operator does not exist: earth <@> earth
LINE 1: select (ll_to_earth(46,67)<@>ll_to_earth(57,87));

So I tried:

select (4618419.15006707<@>4394453.66154081);

And I get:

ERROR:  operator does not exist: numeric <@> numeric
LINE 1: select (4618419.15006707<@>4394453.66154081);
^
HINT:  No operator matches the given name and argument type(s). You 
might need to add explicit type casts.


What am I missing???



--
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] locating cities within a radius of another


Oliver Kohll - Mailing Lists wrote:


On 21 Jul 2010, at 23:14, Joe Conway > wrote:



If you want something simple, and not requiring PostGIS, but plpgsql
instead, see:

http://archives.postgresql.org/pgsql-sql/2003-12/msg00193.php


For completeness, the earthdistance module also provides the distance 
between two lat/longs, the point<@>point syntax is simple to use:

http://www.postgresql.org/docs/8.3/static/earthdistance.html


Disgregard my last post, Surely as soon as I hit send, the light went 
on...  I'm looking at deriving my points for point <@> point from 
ll_to_earth().




Regards
Oliver Kohll

oli...@agilebase.co.uk  / +44(0)7814 
828608 / skype:okohll

www.agilebase.co.uk  - software




--
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Getting statistics for each sql statement?

2010-07-22 Thread Stefan-Michael Guenther


Hello,

is it possible to get statistics on the usage of different sql 
statements, e.g. how many INSERT or UPDATE statements per day?


log_statement_stats doesn't seem to be the right parameter or I haven't 
found the output statistic for this command.


Any ideas or suggestions?

Thanks,

Stefan


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

RESOLVED: Re: [GENERAL] Maximum document-size of text-search?


On 07/22/2010 03:31 PM, Andreas Joseph Krogh wrote:

Hi.
I'm trying to index the contents of word-documents, extracted text, 
which leads to quite large documents sometimes. This resutls in the 
following Exception:
Caused by: org.postgresql.util.PSQLException: ERROR: index row 
requires 10376 bytes, maximum size is 8191


I have the following schema:
andreak=# \d origo_search_index
   Table "public.origo_search_index"
  Column  |   Type
|Modifiers
--+---+- 

 id   | integer   | not null default 
nextval('origo_search_index_id_seq'::regclass)

 entity_id| integer   | not null
 entity_type  | character varying | not null
 field| character varying | not null
 search_value | character varying | not null
 textsearchable_index_col | tsvector  |

"origo_search_index_fts_idx" gin (textsearchable_index_col)

Triggers:
update_search_index_tsvector_t BEFORE INSERT OR UPDATE ON 
origo_search_index FOR EACH ROW EXECUTE PROCEDURE 
tsvector_update_trigger('textsearchable_index_col', 
'pg_catalog.english', 'search_value')


I store all the text extracted from the documents in "search_value" 
and have the built-in trigger tsvector_update_trigger update the 
tsvector-column.


Any hints on how to get around this issue to allow indexing large 
documents? I don't see how "only index the first N bytes of the 
document" would be of interest to anyone...


BTW: I'm using PG-9.0beta3


Never mind... I was having a btree index on search_value too, which of 
course caused the problem.


--
Andreas Joseph Krogh
Senior Software Developer / CTO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Clarification of the "simple" dictionary

Hi. It's not clear to me if the "simple" dictionary uses stopwords or 
not, does it?
Can someone please post a complete description of what the "simple" 
dict. does?


--
Andreas Joseph Krogh
Senior Software Developer / CTO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Bitmask trickiness

2010-07-22 Thread Peter Hunsberger

On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers  wrote:
>
> On Thu, Jul 22, 2010 at 1:02 PM, Scott Marlowe  
> wrote:
>
>
> >>
> >> Why on Earth would I want to store this sort of stuff in a bit string?!
> >
> > Because you are manipulating bits and not integers?  I guess there are
> > 10 kinds of people, those who like think in binary and those who
> > don't.
> >
> >> I don't know about you, but I find looking at 21205 a darn'd site
> >> easier than staring blankly at 101001011010101!!

 lots of stuff

> >
> > Note you can cast integer to bitstring, but there may be some odd
> > behaviour for sign bits and such.  Which is again why I'd use the
> > right type for the job, bit string.  But it's your project.
> >
>
> Quoting...
>
> > Because you are manipulating bits and not integers?  I guess there are
> > 10 kinds of people, those who like think in binary and those who
> > don't.
>
> Er, no. 21205 is not an integer. It's an encoded bit of magic.
>

In that case your database design is fundamentally broken.  A database
should have content fields that map to the needs of the application.
As you describe your application requirements, that is a bit string
and not an integer.  Use bit strings and your application logic is
transparent, obvious and easy to maintain.  Use integers and you have
to resort to "magic".  As you say, it's your choice, but you came here
looking for advice and the advice you were given is very good

--
Peter Hunsberger

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] optimizing daily data storage in Pg

2010-07-22 Thread P Kishor

I have been struggling with this for a while now, have even gone down
a few paths but struck out, so I turn now to the community for ideas.
First, the problem: Store six daily variables for ~ 25 years for cells
in a grid.

 * Number of vars = 6
 * Number of cells ~ 13 million
 * Number of days ~ 9125 (25 * 365)

Optimize the store for two different kinds of queries:

Query one: Retrieve the value of a single var for all or a portion of
the cells for a single day. This is analogous to an image where every
pixel is the value of a single var.

Query two: Retrieve values for all the days or a duration of days for
a single var for a single cell. This is like grabbing a column out of
a table in which each row holds all the vars for a single day.

So, I set about designing the db. The "grid" is in a table with 13 million rows

CREATE TABLE cells (
cell_id  INTEGER,
other_data ..
)
WITH (
OIDS=FALSE
)


A single table *where every row is one day's values for one cell* looks like so

CREATE TABLE d (
yr  SMALLINT,
ydaySMALLINT,
a   SMALLINT,
b   SMALLINT,
d   SMALLINT,
e   SMALLINT,
f   SMALLINT,
g   SMALLINT,
cell_id INTEGER
)
WITH (
OIDS=FALSE
)

The data would look like so

yr  ydaya   b   c   d   e   f   g   cell_id

19801   x   x   x   x   x   x   x   1
..
1980365 x   x   x   x   x   x   x   1
...
19811   x   x   x   x   x   x   x   1
..
1981365 x   x   x   x   x   x   x   1
  ...
  ...
20051   x   x   x   x   x   x   x   1
..
2005365 x   x   x   x   x   x   x   1
..
19801   x   x   x   x   x   x   x   2
..
1980365 x   x   x   x   x   x   x   2
  ...

I could now (theoretically) conduct my queries like so:

Query 1a: Retrieve the value of a single var for all the cells for a
single day. This is analogous to an image where every pixel is the
value of a single var.

SELECT  FROM d WHERE yr = ? AND yday = ?;

I assuming I would need an index on yr and yday, or perhaps even a
compound index on (yr, yday).

Query 1b: Retrieve the value of a single var for a portion of the
cells for a single day. This is analogous to an image where every
pixel is the value of a single var.

SELECT  FROM d WHERE yr = ? AND yday = ? AND cell_id IN (?,?,?...);

I assuming I would need an index on yr and yday, or perhaps even a
compound index on (yr, yday) AND an index on cell_id.

Query 2: Retrieve values for all the days or a duration of days for a
single var for a single cell. This is like grabbing a column out of a
table in which each row holds all the vars for a single day.

SELECT  FROM d WHERE cell_id = ?;
SELECT  FROM d WHERE cell_id IN (?,?,?...);

Once again, an index on cell_id would assist in the above.

The problem: The above table would have 13 M * 9125 rows ~ 118 billion
rows. Huge indexes, slow queries, etc. In fact, major issues loading
the data in the first place. Since I am loading data in batches, I
drop the indexes (takes time), COPY data into the table (takes time),
build the indexes (takes time), experiment with improving the
performance (takes time), fail, rinse, lather, repeat. I actually
tried the above with a subset of data (around 100 M rows) and
experienced all of the above. I don't remember the query times, but
they were awful.

So, I partitioned the table into years like so

CREATE TABLE d_ (
CHECK ( yr =  )
) INHERITS (d)

Hmmm... still no satisfaction. I ended up with 1 master table + 25
inherited tables. Each of the year tables now had ~ 4.75 billion rows
(13 M * 365), and the queries were still very slow.

So, I partitioned it all by years and days like so

CREATE TABLE d__ (
CHECK ( yr =  AND yday =  )
) INHERITS (d)

Each table now has 13 million rows, and is reasonably fast (although
still not satisfactorily fast), but now I have 9K tables. That has its
own problems. I can't query the master table anymore as Pg tries to
lock all the tables and runs out of memory. Additionally, I can't
anymore conduct query two above. I could do something like

SELECT a FROM d_1980_1 WHERE cell_id = 1
UNION
SELECT a FROM d_1980_2 WHERE cell_id = 1
UNION
SELECT a FROM d_1980_3 WHERE cell_id = 1
UNION
SELECT a FROM d_1980_4 WHERE cell_id = 1
UNION
...

But the above is hardly optimal.

Any suggestions, ideas, brainstorms would be appreciated. Perhaps Pg,
or even a RDBMS, is not the right tool for this problem, in which
case, suggestion for alternatives would be welcome as well.

Right now I am testing this on a dual Xeon dual core 3 GHz Xserve with
12 GB RAM. The PGDATA directory is located on an attached RAID that is
configured as RAID5. Reasonable time for a query w

[GENERAL] How to improve performance in reporting database?

2010-07-22 Thread Matthew Wilson

I have a daily job that pushes data from the production database into
the reporting database, which right now, is an exact copy.

I have a webapp that builds lots of reports for users.  Most of these
reports involve elaborate joins of lookup tables and lots of summations,
and they take too long to run, even after using everything I know to
tune the queries.

Since I know this is a read-only data, it seems like I should be able to
speed everything up dramatically if I run the queries offline and then
save the results into new tables.  Then the web app could just grab the
cached results out of these new tables and then spit them out quickly.

I've heard people talking about using "materialized views" for this, but
that was with Oracle.

What's the postgresql way here?

More generally, any advice on running reporting databases well is
welcome.


Matt


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Getting statistics for each sql statement?

2010-07-22 Thread Ben Chobot

On Jul 22, 2010, at 4:50 AM, Stefan-Michael Guenther wrote:

> Hello,
> 
> is it possible to get statistics on the usage of different sql statements, 
> e.g. how many INSERT or UPDATE statements per day?
> 
> log_statement_stats doesn't seem to be the right parameter or I haven't found 
> the output statistic for this command.
> 
> Any ideas or suggestions?

The usual approach is to make a log file that holds all your queries and then 
let something like pgFouine grovel through it.
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] How to distribute quantity if same product is in multiple rows

2010-07-22 Thread Andrus


Tim,

Thank you.


It can be done in SQL: "SUM(kogus) OVER (PARTITION BY toode
ORDER BY ID) - kogus" (*1) will give you the running sum of
the product up to that row. You can then subtract that value
from the delivered quantity to calculate the delivered quan-
tity for the current row.



 But doing so automatically is probably bad. For example,
if a user has a purchase order with one position of two
pieces and one position of four, it is very likely that when
a shipment of four pieces arrives, the latter position shall
be marked as delivered. So I would leave the decision to the
user.


If four pieces arrived, first position of 2 pieces should marked as 
delivered.
Second position of 4 pieces shoudl be marked as partialli delivered by 
setting undelivered quantity

of this row to 2

How to use your suggestion for this ?
How to implement this is PostgreSql 8.1,8.2, 8.3 ?

Andrus.


(*1)   In PostgreSQL 9.0, you might be able to use "ROWS

  BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING" instead
  (untested).

PS. If possible please use cc: with my email address in reply.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Clarification of the "simple" dictionary

2010-07-22 Thread John Gage

The easiest way to look at this is to give the simple dictionary a  
document with to_tsvector() and see if stopwords pop out.


In my experience they do.  In my experience, the simple dictionary  
just breaks the document down into the space etc. separated words in  
the document.  It doesn't analyze further.


John


On Jul 22, 2010, at 4:15 PM, Andreas Joseph Krogh wrote:

Hi. It's not clear to me if the "simple" dictionary uses stopwords  
or not, does it?
Can someone please post a complete description of what the "simple"  
dict. does?


--
Andreas Joseph Krogh
Senior Software Developer / CTO
 
+-+
OfficeNet AS| The most difficult thing in the world is  
to |
Rosenholmveien 25   | know how to do a thing and to  
watch |
1414 Trollåsen  | somebody else doing it wrong,  
without   |
NORWAY  |  
comment.|

   | |
Tlf:+47 24 15 38 90  
| |
Fax:+47 24 15 38 91  
| |
Mobile: +47 909  56 963  
| |
 
+-+



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Are identical subqueries in unioned statements nonrepeatable?

2010-07-22 Thread Derrick Rice

Hi all.  I had no luck finding a previous message or documentation related
to the effective transaction isolation of subqueries, specifically identical
subqueries in union statements.  Consider the following statement executed
without a transaction.

select true as from_one, table_one.*
from table_one
where table_one.id not in (select id from reference_table)
   union all
select false as from_one, table_two.*
from table_two
where table_two.id not in (select id from reference_table)

Is it possible for the contents of reference_table to differ from the first
select to the select on the right hand side of the union?  (e.g. because
some other transaction committed additional rows).  Or even from row
comparison to row comparison in the same select (I highly doubt that).

If it is not possible, why?  Is it because a single query always executes
with serializable (effective) isolation?  Is it because postgresql
recognizes that the query is repeated and uses a single result set in both
sides of the union?

Is this behavior that is part of postgresql intentionally, or a side effect
that I should not rely on?

Assumption:  I'm assuming that it's faster to union all after filtering by
the where clause than to union all then filter by a single where clause.
The subquery for exclude_ids is very fast and the results of each of the
selects is a small fraction of the entire tables.  Doing a union first would
be expensive in comparison to doing a union of the

Thanks,

Derrick

Re: [GENERAL] Clarification of the "simple" dictionary


On 07/22/2010 06:27 PM, John Gage wrote:
The easiest way to look at this is to give the simple dictionary a 
document with to_tsvector() and see if stopwords pop out.


In my experience they do.  In my experience, the simple dictionary 
just breaks the document down into the space etc. separated words in 
the document.  It doesn't analyze further.


That's my experience too, I just want to make sure it doesn't actually 
have any stopwords which I've missed. Trying many phrases and checking 
for stopwords isn't really proving anything.


Can anybody confirm the "simple" dict. only lowercases the words and 
"uniques" them?


--
Andreas Joseph Krogh
Senior Software Developer / CTO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Clarification of the "simple" dictionary

2010-07-22 Thread Oleg Bartunov

Don't guess, but read docs
http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY

12.6.2. Simple Dictionary

The simple dictionary template operates by converting the input token to lower
case and checking it against a file of stop words. If it is found in the file
then an empty array is returned, causing the token to be discarded. If not, the
lower-cased form of the word is returned as the normalized lexeme.
Alternatively, the dictionary can be configured to report non-stop-words as
unrecognized, allowing them to be passed on to the next dictionary in the list.

d=# \dFd+ simple
List of text search dictionaries
Schema | Name | Template | Init options |Description
++---+--+---

pg_catalog | simple | pg_catalog.simple | | simple dictionary:
just lower case and check for stopword

By default it has no Init options, so it doesn't check for stopwords.

On Thu, 22 Jul 2010, Andreas Joseph Krogh wrote:

On 07/22/2010 06:27 PM, John Gage wrote:
The easiest way to look at this is to give the simple dictionary a document
with to_tsvector() and see if stopwords pop out.

In my experience they do. In my experience, the simple dictionary just
breaks the document down into the space etc. separated words in the
document. It doesn't analyze further.

That's my experience too, I just want to make sure it doesn't actually have
any stopwords which I've missed. Trying many phrases and checking for
stopwords isn't really proving anything.

Can anybody confirm the "simple" dict. only lowercases the words and
"uniques" them?

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Clarification of the "simple" dictionary


On 07/22/2010 07:44 PM, Oleg Bartunov wrote:

Don't guess, but read docs
http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY 



12.6.2. Simple Dictionary

The simple dictionary template operates by converting the input token 
to lower case and checking it against a file of stop words. If it is 
found in the file then an empty array is returned, causing the token 
to be discarded. If not, the lower-cased form of the word is returned 
as the normalized lexeme. Alternatively, the dictionary can be 
configured to report non-stop-words as unrecognized, allowing them to 
be passed on to the next dictionary in the list.


d=# \dFd+ simple
  List of text search 
dictionaries
   Schema   |  Name  | Template  | Init options 
|Description 
++---+--+--- 

 pg_catalog | simple | pg_catalog.simple |  | simple 
dictionary: just lower case and check for stopword


By default it has no Init options, so it doesn't check for stopwords.


Guess what - I *have* read the docs which sais "...and checking it 
against a file of stop words". What was unclear to me was whether or not 
it was configured with a stopwords-file or not as default, which is not 
the case I understand from your reply. Very good, fits my needs like a 
glove:-) It might be worth considering updating the docs to make this 
clearer?


So - can we rely on "simple" to remain this way forever (no Init 
options) or is it better to make a copy of it with the same properties 
as today?


It seems "simple" + the unaccent dict. available in 9.0 saves my day, 
thanks Mr. Bartunov.


--
Andreas Joseph Krogh
Senior Software Developer / CTO
+-+
OfficeNet AS| The most difficult thing in the world is to |
Rosenholmveien 25   | know how to do a thing and to watch |
1414 Trollåsen  | somebody else doing it wrong, without   |
NORWAY  | comment.|
| |
Tlf:+47 24 15 38 90 | |
Fax:+47 24 15 38 91 | |
Mobile: +47 909  56 963 | |
+-+


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] varchar[] or text[]

2010-07-22 Thread Armand Turpel


 Hi,
I know this issue was controversed discussed. Some one see no really 
benefits of using varchar against text var. But i'm asking me what if i 
use it as array values. Is there any difference between varchar[126], 
varchar[1] and text[] else than the number of chars i can store in? 
Performance, memory usage, ?


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] varchar[] or text[]

2010-07-22 Thread Peter C. Lai

This was discussed yesterday and previously. Please read the archives.

There is no positive performance reason to use varchar instead of text.

On 2010-07-22 05:38:14PM +0200, Armand Turpel wrote:
>   Hi,
> I know this issue was controversed discussed. Some one see no really 
> benefits of using varchar against text var. But i'm asking me what if i 
> use it as array values. Is there any difference between varchar[126], 
> varchar[1] and text[] else than the number of chars i can store in? 
> Performance, memory usage, ?
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

-- 
===
Peter C. Lai | Bard College at Simon's Rock
Systems Administrator| 84 Alford Rd.
Information Technology Svcs. | Gt. Barrington, MA 01230 USA
peter AT simons-rock.edu | (413) 528-7428
===


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] locating cities within a radius of another

2010-07-22 Thread Oliver Kohll - Mailing Lists

On 22 Jul 2010, at 12:57, Geoffrey wrote:

>> For completeness, the earthdistance module also provides the distance 
>> between two lat/longs, the point<@>point syntax is simple to use:
>> http://www.postgresql.org/docs/8.3/static/earthdistance.html
> 
> Disgregard my last post, Surely as soon as I hit send, the light went on...  
> I'm looking at deriving my points for point <@> point from ll_to_earth().

I constructed mine using point(longitude, latitude), where long and lat are 
double precision, which returns a datatype of type point. ll_to_earth() looks 
like it returns a datatype of type earth, so not sure if it will work. Maybe 
things have changed in a recent release, please let me know if so.

So an example would be
select point(-2.2171,56.8952)<@>point(-1.2833,51.6667) as miles; 
  miles   
--
 363.202864676916
(1 row)

Regards
Oliver Kohll

oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software

Re: [GENERAL] How to improve performance in reporting database?

2010-07-22 Thread Greg Smith


Matthew Wilson wrote:

I've heard people talking about using "materialized views" for this, but
that was with Oracle.
  


You can build those manually with PostgreSQL if you really want them:  
http://wiki.postgresql.org/wiki/Materialized_Views


The fundamental architecture is sound for a lot of problems in this 
area, you just have to figure out how to build them efficiently.  In 
your case, you might just consider if there's a way way to update the MV 
in batches, rather than rely on triggers to keep the data up to date, 
after each data import.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] How to improve performance in reporting database?

On Thu, Jul 22, 2010 at 8:45 AM, Matthew Wilson  wrote:
> I have a daily job that pushes data from the production database into
> the reporting database, which right now, is an exact copy.
>
> I have a webapp that builds lots of reports for users.  Most of these
> reports involve elaborate joins of lookup tables and lots of summations,
> and they take too long to run, even after using everything I know to
> tune the queries.
>
> Since I know this is a read-only data, it seems like I should be able to
> speed everything up dramatically if I run the queries offline and then
> save the results into new tables.  Then the web app could just grab the
> cached results out of these new tables and then spit them out quickly.
>
> I've heard people talking about using "materialized views" for this, but
> that was with Oracle.
>
> What's the postgresql way here?

http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views

They're kinda roll your own, but they're not that hard to do.

> More generally, any advice on running reporting databases well is
> welcome.

Throw more drives and RAM at the problem, and use materialized views.
Also you're often better off with fewer faster cpus than more slower
ones for reporting servers (the opposite of OLTP where number of cores
is far more important.)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Are identical subqueries in unioned statements nonrepeatable?

2010-07-22 Thread Alvaro Herrera

Excerpts from Derrick Rice's message of jue jul 22 12:27:31 -0400 2010:

> Is it possible for the contents of reference_table to differ from the first
> select to the select on the right hand side of the union?  (e.g. because
> some other transaction committed additional rows).

No.

> If it is not possible, why?  Is it because a single query always executes
> with serializable (effective) isolation?

Yes.  (Actually: it's because a query is always executed with a single
snapshot).

> Is it because postgresql
> recognizes that the query is repeated and uses a single result set in both
> sides of the union?

No.

> Is this behavior that is part of postgresql intentionally, or a side effect
> that I should not rely on?

It is intentional and will not be changed.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Insert and Retrieve unsigned char sequences using C

2010-07-22 Thread Merlin Moncure

2010/7/22 Vinícius Soares :
> Hey,
>
> thanks for your response.
> I did it:
>
>     S8 sql[1500] = "insert into t values ( E'";
>         U8 *msg;
>     msg = PQescapeByteaConn(conn, pending_cmd->cmd.value,
> sizeof(msg_cmd_t), &to_length);
>     for (i=0; i < sizeof(msg_cmd_t); i++){
>     S8 str[20] = "";
>     sprintf(str, "%c", *(msg+i) );
>     strcat(sql, str);
>     }
>         strcat(sql, "' );");
>         PQexec(conn, sql);
>
> But it is very strange because sometimes it works but others times it does
> not work.
> is it right?

That code doesn't look right: you need to make sure your 'to' is big
enough: at has to be at least (2*N)+1 where N is the input size.  it
returns a size_t, not a char*, and you should be able to just sprintf
the 'to' into your query, not copy the chars in a loop.  see the
following fragment:

#define ARGSZ 64
char my_bytea[ARGSZ];
char escaped_bytea[(2*ARGSZ)+1];
int error;
size_t nbytes;

nbytes = PQescapeStringConn (conn, escaped_bytea, my_bytea,
sizeof(my_bytea), &error);

if(error != 0)
  // handle error

sprintf(querybuf, "insert into foo(bytea_col) values (E'%s')", escaped_bytea);

like I said earlier, this is just about the absolute worst way to
transfer a bytea to the server.  I had to look up the docs for
PQescapeStringConn -- I've never once used it my entire life (or it's
even more evil cousin, PQescapeString).

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] How to distribute quantity if same product is in multiple rows

2010-07-22 Thread Tim Landscheidt

(anonymous) wrote:

>>It can be done in SQL: "SUM(kogus) OVER (PARTITION BY toode
>>ORDER BY ID) - kogus" (*1) will give you the running sum of
>>the product up to that row. You can then subtract that value
>>from the delivered quantity to calculate the delivered quan-
>>tity for the current row.

>>  But doing so automatically is probably bad. For example,
>>if a user has a purchase order with one position of two
>>pieces and one position of four, it is very likely that when
>>a shipment of four pieces arrives, the latter position shall
>>be marked as delivered. So I would leave the decision to the
>>user.

> If four pieces arrived, first position of 2 pieces should
> marked as delivered.
> Second position of 4 pieces shoudl be marked as partialli
> delivered by setting undelivered quantity
> of this row to 2

> How to use your suggestion for this ?

Que? You take the query above, join it in the "UPDATE" and
set the delivered quantity to the minimum of the ordered
quantity and "taitmkogus - sumkogus".

> How to implement this is PostgreSql 8.1,8.2, 8.3 ?
> [...]

An example for calculating running sums without window
functions can be found at
http://archives.postgresql.org/pgsql-sql/2001-07/msg00152.php>.
I would rather use a PL/pgSQL function in this case, though.

Tim


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] locating cities within a radius of another


Oliver Kohll - Mailing Lists wrote:


On 22 Jul 2010, at 12:57, Geoffrey wrote:

For completeness, the earthdistance module also provides the distance 
between two lat/longs, the point<@>point syntax is simple to use:

http://www.postgresql.org/docs/8.3/static/earthdistance.html


Disgregard my last post, Surely as soon as I hit send, the light went 
on...  I'm looking at deriving my points for point <@> point from 
ll_to_earth().


I constructed mine using point(longitude, latitude), where long and lat 
are double precision, which returns a datatype of type point. 
ll_to_earth() looks like it returns a datatype of type earth, so not 
sure if it will work. Maybe things have changed in a recent release, 
please let me know if so.


So an example would be
select point(-2.2171,56.8952)<@>point(-1.2833,51.6667) as miles; 
  miles   
--

 363.202864676916
(1 row)


Perfect, that appears to work for me as well, thanks.



Regards
Oliver Kohll

oli...@agilebase.co.uk  / +44(0)7814 
828608 / skype:okohll

www.agilebase.co.uk  - software






--
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Clarification of the "simple" dictionary

2010-07-22 Thread Oleg Bartunov

Andreas,

I'd create myself copy of dictionary to be independent on system changes.

Oleg
On Thu, 22 Jul 2010, Andreas Joseph Krogh wrote:

On 07/22/2010 07:44 PM, Oleg Bartunov wrote:

Don't guess, but read docs
http://www.postgresql.org/docs/8.4/interactive/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY

12.6.2. Simple Dictionary

The simple dictionary template operates by converting the input token to
lower case and checking it against a file of stop words. If it is found in
the file then an empty array is returned, causing the token to be
discarded. If not, the lower-cased form of the word is returned as the
normalized lexeme. Alternatively, the dictionary can be configured to
report non-stop-words as unrecognized, allowing them to be passed on to the
next dictionary in the list.

By default it has no Init options, so it doesn't check for stopwords.

Guess what - I *have* read the docs which sais "...and checking it against a
file of stop words". What was unclear to me was whether or not it was
configured with a stopwords-file or not as default, which is not the case I
understand from your reply. Very good, fits my needs like a glove:-) It might
be worth considering updating the docs to make this clearer?

So - can we rely on "simple" to remain this way forever (no Init options) or
is it better to make a copy of it with the same properties as today?

It seems "simple" + the unaccent dict. available in 9.0 saves my day, thanks
Mr. Bartunov.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Difference between EXPLAIN ANALYZE SELECT ... total runtime and SELECT ... runtime

2010-07-22 Thread Piotr Gasidło

Hello,

I have strange problem.

I test/optimize my queries with EXPLAIN ANALYZE. I get for example:

Total runtime: 40.794 ms

But when I run query without EXPLAIN ANALYZE i get, for example:

Time: 539.252 ms

Query returns 33 rows. Why?

I do checks with psql connected using socket to postgresql server. No SSL.

Using PostgreSQL 8.4.4.

-- 
Piotr Gasidło

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Difference between EXPLAIN ANALYZE SELECT ... total runtime and SELECT ... runtime

2010-07-22 Thread Szymon Guz

W dniu 22 lipca 2010 21:24 użytkownik Piotr Gasidło
napisał:

> Hello,
>
> I have strange problem.
>
> I test/optimize my queries with EXPLAIN ANALYZE. I get for example:
>
> Total runtime: 40.794 ms
>
> But when I run query without EXPLAIN ANALYZE i get, for example:
>
> Time: 539.252 ms
>
> Query returns 33 rows. Why?
>
> I do checks with psql connected using socket to postgresql server. No SSL.
>
> Using PostgreSQL 8.4.4.
>
>
Hi,
maybe the query waits on a lock or maybe the returned rows are very big.

regards
Szymon Guz

Re: [GENERAL] Clarification of the "simple" dictionary

2010-07-22 Thread John Gage




By default it has no Init options, so it doesn't check for stopwords.


In the first place, this functionality is a rip-snorting home run on  
Postgres.  I congratulate Oleg who I believe is one of the authors.


In the second, I too had not read (carefully) the documentation and am  
very happy to find that I can eliminate stop words with 'simple'.   
That will be a tremendous convenience going forward.


It turns out that using 'english' and getting stemmed lexemes is  
extremely convenient too, but this functionality in 'simple' is  
excellent.


Thanks,

John



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Difference between EXPLAIN ANALYZE SELECT ... total runtime and SELECT ... runtime

2010-07-22 Thread Piotr Gasidło

W dniu 22 lipca 2010 21:34 użytkownik Szymon Guz  napisał:
> maybe the query waits on a lock or maybe the returned rows are very big.

So shouldn't EXPLAIN ANALYZE be also affected by waiting for lock?

The row has width = 313, so it's not big.

I've reduced it to witdh = 12 (only one column, int4 type), and get
EXPLAIN ANALYZE:

Total runtime: 14.788 ms

And only SELECT returns:

Time: 456,528 ms

Or maybe "Total runtime" it's not what I thought it is, and I should
look at psql \timing result, which form EXPLAIN ANALYZE is nearly the
same like for SELECT:

Time: 402,675 ms

-- 
Piotr Gasidło

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Need some help on PG database transfer

2010-07-22 Thread Steeles

Hi all,

I am doing some tests on the file level copying for one of database in
windows platform. Here is what I want to achieve.

copy files within PG tablespace folder where PG database resides to target
machine. I want to attache the copied data to target PG database.


What I have done is

1. Database: ABCD, OID: 12345, tablespace: e:\pg_data\, obviously,
e:\pg_data has a subfolder named 12345.
2. create database in target, named , and it will create its own OID,
for example, 16333, the folder 16333 will reside under BASE folder.
3. then I stop PG service, delete all files under ..\16333\*.*,
4. copy all files from e:\pg_data\12345\*.* to the folder in target server,
..\16333\.
5. start PG service in target machine. it looks like it pickup most of
tables that are from its source database, ABCD.

But, It is missing tables and functions, compared to sources.

Is it doable for replicating data like that?

PG tablespace, does it contain all the data/tables in its table space
folder?

Please comment.

Thanks.

Re: [GENERAL] Need some help on PG database transfer

On Thu, Jul 22, 2010 at 2:08 PM, Steeles  wrote:
> Hi all,
>
> I am doing some tests on the file level copying for one of database in
> windows platform. Here is what I want to achieve.
>
> copy files within PG tablespace folder where PG database resides to target
> machine. I want to attache the copied data to target PG database.

That won't work.  You have to pg_dump out the tables / tablespace and
then psql or pg_restore it to the other db.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Bitmask trickiness

2010-07-22 Thread Peter Hunsberger

On Thu, Jul 22, 2010 at 4:09 PM, Howard Rogers  wrote:
>
> On Fri, Jul 23, 2010 at 12:35 AM, Peter Hunsberger
>  wrote:
> > On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers  wrote:
> >>
> there's a room-full of users who can look
> at code '4097' and know precisely what it means and would be mortified
> if I suddenly started displaying exactly the same meanings in what, to
> them, would look like utter gibberish.
>

In that case, you shouldn't be describing the column as "some encoded
bit of magic" here.  It clearly has some some semantic meaning which
gives you a reason to want to keep it that way.  Though why your users
are dealing with the raw values as stored in the database may be
another issue to deal with: Personally, I'd say store it in the way
that is easiest for your application logic to deal with, display it in
the form that is easiest for your users to deal with.  The are often
two completely different things...

--
Peter Hunsberger

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Difference between EXPLAIN ANALYZE SELECT ... total runtime and SELECT ... runtime

2010-07-22 Thread Piotr Gasidło

W dniu 22 lipca 2010 21:46 użytkownik Piotr Gasidło
 napisał:
> W dniu 22 lipca 2010 21:34 użytkownik Szymon Guz  napisał:
> (...)

Something new. The query is run against table which has been
partitioned into a lot of small tables.
When I combine data and put it into one table - the execution of
EXPLAIN ANALYZE compares with real SELECT timeing.

On paritioned:

EXPLAIN ANALYZE SELECT ...
Total runtime: 14.790 ms
Time: 291,637 ms

On one table with data from all partitions and same indexes:

EXPLAIN ANALYZE SELECT ...

Total runtime: 16.418 ms
Time: 17,371 ms

Can someone give me clue why EXPLAIN ANALYZE don't work correctly with
partitions?

-- 
Piotr Gasidło

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Question about SCO openserver and postgres...

2010-07-22 Thread Edmundo Robles L.

Hi!
  I have a problem with the  max  postgres connections  on SCO 
Openserver 5.0.7, so ...my boss decided to buy  the SCO Openserver 6.0
but this   version comes in 2  editions:  Starter and Enterprise.

If SCO 5.0.7 only allows 95 ( -3  used by superuser)  connections to 
postgres...

Do you know  how many connections to postgres  can i have with 
OpenServer   in Starter Edition or Enterprise edition?


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] psql problem

2010-07-22 Thread Gary Fu


Hi,

System information:
- psql 8.4.4 on a client with CentOS 5.5 (64 bits)
- postgres 8.4.4 on the server with CentOS 5.5 (64 bits)
- the client is connected with vpn

I have a script to create a table with some comments in front.  When I 
use the command 'psql -f script.sql' to load it, it hangs.  However, if 
I remove the comments, OR remove some of the columns from the table, it 
works okay.   It looks like to me, the psql will hang with large size of 
the script file.  I tried 'psql < script.sql' and 'cat script.sql | 
psql' with the same result.


However, I tried it on another client host (CentOS 5.5 32 bits), I don't 
see this problem.


Any idea and suggestion ?

Thanks,
Gary

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Bitmask trickiness

2010-07-22 Thread Howard Rogers

On Fri, Jul 23, 2010 at 12:35 AM, Peter Hunsberger
 wrote:
> On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers  wrote:
>>
>> On Thu, Jul 22, 2010 at 1:02 PM, Scott Marlowe  
>> wrote:
>>
>>
>> >>
>> >> Why on Earth would I want to store this sort of stuff in a bit string?!
>> >
>> > Because you are manipulating bits and not integers?  I guess there are
>> > 10 kinds of people, those who like think in binary and those who
>> > don't.
>> >
>> >> I don't know about you, but I find looking at 21205 a darn'd site
>> >> easier than staring blankly at 101001011010101!!
>
>  lots of stuff
>
>> >
>> > Note you can cast integer to bitstring, but there may be some odd
>> > behaviour for sign bits and such.  Which is again why I'd use the
>> > right type for the job, bit string.  But it's your project.
>> >
>>
>> Quoting...
>>
>> > Because you are manipulating bits and not integers?  I guess there are
>> > 10 kinds of people, those who like think in binary and those who
>> > don't.
>>
>> Er, no. 21205 is not an integer. It's an encoded bit of magic.
>>
>
> In that case your database design is fundamentally broken.  A database
> should have content fields that map to the needs of the application.
> As you describe your application requirements, that is a bit string
> and not an integer.  Use bit strings and your application logic is
> transparent, obvious and easy to maintain.  Use integers and you have
> to resort to "magic".  As you say, it's your choice, but you came here
> looking for advice and the advice you were given is very good
>
> --
> Peter Hunsberger

Hi Peter:

It wasn't, as the original poster pointed out, 'advice' that was given
so much as personal preference. Had someone said, 'ah, but you see
storing your 15 meanings in decimal uses up 5 bytes, whereas a
bitstring only requires 15 bits, and over 10,000,000 records, the
saving of 3 bytes per record adds up...', then that would be technical
advice I could listen to, assess and make a call on.

But simply saying "your design is broken... wo!" might well scare
the children, but doesn't really do anything for me, because I know
for a certainty that it's not broken at all.

It comes down to this: I can do Boyce-Codd normal form in my sleep
(...and falling asleep happens quite frequent when doing it, strangely
enough), and have been doing so since 1987. I'm certainly not perfect,
but I reckon I can tell from a mile away when one of my designs is
"broken", as you put it -and this one isn't. I haven't even begun to
describe a scintilla of a percentage point of the design decisions
this thing has to deal with, nor the fact that it's been running quite
happily in this manner for a good couple of years... so you'll just
have to take it from me that there's a room-full of users who can look
at code '4097' and know precisely what it means and would be mortified
if I suddenly started displaying exactly the same meanings in what, to
them, would look like utter gibberish.

Unless you, or someone else, can come up with some hard, *technical*
facts as to why working with bitstring encodings of meaning is so much
better than working in decimal, we're sticking with the decimal
representation. I'll buy "you're forever doing implicit casts which
are poor performers" or "implicit casts might break in a future
release" or "it's costing you three bytes per record" ...or anything
else in that vein. But matters of transparency and ease of maintenance
are entirely subjective things (about which I sought no advice at all,
incidentally), and what works for you on those scores doesn't work for
me.

Regards
HJR

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] optimizing daily data storage in Pg

2010-07-22 Thread Andy Colson

On 7/22/2010 9:41 AM, P Kishor wrote:

I have been struggling with this for a while now, have even gone down
a few paths but struck out, so I turn now to the community for ideas.
First, the problem: Store six daily variables for ~ 25 years for cells
in a grid.

  * Number of vars = 6
  * Number of cells ~ 13 million
  * Number of days ~ 9125 (25 * 365)

Optimize the store for two different kinds of queries:

Query one: Retrieve the value of a single var for all or a portion of
the cells for a single day. This is analogous to an image where every
pixel is the value of a single var.

>  SELECT  FROM d WHERE yr = ? AND yday = ?;
>  SELECT  FROM d WHERE yr = ? AND yday = ? AND cell_id IN 
(?,?,?...);

Query two: Retrieve values for all the days or a duration of days for
a single var for a single cell. This is like grabbing a column out of
a table in which each row holds all the vars for a single day.

>  SELECT  FROM d WHERE cell_id = ?;
>  SELECT  FROM d WHERE cell_id IN (?,?,?...);

First, I must admit to not reading your entire email.

Second, Query 1 should be fast, regardless of how you layout the tables.

Third, Query 2 will return 13M rows?  I dont think it matters how you 
layout the tables, returning 13M rows is always going to be slow.

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] psql problem

2010-07-22 Thread shakahsha...@gmail.com

On Jul 22, 5:03 pm, Gary Fu  wrote:
> Hi,
>
> System information:
> - psql 8.4.4 on a client with CentOS 5.5 (64 bits)
> - postgres 8.4.4 on the server with CentOS 5.5 (64 bits)
> - the client is connected with vpn
>
> I have a script to create a table with some comments in front.  When I
> use the command 'psql -f script.sql' to load it, it hangs.  However, if
> I remove the comments, OR remove some of the columns from the table, it
> works okay.   It looks like to me, the psql will hang with large size of
> the script file.  I tried 'psql < script.sql' and 'cat script.sql |
> psql' with the same result.
>
> However, I tried it on another client host (CentOS 5.5 32 bits), I don't
> see this problem.
>
> Any idea and suggestion ?
>
> Thanks,
> Gary

Are you sure it is hanging? Maybe piping into psql via "pv" (http://
www.ivarch.com/programs/pv.shtml) could give some insight as to where
it might be hanging, or if it is just going slowly, or whatever.

E.g., if your script is 240500 bytes long, the following will give a
simple progress meter, elapsed time, and ETA display:
  cat yourscript.file | pv -t -r -e -b -s 240500 | psql

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Bitmask trickiness

On Thu, Jul 22, 2010 at 3:09 PM, Howard Rogers  wrote:
> On Fri, Jul 23, 2010 at 12:35 AM, Peter Hunsberger
>  wrote:
>> On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers  wrote:
>>>
>>> On Thu, Jul 22, 2010 at 1:02 PM, Scott Marlowe  
>>> wrote:
>>>
>>>
>>> >>
>>> >> Why on Earth would I want to store this sort of stuff in a bit string?!
>>> >
>>> > Because you are manipulating bits and not integers?  I guess there are
>>> > 10 kinds of people, those who like think in binary and those who
>>> > don't.
>>> >
>>> >> I don't know about you, but I find looking at 21205 a darn'd site
>>> >> easier than staring blankly at 101001011010101!!
>>
>>  lots of stuff
>>
>>> >
>>> > Note you can cast integer to bitstring, but there may be some odd
>>> > behaviour for sign bits and such.  Which is again why I'd use the
>>> > right type for the job, bit string.  But it's your project.
>>> >
>>>
>>> Quoting...
>>>
>>> > Because you are manipulating bits and not integers?  I guess there are
>>> > 10 kinds of people, those who like think in binary and those who
>>> > don't.
>>>
>>> Er, no. 21205 is not an integer. It's an encoded bit of magic.
>>>
>>
>> In that case your database design is fundamentally broken.  A database
>> should have content fields that map to the needs of the application.
>> As you describe your application requirements, that is a bit string
>> and not an integer.  Use bit strings and your application logic is
>> transparent, obvious and easy to maintain.  Use integers and you have
>> to resort to "magic".  As you say, it's your choice, but you came here
>> looking for advice and the advice you were given is very good
>>
>> --
>> Peter Hunsberger
>
> Hi Peter:
>
> It wasn't, as the original poster pointed out, 'advice' that was given
> so much as personal preference. Had someone said, 'ah, but you see
> storing your 15 meanings in decimal uses up 5 bytes, whereas a
> bitstring only requires 15 bits, and over 10,000,000 records, the
> saving of 3 bytes per record adds up...', then that would be technical
> advice I could listen to, assess and make a call on.

You do realize the first page I linked to told you that, right?  It's
not a particularly big page.  I had made the erroneous assumption
you'd read the link I posted.

> But simply saying "your design is broken... wo!" might well scare
> the children, but doesn't really do anything for me, because I know
> for a certainty that it's not broken at all.

I asked if there was a reason you were avoiding bit strings.  Hardly a
"your design is broken" point.  You've now said why you are not using
the type that was designed to handle bit strings for bit strings.

I personally would store them as bit strings and change representation
for users.  There are some issues that come up if your bit strings are
long enough to get close to the last bit in an integer (also mentioned
on the links I posted that didn't get read).  But other than that it
should work fine.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Question about SCO openserver and postgres...

On Thu, Jul 22, 2010 at 3:24 PM, Edmundo Robles L.
 wrote:
> Hi!
>  I have a problem with the  max  postgres connections  on SCO
> Openserver 5.0.7, so ...my boss decided to buy  the SCO Openserver 6.0
> but this   version comes in 2  editions:  Starter and Enterprise.
>
> If SCO 5.0.7 only allows 95 ( -3  used by superuser)  connections to
> postgres...
>
> Do you know  how many connections to postgres  can i have with
> OpenServer   in Starter Edition or Enterprise edition?

Are you sure this isn't just a limit in max_connections in postgresql.conf?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[GENERAL] Changing pg_attribute.attislocal

2010-07-22 Thread Noah Misch

Over time, I mistakenly did something like this:

CREATE TABLE a ();
CREATE TABLE b () INHERITS(a);
ALTER TABLE b ADD col int;
ALTER TABLE a ADD col int;

where I should have left out the third statement.  Not a great loss, the only
consequence I've observed being pg_attribute.attislocal = true, so dropping
a.col will not drop b.col.  Is there a DDL way to change that, short of dropping
the column from both tables and re-adding it to the parent alone?  If not,
what's the danger of updating attislocal directly?  If it makes any difference,
I don't actually plan to drop the column anytime soon.

Thanks,
nm

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] psql problem

2010-07-22 Thread Andy Colson


On 07/22/2010 04:03 PM, Gary Fu wrote:

Hi,

System information:
- psql 8.4.4 on a client with CentOS 5.5 (64 bits)
- postgres 8.4.4 on the server with CentOS 5.5 (64 bits)
- the client is connected with vpn

I have a script to create a table with some comments in front. When I use the 
command 'psql -f script.sql' to load it, it hangs. However, if I remove the 
comments, OR remove some of the columns from the table, it works okay. It looks 
like to me, the psql will hang with large size of the script file. I tried 'psql 
< script.sql' and 'cat script.sql | psql' with the same result.

However, I tried it on another client host (CentOS 5.5 32 bits), I don't see 
this problem.

Any idea and suggestion ?

Thanks,
Gary


Line endings?

How about a sample?

What comment style: -- /* (* # ; ' //


-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Bitmask trickiness

On Thu, Jul 22, 2010 at 10:27 PM, Howard Rogers  wrote:
> On Fri, Jul 23, 2010 at 8:37 AM, Scott Marlowe  
> wrote:
>> On Thu, Jul 22, 2010 at 3:09 PM, Howard Rogers  wrote:
>>> On Fri, Jul 23, 2010 at 12:35 AM, Peter Hunsberger
>>>  wrote:
 On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers  wrote:
>
> On Thu, Jul 22, 2010 at 1:02 PM, Scott Marlowe  
> wrote:
>
>
> >>
> >> Why on Earth would I want to store this sort of stuff in a bit string?!
> >
> > Because you are manipulating bits and not integers?  I guess there are
> > 10 kinds of people, those who like think in binary and those who
> > don't.
> >
> >> I don't know about you, but I find looking at 21205 a darn'd site
> >> easier than staring blankly at 101001011010101!!

  lots of stuff

> >
> > Note you can cast integer to bitstring, but there may be some odd
> > behaviour for sign bits and such.  Which is again why I'd use the
> > right type for the job, bit string.  But it's your project.
> >
>
> Quoting...
>
> > Because you are manipulating bits and not integers?  I guess there are
> > 10 kinds of people, those who like think in binary and those who
> > don't.
>
> Er, no. 21205 is not an integer. It's an encoded bit of magic.
>

 In that case your database design is fundamentally broken.  A database
 should have content fields that map to the needs of the application.
 As you describe your application requirements, that is a bit string
 and not an integer.  Use bit strings and your application logic is
 transparent, obvious and easy to maintain.  Use integers and you have
 to resort to "magic".  As you say, it's your choice, but you came here
 looking for advice and the advice you were given is very good

 --
 Peter Hunsberger
>>>
>>> Hi Peter:
>>>
>>> It wasn't, as the original poster pointed out, 'advice' that was given
>>> so much as personal preference. Had someone said, 'ah, but you see
>>> storing your 15 meanings in decimal uses up 5 bytes, whereas a
>>> bitstring only requires 15 bits, and over 10,000,000 records, the
>>> saving of 3 bytes per record adds up...', then that would be technical
>>> advice I could listen to, assess and make a call on.
>>
>> You do realize the first page I linked to told you that, right?  It's
>> not a particularly big page.  I had made the erroneous assumption
>> you'd read the link I posted.
>
> If you mean, did I read the bit in the doco where it said nothing at
> all in the 'these are great advantages' style I've just described, but
> instead makes the fairly obvious point that a bit string takes 8 bits
> to store a group of 8 bits (well, stone me!!)

Wow, I'm surprised you get any help with your attitude.  I posted a
link and asked a question and right up front got my head handed to me.

To quote:  "Why on Earth would I want to store this sort of stuff in a
bit string?!

I don't know about you, but I find looking at 21205 a darn'd site
easier than staring blankly at 101001011010101!!"

Like I'd somehow bitten your hand when I asked my question.

> PLUS has extra overhead,
> then yes, I did read that part of your first link... and nevertheless
> concluded that, overall, there is... er, some extra overhead in
> storing bitstrings.

Well, your initial answer certainly didn't give ANY idea that you'd
read that page.

> So what precisely about that first article, which I did indeed read,
> would you have expected to lead me to the conclusion that I'd SAVE
> significant amounts of space or find some other technically-compelling
> reason for switching?

I didn't expect such.  I asked why you weren't using them, and gave
you some links to read on it.  It clearly states that bit strings use
a bit per bit, plus some overhead.  Now, I had no idea if you were
dealing with bigints and 60 bit strings or 5 bit strings.  In fact,
you did little to really describe your project and preferences in your
post.  Which is why my response was short and concise, I had little to
go on.

> My point is that there's nothing much in it, storage-wise, either way.

Well, there is the fact that bit strings can restrict the size of the
entry so you don't accidentally get an int stored that's got more bits
than your model can handle.  There's also the issue that if / when you
ever get close to the last bit in an int bitstring may behave oddly
because of sign issues.

> So there's no compelling technical reason to switch.

I never said there was.  I simply asked a question, and got my hand bitten.

> And without a
> technically-compelling reason, the rest of the post I was referring to
> simply boiled down, as far as I could tell, to a matter of personal
> preference. No less valid for that, of course. But ultimately, not
> something that would hold much sway with me.

Sure, fine, whatever you want.  I wasn't trying to convince you either
way.  I do think using the righ

Re: [GENERAL] Difference between EXPLAIN ANALYZE SELECT ... total runtime and SELECT ... runtime

2010-07-22 Thread A. Kretschmer

In response to Piotr Gasid??o :
> Hello,
> 
> I have strange problem.
> 
> I test/optimize my queries with EXPLAIN ANALYZE. I get for example:
> 
> Total runtime: 40.794 ms
> 
> But when I run query without EXPLAIN ANALYZE i get, for example:
> 
> Time: 539.252 ms
> 
> Query returns 33 rows. Why?

Maybe cheaply or virtuell hardware? There are some issues with functions
like gettimoofday(), see here:

http://archives.postgresql.org/pgsql-general/2007-01/msg01653.php
(and the whole thread)

Regards, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Bitmask trickiness

2010-07-22 Thread Howard Rogers

On Fri, Jul 23, 2010 at 8:37 AM, Scott Marlowe  wrote:
> On Thu, Jul 22, 2010 at 3:09 PM, Howard Rogers  wrote:
>> On Fri, Jul 23, 2010 at 12:35 AM, Peter Hunsberger
>>  wrote:
>>> On Wed, Jul 21, 2010 at 11:13 PM, Howard Rogers  wrote:

 On Thu, Jul 22, 2010 at 1:02 PM, Scott Marlowe  
 wrote:

 >>
 >> Why on Earth would I want to store this sort of stuff in a bit string?!
 >
 > Because you are manipulating bits and not integers?  I guess there are
 > 10 kinds of people, those who like think in binary and those who
 > don't.
 >
 >> I don't know about you, but I find looking at 21205 a darn'd site
 >> easier than staring blankly at 101001011010101!!
>>>
>>>  lots of stuff
>>>
 >
 > Note you can cast integer to bitstring, but there may be some odd
 > behaviour for sign bits and such.  Which is again why I'd use the
 > right type for the job, bit string.  But it's your project.
 >

 Quoting...

 > Because you are manipulating bits and not integers?  I guess there are
 > 10 kinds of people, those who like think in binary and those who
 > don't.

 Er, no. 21205 is not an integer. It's an encoded bit of magic.

>>>
>>> In that case your database design is fundamentally broken.  A database
>>> should have content fields that map to the needs of the application.
>>> As you describe your application requirements, that is a bit string
>>> and not an integer.  Use bit strings and your application logic is
>>> transparent, obvious and easy to maintain.  Use integers and you have
>>> to resort to "magic".  As you say, it's your choice, but you came here
>>> looking for advice and the advice you were given is very good
>>>
>>> --
>>> Peter Hunsberger
>>
>> Hi Peter:
>>
>> It wasn't, as the original poster pointed out, 'advice' that was given
>> so much as personal preference. Had someone said, 'ah, but you see
>> storing your 15 meanings in decimal uses up 5 bytes, whereas a
>> bitstring only requires 15 bits, and over 10,000,000 records, the
>> saving of 3 bytes per record adds up...', then that would be technical
>> advice I could listen to, assess and make a call on.
>
> You do realize the first page I linked to told you that, right?  It's
> not a particularly big page.  I had made the erroneous assumption
> you'd read the link I posted.

If you mean, did I read the bit in the doco where it said nothing at
all in the 'these are great advantages' style I've just described, but
instead makes the fairly obvious point that a bit string takes 8 bits
to store a group of 8 bits (well, stone me!!) PLUS has extra overhead,
then yes, I did read that part of your first link... and nevertheless
concluded that, overall, there is... er, some extra overhead in
storing bitstrings.

So what precisely about that first article, which I did indeed read,
would you have expected to lead me to the conclusion that I'd SAVE
significant amounts of space or find some other technically-compelling
reason for switching?

My point is that there's nothing much in it, storage-wise, either way.
So there's no compelling technical reason to switch. And without a
technically-compelling reason, the rest of the post I was referring to
simply boiled down, as far as I could tell, to a matter of personal
preference. No less valid for that, of course. But ultimately, not
something that would hold much sway with me.

>> But simply saying "your design is broken... wo!" might well scare
>> the children, but doesn't really do anything for me, because I know
>> for a certainty that it's not broken at all.
>
> I asked if there was a reason you were avoiding bit strings.  Hardly a
> "your design is broken" point.

I'm getting a bit fed up of this thread now. It wasn't YOU that ever
said 'the design is broken', and I never suggested it was. That was
Peter Hunsberger, about three posts up in the thread, who wrote "In
that case your database design is fundamentally broken."

If you're going to take umbrage at something, please take umbrage at
things that were actually directed at you in the first place!

> You've now said why you are not using
> the type that was designed to handle bit strings for bit strings.
>
> I personally would store them as bit strings and change representation
> for users.

I'm a user, too. I get to see this stuff every time I do a select
statement. At the command line. Which I use a lot.

> There are some issues that come up if your bit strings are
> long enough to get close to the last bit in an integer (also mentioned
> on the links I posted that didn't get read).

Don't make false assumptions about other people, please. You don't
know what I read or didn't read. Just because you didn't make a
compelling technical argument in favour of bitstrings doesn't mean I
didn't read the article you linked to ...that also didn't make a
compelling technical argument in favour of bitstrings.

> But other than that it
> should work fine

[GENERAL] Information Extract

2010-07-22 Thread std pik

I'd like to know how can we get the following information in
PostgreSQL 8.4:
Execution plan
The I/O physical reads and logical reads, CPU consumption, number of
DB block used, and any other information relevant to performance.
Taking into consideration that these information could be extracted
from Oracle by AWR, TKPROF, ...etc.
Thanks.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] optimizing daily data storage in Pg

2010-07-22 Thread P Kishor

On Thu, Jul 22, 2010 at 4:56 PM, Andy Colson  wrote:
> On 7/22/2010 9:41 AM, P Kishor wrote:
>>
>> I have been struggling with this for a while now, have even gone down
>> a few paths but struck out, so I turn now to the community for ideas.
>> First, the problem: Store six daily variables for ~ 25 years for cells
>> in a grid.
>>
>>  * Number of vars = 6
>>  * Number of cells ~ 13 million
>>  * Number of days ~ 9125 (25 * 365)
>>
>> Optimize the store for two different kinds of queries:
>>
>> Query one: Retrieve the value of a single var for all or a portion of
>> the cells for a single day. This is analogous to an image where every
>> pixel is the value of a single var.
>
>>      SELECT  FROM d WHERE yr = ? AND yday = ?;
>>      SELECT  FROM d WHERE yr = ? AND yday = ? AND cell_id IN
>> (?,?,?...);
>
>
>>
>> Query two: Retrieve values for all the days or a duration of days for
>> a single var for a single cell. This is like grabbing a column out of
>> a table in which each row holds all the vars for a single day.
>
>>      SELECT  FROM d WHERE cell_id = ?;
>>      SELECT  FROM d WHERE cell_id IN (?,?,?...);
>
>
>
> First, I must admit to not reading your entire email.

I am not sure how to respond to your feedback give that you haven't
read the entire email. Nevertheless, thanks for writing...

>
> Second, Query 1 should be fast, regardless of how you layout the tables.

It is not fast. Right now I have data for about 250,000 cells loaded.
That comes to circa 92 million rows per year. Performance is pretty
sucky.


>
> Third, Query 2 will return 13M rows?  I dont think it matters how you layout
> the tables, returning 13M rows is always going to be slow.
>

Yes, I understand that. In reality I will never get 13 M rows. For
display purposes, I will probably get around 10,000 rows to 50,000
rows. When more rows are needed, it will be to feed a model, so that
can be offline (without an impatient human being waiting on the other
end).

Right now, my main problem is that I have either too many rows (~4 B
rows) in a manageable number of tables (25 tables) or manageable
number of rows (~13 M rows) in too many tables (~9000 tables).


>
> -Andy
>



-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
---
Assertions are politics; backing up assertions with evidence is science
===

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] optimizing daily data storage in Pg