Re: [GENERAL] a failover scenario

2007-10-18 Thread Tomi N/A
2007/10/18, Bill Moran <[EMAIL PROTECTED]>:
> I don't know of any system that will just hand you those capabilities.
>
> Every multi-master system I've ever heard of requires high-speed links
> between the masters, otherwise the synchronization is far too slow to
> be usable.

I supposed so. However, I have the advantage of not having a massive
amount of data change anywhere at once...transactions are
human-generated (with as few as 20-30 users) and typically affect
just a handful of records at once. This means that an hour can pass
between two transactions, but at peak times, a transaction every
second or two might not be unusual.

> I believe you could do what you want in the application.  PostgreSQL
> 8.3 will have a native UUID type, which will help with managing conflicts
> between multiple masters.  If you can define clear rules on how to manage
> conflicts, that can be done automatically.  If the rules aren't so clear,
> you'll need an interface where a human can manage conflicts.

I don't intend to let conflicts occur...ever. Not in the sense that
two databases allocate the same unique resource for different
purposes.

> With triggers and LISTEN/NOTIFY, you can put together an app that
> handles replicating data when tables experience changes.  From there,
> you'll need to structure your schema so such an app can detect conflicts,
> (create "last_updated" timestamps on all tables, and ensure that primary

I was thinking something along those lines. It might make things
easier if nothing was ever erased, only invalidated.

> keys include a UUID or other mechanism to guarantee uniqueness) and design
> some sort of queue mechanism to ensure updates can wait while network
> problems are resolved.

UUIDs didn't come to mind, thanks for pointing them out.
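For the archives, a minimal sketch of the trigger + LISTEN/NOTIFY idea
discussed above -- the table and channel names are invented here, not
taken from the project:

-- hypothetical replicated table: UUID primary key plus a change stamp
CREATE TABLE customer (
    id           uuid PRIMARY KEY,
    name         text NOT NULL,
    last_updated timestamptz NOT NULL DEFAULT now()
);

CREATE FUNCTION customer_changed() RETURNS trigger AS $$
BEGIN
    NEW.last_updated := now();   -- stamp every write, for conflict detection
    NOTIFY customer_changed;     -- wake any replicator that ran LISTEN customer_changed
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_customer_changed
BEFORE INSERT OR UPDATE ON customer
FOR EACH ROW EXECUTE PROCEDURE customer_changed();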

> How much effort such a thing requires is dependent on how complex the
> data is.  If it's a sales database (for example) it's not all that hard,
> since there aren't typical cases where two people are simultaneously
> updating the same record.

Well...it's a sales database...and the risk of simultaneous updates is huge. :)

> I know, for example, that the PA gaming commission is putting something
> like this together for the race tracks.  Each track has handheld devices
> that are used to record bets/payouts, etc.  These devices can't be
> connected all the time, but a sync system is pretty easy because all they
> ever do is _add_ new records.  Thus, you assign each handheld a unique
> device ID, and that's part of the primary key for each table, so there's
> no chance of of conflict.

Sounds pretty simple...probably the tip of the iceberg. :)

> Sounds like a fun and challenging project.  I'm jealous.

It's not a project yet, but the tender is out...all we've got to do is
win it. It's a shame the tender is fixed (for someone else), but
blowing the whistle might help.

Thanks for sharing. Cheers,
t.n.a.

[GENERAL] a failover scenario

2007-10-18 Thread Tomi N/A
I am considering pgsql as the RDBMS in a project with the following constraints:
- there's a master and reserve instance of the RDBMS on every remote location
- there's a master and reserve instance of the RDBMS on a central location
- the connections are DSL connections and therefore unreliable
- all changes have to be propagated to all servers (multimaster replication)
- if the connection between a remote location and the central location
fails, the local server continues working and resynchronizes with the
central server when the connection is restored
- if any master fails, the reserve instance takes over and the rest of
the system acts as though nothing happened

The master/reserve instance is, from what I read, standard
functionality, but I'm not so sure about the resynchronization part of
a failed link...I imagine something like WAL shipping might be of use
here, but it's just an uneducated guess.
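For reference, the WAL shipping hooks I have in mind look roughly like
this in 8.x -- the archive path is made up, and this only covers one-way
master-to-standby shipping, not the multimaster requirement:

# postgresql.conf on the master
archive_command = 'cp %p /mnt/server/wal_archive/%f'

# recovery.conf on the standby, used when bringing it up from a base backup
restore_command = 'cp /mnt/server/wal_archive/%f %p'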
Does code exist to support this on pgsql or is it considered
application specific functionality? Do other RDBMSs support similar
functionality?

TIA,
t.n.a.

Re: [GENERAL] pg_dump - schema diff compatibility

2007-10-16 Thread Tomi N/A
2007/10/16, Sualeh Fatehi <[EMAIL PROTECTED]>:
> SchemaCrawler for PostgreSQL will allow you to do the diffs. With
> SchemaCrawler for PostgreSQL, you can take  human-readable snapshots
> of the schema and data, for later comparison. SchemaCrawler outputs
> details of your schema (tables, views, procedures, and more) in a diff-
> able plain-text format (text, CSV, or XHTML). You can order data
> alphabetically, or by ordinal order. SchemaCrawler can also output
> data (including CLOBs and BLOBs) in the same plain-text formats.
>
> SchemaCrawler for PostgreSQL is free and open-source (under the GPL),
> and is available at SourceForge: http://schemacrawler.sourceforge.net/

Sounds promising. Thanks for the tip, I'll be sure to try it out.

Cheers,
t.n.a.

[GENERAL] pg_dump - schema diff compatibility

2007-10-12 Thread Tomi N/A
Looking at the mailing list archive, this is just one in a rather long
line of questions regarding diffing db schema dumps, but I've been
unable to find what I was looking for in any of the prior
conversations. I know of apgdiff (seems to work very nicely) and of
other specialized pg diff tools (as outdated or proprietary as they
may be), but what I'm interested in is just a plain, basic schema dump
with a database object order usable with diff.
I can't find it now, but I'm fairly certain I've read somewhere (in
the release changes of an 8.x pgsql version?) that pg_dump has been
"upgraded" so that it orders database objects fist by their
dependencies and then by name. I thought that would imply that dumping
the database like so
pg_dump -f out.sql -F p -s a_db
would give me an sql script which I could compare versions of with
plain old diff or svn diff or whatever existing diff tool I care to
use.
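Concretely, the workflow I'm after would be something like this -- the
file names are arbitrary:

pg_dump -f schema_v1.sql -F p -s a_db   # dump before the change
# ... schema evolves ...
pg_dump -f schema_v2.sql -F p -s a_db   # dump after
diff -u schema_v1.sql schema_v2.sql     # only useful if object order is stable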

I guess my question is: is pg_dump supposed to dump the schema in a
diff-compatible, predictable way but it's not working or is pg_dump
only concerned with satisfying db object dependencies?
I would very much like this functionality because it would make pgsql
much better integrated into the work environment we have set up at the
office (using e.g. svn diff would be very nice). Tools like apgdiff
don't help as much: it's great that it's command line (can be
automated) and it does its job well, but it still only tells me e.g.
that a view is different, rather than showing me _how_ it is different
or allowing me to compare object definitions using a generic diff -
which is what I really want.

Sorry for the confusing train of thought and thanks for any comments,
t.n.a.

Re: [GENERAL] Using C# to create stored procedures

2007-04-05 Thread Tomi N/A

2007/4/3, Andrus <[EMAIL PROTECTED]>:


I need to re-write a lot of complicated SQL select statements to run
them in the server, which generates reports.
Currently they are running on the client side.
The client application uses a procedural language to do additional
processing of data retrieved from the PostgreSQL server.

I need to create a new client application, so I decided that I must
move as much processing to the server as possible.
Using C# should make my procedures portable to MS SQL and DB2 as well.

I'm looking for things available in modern language IDEs, like
auto-completion, IntelliSense and immediate syntax checking,
and modern language features like classes, generics and interfaces.
There is much more source code, and there are more resources and books,
available for C# than for PL/pgSQL.


So use pljava. And add refactoring and IDE independence on the
development side and platform independence on the deployment side,
while you're at it.
Or write a service layer and build upon it in whatever you want. Or
write plc# and make it possible for yourself and anyone else to write
postgresql procedures in C#.


C# allows to use whole .NET class library, which contains solutions for
almost everything.


...rrright. ;^)


I think that using pgAdmin+PL/pgSQL decreases my productivity a lot compared
to Visual Studio + C#


I would tend to agree with you there: writing major chunks of logic in
stored procedures (plpgsql, tsql or any other SQLoid language) isn't a
very pleasurable experience...but it feels like a death row pardon
compared to debugging someone else's SQLoid code.

Cheers,
t.n.a.

Re: [GENERAL] Postgres Database size

2007-03-17 Thread Tomi N/A

For small and moderate size databases, I find that the simplest way to
estimate the size of a database is to do:

du -ks /var/lib/postgresql/data
...create a new database and restore into it a backup of the database
whose size you're interested in, then:
du -ks /var/lib/postgresql/data

Compare the first and second "du" outputs and the difference is
close to the exact size of the database.
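If the server is 8.1 or newer, the same number can also be read
directly -- a one-liner sketch, with the database name made up:

SELECT pg_size_pretty(pg_database_size('a_db'));  -- on-disk size, human readable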

Cheers,
t.n.a.

[GENERAL] 30ms to 30s execution time: query optimization problem

2007-03-14 Thread Tomi N/A

I have a problem with query optimization. Since it's a bit hard to
explain, I'll describe the database structure and list the queries
first. The database holds contacts information for a call center. The
general idea is to have operators call a random contact and record
the results. Rules for picking a contact are:

- contacts that have an open call shall not be called
- contacts that have been called and refused to participate in the
current poll shall not be called again
- contacts that have been called and did not answer, or whose line was
busy, will be called again after 4 hours
- a contact shall not be called more than max_call_times per poll

Contacts are organized into groups, and groups are assigned to poll
samples. Each poll has several groups assigned, and each usergroup
can have different sample groups.

Tables sizes at this time are roughly:
contact - 1.5M
call - 0.5M
group - 10K
poll - 10
contact_group - 5M (each contact belongs to ~ 3 groups)

Tables last_call and num_calls hold just the last call to a contact
per poll, and the number of calls to a contact per poll, respectively.

All tables have foreign keys and appropriate indices.

SELECT INTO s_contact contact.id_contact
FROM contact
JOIN contact_group ON contact.id_contact = contact_group.id_contact
JOIN group_poll ON contact_group.id_group = group_poll.id_group
  AND group_poll.id_poll = $$s_poll$$
JOIN group ON group.id_group = group_poll.id_group
JOIN user_role ON group_poll.id_role = user_role.id_role
AND user_role.id_user = $$s_user$$
LEFT OUTER JOIN last_call ON last_call.id_contact = contact.id_contact
LEFT OUTER JOIN num_calls ON num_calls.id_contact =
contact.id_contact AND num_calls.id_poll = $$s_poll$$
WHERE -- temporal limitations
(
 (  group_poll.valid_from IS NULL )
 OR
 ( group_poll.valid_from <= NOW() )
)
AND
(
 (  group_poll.valid_to IS NULL )
 OR
 ( group_poll.valid_to >= NOW() )
)
AND
(
 ( group.valid_from IS NULL )
 OR
( group.valid_from <= NOW() )
)
AND
(
 ( group.valid_to IS NULL )
 OR
( group.valid_to >= NOW() )
)
AND
(
 ( contact.valid_from IS NULL )
 OR
( contact.valid_from <= NOW() )
)
AND
(
 ( contact.valid_to IS NULL )
 OR
( contact.valid_to >= NOW() )
)
AND
( -- no call
last_call.id_call IS NULL
OR
( -- no call in progress and not processed
   NOT(last_call.id_call_status = call_in_progress
   OR (last_call.sif_anketa = s_poll
   AND last_call.id_call_status
IN ( call_completed, call_aborted, will_not_participate ))
)
   AND
last_call.end_time < NOW()-'240 minutes'::interval
 )
)
AND
(
 num_calls.num_calls IS NULL
 OR
 num_calls.num_calls < $$max_num_calls$$
)
AND -- randomize
 contact.random_number <= $$random_key$$
ORDER BY contact.random_number DESC LIMIT 1 FOR UPDATE OF contact;

The randomization is performed with a random number column in
contacts table. Variables with $$ are inputs to functions.

This query has an interesting property: it finishes with lightning
speed, seemingly when there are a lot of contacts that cannot be
joined with last_call and num_calls. But it can also go to the other
extreme and take up to 30 s to execute.



Limit  (cost=0.00..17.64 rows=1 width=18) (actual time=0.205..0.205 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..1615461.74 rows=91598 width=18) (actual time=0.201..0.201 rows=1 loops=1)
        ->  Nested Loop  (cost=0.00..1536111.58 rows=42092 width=22) (actual time=0.179..0.179 rows=1 loops=1)
              ->  Nested Loop Left Join  (cost=0.00..1523725.29 rows=42092 width=30) (actual time=0.159..0.159 rows=1 loops=1)
                    Filter: ((num_calls.num_calls IS NULL) OR (num_calls.num_calls < 3))
                    ->  Nested Loop  (cost=0.00..1508293.43 rows=48997 width=30) (actual time=0.139..0.139 rows=1 loops=1)
                          ->  Nested Loop  (cost=0.00..1089736.15 rows=1430554 width=22) (actual time=0.120..0.120 rows=1 loops=1)
                                ->  Nested Loop Left Join  (cost=0.00..610920.91 rows=635050 width=18) (actual time=0.088..0.088 rows=1 loops=1)
                                      Filter: ((last_call.id_poziv IS NULL) OR ((last_call.id_call_status <> 1) AND ((last_call.id_poll <> 1) OR (last_call.id_call_status <> ALL ('{3,8,9}'::integer[]))) AND (last_call.end_time < (now() - '04:00:00'::interval))))
                                      ->  Index Scan Backward using ix_contact_random_number on contact  (cost=0.00..191996.83 rows=1266570 width=18) (actual time=0.056..0.056 rows=1 loops=1)
                                            Index Cond: (random_number <= 0.98::double precision)
                                            Filter: (((valid_from IS NULL) OR (valid_from <= now())) AND ((valid_to IS NULL) OR (valid_to >= now())))
                                      ->  Index Scan using fki_last_call_2 on last_call  (cost=0.00..0.30 rows=1 width=24) (actual time=0.023..0.023 rows=0 loops=1)
                                            Index Cond: (last_call.id_contact = contact.id_contact)

Re: [GENERAL] DB Modeler

2007-03-08 Thread Tomi N/A

2007/3/8, Raymond O'Donnell <[EMAIL PROTECTED]>:

On 08/03/2007 12:32, Hakan Kocaman wrote:

> i work with Clay in Eclipse:
> http://www.azzurri.jp/en/software/clay/

I use Clay also, from time to time. The only downside is that you can't
print from the free version, and the commercial version isn't available
outside Japan.


Ditto.

Cheers,
t.n.a.

Re: [GENERAL] performance world records

2007-02-25 Thread Tomi N/A

2007/2/25, Tom Lane <[EMAIL PROTECTED]>:


If the objective is to claim a world record, we'd look pretty silly
trying to do so with a nonstandard, non-certified test.  The point
of certification in this context is that you have someone else
attesting to the validity of your results.  Without that, your claim
isn't going to be believed.


Makes sense. I got carried away a bit.
I guess I'll have to stick to the available case studies...maybe even
contribute one or two of those myself.

Cheers,
t.n.a.

Re: [GENERAL] performance world records

2007-02-25 Thread Tomi N/A

2007/2/24, Joshua D. Drake <[EMAIL PROTECTED]>:

Ron Johnson wrote:
> On 02/24/07 11:00, Tom Lane wrote:
>>> "Tomi N/A" <[EMAIL PROTECTED]> writes:
>>>> ...which made me think: postgresql aims at the same (or very similar)
>>>> clients and use cases as Oracle, DB2 and MSSQL. I pose the question
>>>> from an advocacy standpoint: why doesn't postgresql hold a world
>>>> record of some sort (except performance/price)?
>>> Certified TPC tests are *expensive* to run.  If you search the PG
>>> archives for "TPC" you will probably find some relevant prior
>>> discussions.
>
> What about non-certified tests?
>
> Or has the TPC copyrighted/licensed/whatever the tests, so that you
> can only publish certified results?

You can not publish TPC tests without a TPC fee :). However there are
plenty of other tests such as dbt2 and odbcbench that can give you
comparable and free results.


I mentioned a TPC test as an example: any kind of (well known)
"standard" test would do.
I guess it goes without saying that anyone running such a test would do
well to send word to the mailing list with a URL to the results. :)

t.n.a.

[GENERAL] performance world records

2007-02-24 Thread Tomi N/A

I just ran into an article about Oracle setting a world record in some
kind of test: 
http://www.oracle.com/corporate/press/2007_feb/TPC-H_300GB_Benchmark_wHP.html?rssid=rss_ocom_pr

...which made me think: postgresql aims at the same (or very similar)
clients and use cases as Oracle, DB2 and MSSQL. I pose the question
from an advocacy standpoint: why doesn't postgresql hold a world
record of some sort (except performance/price)?
Is it because the tests (time, expertise, hardware) are too expensive?
Are the other RDBMSes simply faster? Something else?
I'd like to know, because it'd be a hell of an argument to use when
advocating the use of pgsql on a project: "well, we *could* go with
MSSQL, but it's going to tie us up...when using multiple CPUs
(licences), when deploying a failover solution (licences), when you
want to work with spatial information or something else: but pgsql, on
the other hand...it doesn't have that kind of licencing volatility,
gives you everything it's got and achieves world record performance
doing so..."

That's the kind of leverage I'd like to have when talking about using
pgsql with my colleagues.
Anyone care to comment?

Cheers,
Tomislav

[GENERAL] can't stop the postmaster?

2007-02-20 Thread Tomi N/A

This is probably a question more appropriate for a gentoo mailing list,
but I'll ask anyway as it obviously has to do with postgresql and I've
a feeling someone will probably know: can anyone explain what it is
that happens here when I try to stop the postmaster? What can I do
about it?

# /etc/init.d/postgresql stop
* Stopping PostgreSQL ...
start-stop-daemon --stop --pidfile /var/lib/postgresql/data/postmaster.pid
--retry -TERM//-INT//-QUIT --oknodo
/sbin/start-stop-daemon: invalid schedule item (must be
[-], -,  or `forever'
Try `/sbin/start-stop-daemon --help' for more information.[ !! ]


TIA,
t.n.a.

Re: [GENERAL] database backup trouble

2007-02-05 Thread Tomi N/A

2007/2/3, George Weaver <[EMAIL PROTECTED]>:


>it's a windows-related problem

Is the Task Scheduler service running? (Start > Settings >  Control Panel >
Administrative Tools > Services - Task Scheduler?).

If the Task Scheduler service is running , what does the Task Scheduler log
indicate about the tasks you tried to run (menu item: Advanced > View Log)?


Thanks for the tip: I'll check the log.
I just did a test on the office W2k3 server and the test was
successful in that the task ran. This leads me to believe it's a
matter of privileges: our client has much stricter user rights
policies than we do at the office. :)



Did any of your test .bat files contain a PAUSE command to keep the console
window open in case the bat file did run as scheduled?


As I said, it doesn't seem to be related to the script contents (see
above) and I'm kind of surprised all the reactions on the mailing list
seem to be of the windows-scheduled-tasks-don't-work?-really?-type
instead of the pgagent-works-like-a-clock-you-just-have-to-wind-it-up
type of answer, as I had hoped and expected.


>this is the beanshell script:

Unfortunately I am not familiar with beanshell and cannot offer assistance
here.


These are just my very mediocre attempts to get around the fact that I
can't get around pg_dump, i.e. there seems to be no way to issue a
request from a client on the network and get the database dump from
the server: I listed the beanshell approach to a platform independent
backup solution only for completeness' sake.

Thanks,
t.n.a.

Re: [GENERAL] database backup trouble

2007-02-02 Thread Tomi N/A

2007/2/2, George Weaver <[EMAIL PROTECTED]>:


And the .bat file script?


The .bat script is of no importance: it's a windows-related problem
(or, to be more precise, the problem of my ignorance when it comes to
windows scripting). The most trivial .bat scripts containing only a
simple echo statement or nothing at all aren't executed by the
scheduler.

The _idea_ was to invoke the pg_dump statement from a script using the
syntax I already described: it shouldn't be the problem. For
completeness, this is the beanshell script:

db_name = "my_db";
out_dir = "/tmp/";

 DALJE SE NE MIJENJA NISTA 
archive_name = db_name;
archive_name = archive_name + "_" + Calendar.getInstance().get(Calendar.YEAR);
archive_name =  archive_name + "-" + Calendar.getInstance().get(Calendar.MONTH);
archive_name =  archive_name + "-" + Calendar.getInstance().get(Calendar.DATE);
archive_name =  archive_name + "_" + Calendar.getInstance().get(Calendar.HOUR);
archive_name =  archive_name + "-" +
Calendar.getInstance().get(Calendar.MINUTE);
archive_name =  archive_name + "-" +
Calendar.getInstance().get(Calendar.SECOND);
archive_name = archive_name + ".backup";
print(archive_name);
command = "pg_dump -h localhost -U my_user_name my_db -f \"" + out_dir
+ archive_name + "\"";
print(command);
exec(command);

t.n.a.

Re: [GENERAL] database backup trouble

2007-02-02 Thread Tomi N/A

2007/2/2, George Weaver <[EMAIL PROTECTED]>:


From: "Tomi N/A"

> At this point, I decided to try a much more primitive approach: using
> a windows scheduled task running a .bat or .cmd script which in turn
> calls pg_dump.

It might be helpful if you provided the exact pg_dump command you are trying
to use, as well as the text of the .bat file you set up.


I did exactly that a bit into my message (should have probably pointed
it out at the beginning).
Anyway, here it is again:
pg_dump -h localhost -U myuser mydatabase -f outputfile.backup
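For what it's worth, the whole .bat boils down to something like this
-- paths and password are hypothetical, and spelling out absolute paths
matters because scheduled tasks don't inherit the interactive user's
environment:

@echo off
rem adjust paths to the actual install
set PGPASSWORD=secret
"C:\Program Files\PostgreSQL\8.1\bin\pg_dump.exe" -h localhost -U myuser mydatabase -f "C:\backups\mydatabase.backup"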

Oh, and I forgot to ask another question in the original message: why
can a linux user with an invalid or non-existent .pgpass successfully
use pg_dumpall (pg_dumpall -U existing_pg_role > dump.sql) and read
any data he/she cares to read?

Thanks,
t.n.a.

[GENERAL] database backup trouble

2007-02-02 Thread Tomi N/A

I'm trying to set up an automatic backup mechanism and have a number
of questions about issues I've encountered.
First of all, I tried to install pgAgent (the server is running win2k3
and pgsql 8.1.5). Trying to start the pgagent service results in an
"error 193", one google knows very little about. Upon inspection, I
found out that Windows has more to say about the problem: "pgAgent is
not a valid Win32 application". Hmm.
At this point, I decided to try a much more primitive approach: using
a windows scheduled task running a .bat or .cmd script which in turn
calls pg_dump. Don't really know why, but the scheduled task can't be
executed successfully (not even when all the script contains is a
simple print statement). This is obviously not a postgresql issue, but
it is one of the solutions I've tried.
Finally, I tried it on my linux laptop (pgsql 8.1.5): I created a test
linux user, created a valid .pgpass file in his home dir with
-rw------- permissions and attempted to back up the database. Doesn't
work. The username and password are valid, but no dump file is
created. I found out that I explicitly have to tell pg_dump the user
name in the .pgpass file using the -U switch: then it works, but only
when called directly from the shell like so:
pg_dump -h localhost -U myuser mydatabase -f outputfile.backup
The same command executed from a beanshell script (I wanted to be
platform independent) does nothing.
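For reference, the .pgpass layout that eventually worked -- the values
here are placeholders:

# ~/.pgpass, must be chmod 600; fields are host:port:database:username:password
localhost:5432:mydatabase:myuser:secret

pg_dump then picks it up when invoked with a matching -U myuser.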

Where I stand at the moment:
1.) obviously, the beanshell solution will do nothing for me until I
find out why I can't execute a windows scheduled task, but it might be
a fairly flexible workaround if corrected
2.) again, obviously, I'd much, much rather do the backup from
pgAgent, if anyone can tell me how I can get around the error Windows
throws at me.

Any suggestions are much appreciated.
Cheers,
t.n.a.

Re: [GENERAL] [Fwd: [PORTS] M$ SQL server DTS package equivalent in Postgres]

2007-01-26 Thread Tomi N/A

Besides being easy to schedule and very flexible, manipulating data
with queries is extremely powerful and fairly easy to maintain
assuming you know a little SQL -- thanks to postgresql's huge array of
built in string manipulation functions.  Your skills learned here will
pay off using the database as well for other things.

Not only that, but this approach will be fast since it is declarative
and handles entire tables at once as opposed to DTS-ish solutions
which tend to do processing record by record.  Not to mention they are
overcomplicated and tend to suck. (DTS does have the ability to read
from any ODBC source which is nice...but that does not apply here).


Different strokes for different folks, it seems.
I'd argue that COPY followed by a barrage of plpgsql statements can't
be used for anything but the most trivial data migration cases (where
it's invaluable), in which you have line-organized data input for a
handful of tables at most.
In my experience (which is probably very different from anyone
else's), most real world situations include data from a number of very
different sources, ranging from the simplest (.csv and, arguably,
.xml) to the relatively complex (a couple of proprietary databases,
lots of tables, on-the fly row merging, splitting or generating
primary keys, date format problems and general pseudo-structured,
messed up information).
Once you've got your data in your target database (say, pgsql), using
SQL to manipulate the data makes sense, but it is only the _final_
step of an average, real world data transformation.

Cheers,
t.n.a.

Re: [GENERAL] [Fwd: [PORTS] M$ SQL server DTS package equivalent in Postgres]

2007-01-26 Thread Tomi N/A

2007/1/23, Paul Lambert <[EMAIL PROTECTED]>:


G'day,

Is there an equivalent in Postgres to the DTS Packages available in M$
SQL server?

I use these in SQL server to pre-load data from CSV files prior to
enabling replication from my primary application. Any pointers on where
best to go for this would be appreciated. I'm reading about something
called EMS, is that the way to go?

Sample of one of the files I use:

DEALER_ID^DATE_CHANGED^TIME_CHANGED^BILLING_CODE_ID^DES^BILLING_CODE_TYPE^LABOUR_RATE^LABOUR_SALES_GROUP^CUSTOMER_NO^PARTS_SALES_GROUP^COMEBACK^WORKSHOP^FRANCHISE^LOCATION^DELETEFLAG
F65^23-Jan-2007^10:13^AA^ADVERSITING ADMIN^I^45^40^2196^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^AN^ADV NEW^I^45^40^1636^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^AP^ADV PARTS^I^45^40^1919^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^AS^ADV SERV^I^45^40^2057^18^^0^BLANK^0^
F65^23-Jan-2007^10:13^AU^ADV USED^I^45^40^1775^18^^0^BLANK^0^N




Paul,
what you're looking for exists in a number of variations.
You can use a good text editor with postgres' COPY command for simple
bulk .csv loading, but in the long run, you might want to consider a
full-fledged ETL tool like the Pentaho Data Integration suite. It
allows you to use a wide variety of data sources (anything with a jdbc
driver, .xls, .csv, .xml for starters), chain data transformations,
store them and execute them on a regular basis or on-demand from a
java app etc. A number of such tools are available, but I suggested
Pentaho Data Integration as I have experience with it and because it
is under an open source licence. If anyone can suggest a better tool
for the job, please say so: I for one would like to have my horizons
broadened. :)
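To make the COPY route concrete for your sample file, something along
these lines should do -- the table name is invented and its columns
have to match the header:

COPY billing_codes FROM '/path/to/dealer_feed.csv'
WITH DELIMITER '^' CSV HEADER;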

Cheers,
t.n.a.


Re: [GENERAL] Is PostgreSQL for this?

2006-12-27 Thread Tomi N/A

2006/12/27, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:

Hi everybody,
I'm looking for a database system for a SCADA system. The major
problem, I think, is performance, because the application is going
to poll about 4k variables per second from hardware and has to record
the values in the process table.
I heard that PostgreSQL provides a bulk loading mechanism called COPY,
which takes tab-delimited or CSV input from a file. Where COPY can be
used instead of hundreds or thousands of INSERTs, it can cut execution
time.
I'm less than a novice so I'll thank you for any piece of advice.


I believe you could easily simulate the load in a small
fake-SCADA-program and see how the hardware at your disposal handles
it with postgresql, a different RDBMS or simply a flat file. Make a
small program which will generate a set of 4k random values and send
them asynchronously over the network to your data acquisition
application which should store the data in the database. Measure how
fast you can send the data and still record everything.
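A rough way to time just the database side from psql alone -- the table
and column names here are invented:

-- in psql, \timing toggles per-statement timing
CREATE TABLE samples (
    ts      timestamptz NOT NULL DEFAULT now(),
    var_id  integer     NOT NULL,
    val     double precision
);
-- one simulated 4k-variable polling burst
INSERT INTO samples (var_id, val)
SELECT i, random() FROM generate_series(1, 4000) AS i;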

If data acquisition speed is your primary concern (as it seems to be),
you might want to use a simple .csv file: you'll probably beat the
performance of any database management system. You could periodically
move the saved data from the .csv files into a database (say,
postgresql) where you could (I assume) analyze it. You might want to
use a separate machine for the database management system so as to
remove any unnecessary CPU and I/O disturbances from the primary data
storage machine.
I don't think your load (32 kBps if your variables are double
precision float values) is a challenge, but running any kind of
analysis on a basically real-time-response-constrained machine might
cost you data loss and I don't know if you can afford that.

Cheers,
t.n.a.

Re: [GENERAL] permission in the db or in the application?

2006-12-18 Thread Tomi N/A

Bill makes several valid points, but in spite of them, the app I'm
writing has almost no logic in the database.
Why? Well, mostly because it would be too much trouble to remove all
of it. :) No, seriously...

Lack of a good language. Postgresql understands a growing number of
languages and that's great, but most business applications (the
dominant type of application when discussing security issues) are
written in an object oriented language like Java or C# (please don't
be offended if I've missed anyone's favourite over-10%-of-the-market
OO language).
pljava is becoming an alternative, but it's still kind of early to
tell if VM-executed languages behave well in a database environment.
It's hard to wri...maintain, maintain anything non-trivial written in
plpgsql: I face that exact problem right now with a couple of fairly
complex import scripts.

Lack of a good IDE. I've yet to hear how I can debug a plpgsql
script/function. If it's just a matter of ignorance, someone please
enlighten me.

Code version control. Yes, you can dump a pgsql database schema
without the actual data and store it in a repository somewhere, but a)
you basically version-control a single large file and b) at least some
other mainstream RDBMSes (MSSQL, for one) can't make that kind of dump
(easily), so the potential user base is much reduced and best
(workable?) patterns and practices haven't been established.


On the other hand, writing application logic (which includes security)
in the application isn't at all that bad. Modular design really helps
there: containing the logic in a package/library/module/whatever goes
a long way to allow trivial or very easy reuse in a second or third
app.
And if what you want is a truly heterogeneous application ecosystem
built around the same application logic, you can always go the SOA and
WS way: "hey, you want to code a GUI in RoR? Here you go..." At that
point, you can basically extend your system which ever way you want.


...or at least so I've heard during my knitting class. :)
t.n.a.

Re: [GENERAL] grant select on all tables of schema or database

2006-12-13 Thread Tomi N/A

2006/12/13, Tom Lane <[EMAIL PROTECTED]>:

Martijn van Oosterhout  writes:
> I don't believe you have to explicitly grant access to the database, or
> the schema, but you definitely have to grant access to the tables
> directly.

They're completely separate privileges.  GRANT ON DATABASE grants or
revokes permissions associated with database-level operations, not
permissions on specific objects contained in the database.  Likewise
for GRANT ON SCHEMA.

What the OP seems to be wishing for is a wild-card grant operation,
viz

GRANT ALL ON TABLE *.* TO joeblow

which would indeed be a useful thing to have, but it's *not* GRANT ON
DATABASE.


Exactly.
Thank you Martijn and Tom for the help and clarification.
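For the archives, the shape of what's actually required -- testschema
and myuser are from my original message, some_table is invented:

GRANT USAGE ON SCHEMA testschema TO myuser;       -- lets the role look up objects in the schema
GRANT SELECT ON testschema.some_table TO myuser;  -- has to be repeated for every table

The pgx_grant function quoted below automates the per-table part.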

Cheers,
t.n.a.

Re: [GENERAL] grant select on all tables of schema or database

2006-12-13 Thread Tomi N/A

2006/9/28, Najib Abi Fadel <[EMAIL PROTECTED]>:

when you connect to the database, type:
\h GRANT
and you will get all the Grant options:

GRANT { { CREATE | TEMPORARY | TEMP } [,...] | ALL [ PRIVILEGES ] }
ON DATABASE dbname [, ...]
TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

This will grant the privileges on all tables under the database.

HTH





Gene <[EMAIL PROTECTED]> wrote:
 It seems like it should be a very easy problem to solve. I just need one
role to have select privileges on all the tables of a particular schema or
database, including any new tables, since tables are created
programmatically daily. I've combed google and the docs to no avail. Do I
need to write pg/plsql scripts just to do something like that?? I believe on
other dbms you can just say grant all on schema.* or something to that
effect. The script I found below would be ok if tables weren't being created
constantly.

using version 8.1.4

thanks,
Gene


CREATE OR REPLACE FUNCTION pgx_grant(text, text, text)
  RETURNS int4 AS
$BODY$
DECLARE
  priv ALIAS FOR $1;
  patt ALIAS FOR $2;
  user ALIAS FOR $3;
  obj  record;
  num  integer;
BEGIN
  num:=0;
  FOR obj IN SELECT relname FROM pg_class
  WHERE relname LIKE patt || '%' AND relkind in ('r','v','S') LOOP
EXECUTE 'GRANT ' || priv || ' ON ' || obj.relname || ' TO ' || user;
num := num + 1;
  END LOOP;
  RETURN num;
END;
$BODY$
  LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION pgx_grant(text, text, text) OWNER TO root;


I don't get it. I grant all privileges on a database to a role, but
the server won't let it access the schemas. I grant all privileges on
the schema to the same role, but the server won't let it access the
relations in the schema.

GRANT ALL ON DATABASE testdb TO myuser
GRANT ALL ON SCHEMA testschema TO myuser;

Any idea what I'm doing wrong?

TIA,
t.n.a.

Re: [GENERAL] a question on SQL

2006-12-12 Thread Tomi N/A

Ragnar, Marc, thanks so much for the help: DISTINCT ON was *exactly*
what I needed.
It's not a part of any SQL standard I know of, but does the job _wonderfully_.
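For anyone digging this out of the archives later, the kind of query
DISTINCT ON enables -- the column names are illustrative:

SELECT DISTINCT ON (o.id)
       o.id AS owner_id,
       f.id AS factory_id
FROM owner o
JOIN factory f ON f.owner_id = o.id
JOIN product p ON p.factory_id = f.id
ORDER BY o.id, p.production_date;  -- first row per owner = factory of its oldest product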

Cheers,
t.n.a.

[GENERAL] a question on SQL

2006-12-12 Thread Tomi N/A

Don't really know where to ask this...the general mailing list sounds
like the closest.

Let's say I have three tables: owner, factory and product with a 1:N
relationship at each step.
Assuming that a product has a production date, how would you go about
returning a factory for every owner, where the returned factory is the
factory that produced the oldest product of its owner?

I'm perplexed by the simplicity of the task and the fact that what
solutions I did come up with fail to perform very well (a subselect
with ORDER BY MIN(production_date) LIMIT 1) or are rather complex
(using temp tables).

Hints?

TIA,
t.n.a.

[GENERAL] server speed question

2006-12-12 Thread Tomi N/A

I'm trying to optimize performance on my development laptop, one of
the main bottlenecks being a 4200 rpm disk.
It's a fairly good machine (Pentium M, 1.73 GHz, 1 GB RAM), but pg
doesn't seem to use the processing power: the disk works all of the
time.

I'm working with a database with a couple of hundred thousands of rows
and a couple of redundant tables with dozens of columns.
The operations that take ages are cache updates. I'd like pg to use
more memory and CPU and work as much as possible with no disk writes:
data integrity isn't a very high priority here.

Aside from setting shared_buffers to 30200 (close to 250 MB), fsync to
off and autovacuum to on, and setting a bigger max shared memory space
(echo 25600 > /proc/sys/kernel/shmmax), I'm lost as to what else
to do.
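For reference, the postgresql.conf side of what I described -- the
values are the ones quoted above:

shared_buffers = 30200   # 8 kB pages, close to 250 MB
fsync = off              # integrity traded for speed, as noted
autovacuum = on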
My disk doesn't stop working when I update several thousand records in
the redundant table and it's kind of surprising, given that the entire
database could (theoretically) fit into the 250 MB shared memory
space.

Any hints?
Cheers,
t.n.a.

Re: [GENERAL] contrib/fuzzystrmatch search_path: what does it mean?

2006-12-09 Thread Tomi N/A

2006/12/9, Martijn van Oosterhout :

On Sat, Dec 09, 2006 at 12:08:08PM +, Tomi N/A wrote:
> A variable called search_path is set to public at the beginning of the
> fuzzystrmatch script.
> Can anyone tell me what it does?

Check the documentation for the details, but it's basically:

- The list of schemas to look in when given a table name
- The first one is the schema objects are created in by default


I would have, but I don't know where the documentation is. The README
in /usr/share/doc/postgresql-x.x.x/contrib doesn't mention it.
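For the record, the behaviour in a nutshell -- the second schema name
is invented:

SET search_path = public, other_schema;
CREATE TABLE t (i int);   -- created in "public", the first schema on the path
SELECT * FROM t;          -- an unqualified name is resolved left to right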

Thanks for the prompt reply and the help.
t.n.a.

[GENERAL] contrib/fuzzystrmatch search_path: what does it mean?

2006-12-09 Thread Tomi N/A

A variable called search_path is set to public at the beginning of the
fuzzystrmatch script.
Can anyone tell me what it does?

Cheers,
t.n.a.

Re: [GENERAL] Speed of postgres compared to ms sql, is this

2006-12-06 Thread Tomi N/A

2006/12/6, Markus Schiltknecht <[EMAIL PROTECTED]>:

Hi,

Tomi N/A wrote:
>> > When the subselect returns a lot of results, pgsql really takes its
>> time.
>
> 8.1.something

PostgreSQL 8.2 improved a lot for IN clauses with lots of values. I
think it now performs as well as an equal join query.


That's good to know. I'm looking forward to taking it for a spin...as
soon as I make sure my data is safe.

t.n.a.

Re: [GENERAL] Speed of postgres compared to ms sql, is this

2006-12-05 Thread Tomi N/A

2006/12/5, Scott Marlowe <[EMAIL PROTECTED]>:

On Tue, 2006-12-05 at 16:32, Tomi N/A wrote:

> One type of query does come to mind, now that I think about it.
> pgsql has trouble handling queries like
> SELECT * FROM t0 WHERE t0.id_t1 IN (SELECT t1.id FROM t1 WHERE...)

> When the subselect returns a lot of results, pgsql really takes its time.

Just wondering what version of pgsql you were using, as the in()
performance has been greatly improved in the newer versions.


8.1.something


What I noticed was that PostgreSQL was better under parallel load than
MSSQL server was.  Our pgsql 7.2 server would routinely outrun the MSSQL
server (This was like 3 years ago) when they were both moderately
loaded.  Of course, we didn't run a lot of where in () queries on the
pgsql server, we re-worked them to favor postgresql's query planner of
the time.


We frequently run into the same basic RDBMS benchmarking problem:
basically any database can be fastest in a given context, depending on
how the query is expressed and what it does.

Cheers,
t.n.a.

Re: [GENERAL] Speed of postgres compared to ms sql, is this

2006-12-05 Thread Tomi N/A

2006/12/5, Tom Lane <[EMAIL PROTECTED]>:


These sorts of reports would be far more helpful if they contained some
specifics.  What queries does MSSQL do better than Postgres, exactly?


You are of course correct, Tom.
I'm sorry I'm not in a position to replay what I was doing a year
ago...I wish I could.

Obviously, I never had to worry about the concepts of vacuuming and
analysis (not that it's very difficult with pgsql: it just doesn't
exist as a concept with MSSQL).
Anyone calling my comment completely subjective would be completely
correct because that's what it was.

One type of query does come to mind, now that I think about it.
pgsql has trouble handling queries like
SELECT * FROM t0 WHERE t0.id_t1 IN (SELECT t1.id FROM t1 WHERE...)

The performance is a bit better when there's only one result in the
subselect so you can do:
SELECT * FROM t0 WHERE t0.id_t1 = (SELECT t1.id FROM t1 WHERE...)

When the subselect returns a lot of results, pgsql really takes its time.

The first query, however, can be executed much, much (at least an
order of magnitude) quicker like this:
SELECT * FROM t0
LEFT OUTER JOIN t1 ON t1.id = t0.id_t1
WHERE t1.id IS NOT NULL

I didn't notice this kind of sensitivity with MSSQL, but again, I
can't easily reproduce what I've been doing.

Sorry for the original FUD-like report. Cheers,
t.n.a.

Re: [GENERAL] Speed of postgres compared to ms sql, is this

2006-12-05 Thread Tomi N/A

2006/12/4, Ian Harding <[EMAIL PROTECTED]>:

On 11/13/06, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> On Mon, 2006-11-13 at 15:36, novnov wrote:
> > OK, thanks everyone, I gather from the responses that postgres performance
> > won't be an issue for me then. If MS SQL Server and Postgres are in the same
> > ballpark performance-wise, which seems to be the upshot of your comments, no
> > problem. I'd only have worried if there was something like the major
> > difference between the two with more complicated queries. I am puzzled by
> > the commentor's post to the article, it could be FUD of course but didn't
> > particularly sound like the commentor was anti pgsql.
>
> I will say this. Most other databases are more forgiving of bad
> queries.  Make a bad query and postgresql is more likely to punish you
> for it.

Amen.  When I migrated from MSSQL to PostgreSQL (4 years ago), I found
out exactly how seriously MS SQL coddles you when it comes to its "Oh,
I know what you really meant" query planning.  I committed some sins
MS SQL covered up nicely and PostgreSQL flat out crawled when
presented with them.

However, I suspect that if I tried those bad queries with a current
version of PostgreSQL they would run much better, given all the work
that has been put  in over the last few years.


I've seen the exact same behaviour last year with pg 8.1 vs. MS SQL 2k.
It was an unexpected shock, but it's really not that hard to make
pgsql run much faster.
There are simply things which pgsql executes painfully slowly if you
don't write them the way the server expects you to. This hasn't
changed in 8.1, but then again, it's not nearly the biggest problem I
have with this specific RDBMS. ;)

t.n.a.

Re: [GENERAL] Only MONO/WinForms is a way to go

2006-11-29 Thread Tomi N/A

2006/11/29, Guy Rouillier <[EMAIL PROTECTED]>:

John DeSoi wrote:
> There are supposedly some native Mono toolkits for
> the Mac, but I have yet to see any Mac application that uses it.

I'm aware there is substantial anti-Java bias here, but the Standard
Widget Toolkit (SWT) uses native widgets on all platforms.  I've never
had occasion to use it, but that would seem to be an attractive option
for a cross-platform GUI.



No, not really.
It should really be called the Standard Windows Toolkit so that it
reflects the fact that it's far from industrial quality on any other
platform (except possibly Mac). Unless things have changed, it should
be easy to see what I mean if you create a simple benchmark of GUI
operations speed.
Furthermore, I'm not sure SWT exists for platforms other than Win32,
Mac and Linux.


t.n.a.

Re: [GENERAL] Only MONO/WinForms is a way to go

2006-11-29 Thread Tomi N/A

2006/11/29, Joshua D. Drake <[EMAIL PROTECTED]>:

On Wed, 2006-11-29 at 17:31 +, Tomi N/A wrote:
> 2006/11/29, Joshua D. Drake <[EMAIL PROTECTED]>:
> > On Wed, 2006-11-29 at 10:17 -0600, Tony Caduto wrote:
> > > Joshua D. Drake wrote:
> > > > The closest I think you would find is Eclipse or maybe KDevelop.
> > > >
> > > Actually NetBeans 5.5 is the closest thing I have seen for Java that is
> > > even close to visual studio or Delphi.
> >
> > I forgot that NetBeans was open source now... but Eclipse gives you
> > multi language support.
>
> You say that as if NetBeans doesn't provide instant internationalization. :)

I meant multi programming language support.


There's at least a C/C++ pack available, but yes, Eclipse has a
richer set of plugins to support other languages.

t.n.a.

Re: [GENERAL] Only MONO/WinForms is a way to go

2006-11-29 Thread Tomi N/A

2006/11/29, Joshua D. Drake <[EMAIL PROTECTED]>:

On Wed, 2006-11-29 at 10:17 -0600, Tony Caduto wrote:
> Joshua D. Drake wrote:
> > The closest I think you would find is Eclipse or maybe KDevelop.
> >
> Actually NetBeans 5.5 is the closest thing I have seen for Java that is
> even close to visual studio or Delphi.

I forgot that NetBeans was open source now... but Eclipse gives you
multi language support.


You say that as if NetBeans doesn't provide instant internationalization. :)
Seriously, NetBeans' sore point with respect to Eclipse is a
characteristic hard to demonstrate using flash tutorials, but one
nevertheless very important: working with java source code.
Supposedly, it's going to be a lot better with NB 6.0, but I'll see it
when I believe it.

Cheers,
t.n.a.

Re: [GENERAL] Only MONO/WinForms is a way to go

2006-11-28 Thread Tomi N/A

2006/11/28, Andrus <[EMAIL PROTECTED]>:


Only MONO/WinForms  is a way to go in any serious application.


Mono needs to show a lot more than beagle and f-spot to be even
considered interesting, let alone a platform to base industrial
strength applications on.
As long as that doesn't radically change for the better (possibly
never, but then again maybe as soon as a year or two), .net will still
be a golden cage, and "multiplatform .net" an oxymoron. That's if you
don't mind all the ethical mumbo-jumbo (easier to dismiss it when you
call it a name like that) of dealing with a company like MS and
securing plentiful licence sales for them through your product.

Cheers,
t.n.a.
