Re: [GENERAL] I want the stupidest possible binary export
Jov writes:
> psql can only input/output text string, which can not be binary content. With
> 9.2, you can encode bytea to base64, save to file, then use shell command to
> decode the file.

This worked, btw. Encoded to base64, piped to sed to fix the newlines, piped to 'base64 -id' and then to file.
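For the archives, a sketch of an equivalent pipeline (using tr instead of sed for the whitespace; the table, column, and key names are the ones from the original post, and GNU coreutils base64 is assumed):

  psql -t -A -c "select encode(mybinaryfield, 'base64') from mytable where key = 1" \
      | tr -d '[:space:]' \
      | base64 -d > file.gz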
Re: [GENERAL] I want the stupidest possible binary export
Jov writes:
> psql can only input/output text string, which can not be binary content. With
> 9.2, you can encode bytea to base64, save to file, then use shell command to
> decode the file.
> google “amutu.com pg bytea” can get a blog post.

I wondered if I could do that. OK, will try it, thanks.
Re: [GENERAL] I want the stupidest possible binary export
Adrian Klaver writes:
>> psql -t -c "\copy (select mybinaryfield from mytable where key = 1) to
>> 'file'" with format binary
>
> From here:
>
> http://www.postgresql.org/docs/9.2/static/app-psql.html
>
> the above should be:
>
> psql -t -c "\copy (select mybinaryfield from mytable where key = 1) to
> 'file'" with binary

OK, good, that works. However, now I have the problem I suspected I would: I get more than just the bytes that were in my field. 27 bytes more, to be exact. I already did -t for tuples-only, and I'm only doing one row and one column, so there are no separators there. Since it's a .gz file and 'file' no longer recognizes it, I deduce that the magic cookie isn't at the beginning of the file anymore and some of the 27 bytes must be at the beginning of the file instead. Is this a PG-specific binary format with a wrapper around my data, or just my data only?
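(For anyone finding this in the archives: the extra 27 bytes match the documented COPY BINARY wrapper for a single non-null field: an 11-byte signature plus two 32-bit header words (19 bytes), a 16-bit per-tuple field count (2), a 32-bit field length (4), and a 16-bit end-of-data marker (2). Assuming GNU head/tail, no header extension, and exactly one row and column, something like the following should recover the raw payload; this is a sketch, not something from the original thread:)

  tail -c +26 file | head -c -2 > myfield.gz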
[GENERAL] I want the stupidest possible binary export
I've got some tables with bytea fields from which I want to export only the binary data to files. (Each field has a gzipped data file.) I really want to avoid adding overhead to my project by writing a special program to do this, so I'm trying to do it from psql. Omitting the obvious switches for username, etc, here's what I'm doing:

  psql -t -c "\copy (select mybinaryfield from mytable where key = 1) to 'file'"

That works, but I get escaped bytes. I want actual binary directly out of the DB. Another option might be:

  psql -t -c "\copy (select mybinaryfield from mytable where key = 1) to 'file'" with format binary

However, there are two problems. First, I get a syntax error "at or near 'format'". (Running 9.2 client and server.) And second, I suspect that'll be some "proprietary" PG format, not the actual bytes from just my field.

What option am I missing?
Re: [GENERAL] Arduino SQL Connector
Steve Spence writes:
> It's actually quite freeing, not complicating. I can put the values
> right into the fields I need them to be in (or get values from the
> database I need to control the Arduino), without going through an
> intermediate process. If you have a serial process I can look at that
> works with 1000 or more remote sensors all over the world to a hosted
> database server, I'd love to look at it. Right now what I have works,
> but I have no GIS functionality.

I'm not going to claim this is a great design, but it's at least an order of magnitude easier than your proposal: Set up the simplest web server that will run PHP. Program the Arduinos to submit POST requests to it (which is just networking you say you can already handle), by which they submit their SQL strings. Write a PHP script that passes those to PostgreSQL.

The webserver should be able to handle at least as much as what you were going to throw at the DB server, so there's no load problem.
Re: [GENERAL] Arduino SQL Connector
Steve Spence writes:
> The Arduino is very good at compiling includes written in C/C++. Just
> need a .h and .cpp file with the correct syntax, but very compact. It
> used to be part of the fun making programs fit in 4-16k back in the
> day.

And what happened when you tried the suggestions you got to use the existing code?
Re: [GENERAL] Arduino SQL Connector
Steve Spence writes:
> Need a Team lead on this, and I'll collaborate as much as I can on the
> Arduino / Networking side.

I don't understand why there is so much blowback to the idea that you can just use the existing code. Why exactly wouldn't it work? It compiles on that architecture already. The only possible issue is size.

Take the libpq source and cross-compile it for the Arduino. I did this back in 2008 or so for a stepper motor driver that didn't work how I wanted. I wouldn't be surprised if the lame Arduino *IDE* won't do this, but what does that matter? You can make the library by cross-compiling, then include it in your Arduino sketch.

That said, I agree with the other poster who questioned why you'd put the DB client on the device anyway. Seems needlessly difficult and limiting. Just put a simple serial writer on that end and have a tiny serial->DB translator on the same machine you were going to put the DB on. Done in about an hour and you can switch DBMSs without having to touch the device.
Re: [GENERAL] Arduino SQL Connector
Steve Spence writes:
> no, you can't run arm / debian on an arduino UNO. it's all c++
> compiled to machine code then uploaded.

This is how all executables work.
Re: [GENERAL] Arduino SQL Connector
Bruce Momjian writes:
> On Thu, Apr 17, 2014 at 10:44:36AM -0400, David Rysdam wrote:
>> Maybe I'm being naive, but isn't libpq already being compiled for ARM by
>> Debian? As long as it fits, you should be good. If it doesn't, you'll
>> need to strip some stuff out.
>
> Oh, can you run Debian ARM code on Arduino? If so, Postgres's libpq
> could be used directly, though it is probably too big, as you mentioned.
> The MySQL driver is C++, which surprised me.

Well, I don't know about just picking it up and moving it over. But back in the day I wrote a C library which I cross-compiled for ARM on my Intel Linux box and used with my Arduino. At least, I think that's what I did. This was 6 years ago.

The fact that it (presumably) compiles on Debian under ARM should mean it would Just Cross-Compile for Arduino. Whether it would fit is another question.
Re: [GENERAL] Arduino SQL Connector
Bruce Momjian writes:
> On Thu, Apr 17, 2014 at 09:39:55AM -0400, Steve Spence wrote:
>> So, who wants to work on this with me? I'm a fair arduino programmer,
>> but know nothing about postgres.
>
> I would look at the MySQL one as a first step to see how that was done.
> You are basically going to need to duplicate libpq, which is a major
> undertaking.

Maybe I'm being naive, but isn't libpq already being compiled for ARM by Debian? As long as it fits, you should be good. If it doesn't, you'll need to strip some stuff out.
Re: [GENERAL] nested query vs left join: query planner very confused
On Wed, 27 Nov 2013 13:04:54 -0500, David Rysdam wrote:
> We deliberately try to keep our queries fairly simple for several
> reasons. This isn't the most complicated, but they don't get much more
> than this. I'll have them start with 10MB and see what they get.

10MB was enough to get that query to come back instantly. The same query on some larger tables was still slow, so we ended up bumping up to 50MB to get the entire job done. That probably sounds like a lot to you guys, but now that we know the behavior and what kind of queries we have, I think we're OK.

(And if you want a shocker, when I looked at our own DB, we've had work_mem set to 250MB on a lot of our servers and 1GB on our main production machine. Heh. I've got some PG tuning books here next to me now...)

Thanks!
Re: [GENERAL] nested query vs left join: query planner very confused
On Wed, 27 Nov 2013 13:02:20 -0500, Tom Lane wrote:
> David Rysdam writes:
>> effective_cache_size - 12000MB
>> shared_buffers - 1024MB
>> random_page_cost - is commented out
>> cpu_tuple_cost - commented out
>> work_mem - commented out
>
>> I assume you guys already know the default values for those last 3 on a
>> 9.0.x server...
>
> Default work_mem is only 1MB, so that probably explains why you're not
> getting a hashed subplan here. Have them knock it up some, say on the
> order of 10MB. (If none of your queries are any more complicated than
> this one, you could go higher. But keep in mind that a backend can use
> work_mem per sort/hash/materialize step, not per query --- so complex
> queries can use many times work_mem. Multiply that by the number of
> backends, and you can end up in swap hell pretty quickly with an over
> optimistic value.)

We deliberately try to keep our queries fairly simple for several reasons. This isn't the most complicated, but they don't get much more than this. I'll have them start with 10MB and see what they get.
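A sketch of the session-level test, using the tables from this thread (the hope, per Tom's note, is that the subplan shows up as hashed once work_mem is big enough; no config file change needed for the experiment):

  SET work_mem = '10MB';
  EXPLAIN ANALYZE
    SELECT signum FROM lp.Mags
    WHERE signum IS NOT NULL
      AND signum NOT IN (SELECT signum FROM lp.Sigs);
  -- look for a hashed SubPlan in the Filter line instead of Materialize rescans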
Re: [GENERAL] nested query vs left join: query planner very confused
On Wed, 27 Nov 2013 11:06:51 -0500, bricklen wrote:
> Has the client ANALYZEd recently? What happens if the client issues
> the following commands before executing the query?
> VACUUM ANALYZE lp.sigs;
> VACUUM ANALYZE lp.mags;
>
> If that doesn't change the plan, could you post the values for
> effective_cache_size, shared_buffers, random_page_cost,
> cpu_tuple_cost, work_mem and how much RAM is in the client machine?

Yes, I did have them do a vacuum analyze with no result. Here's their reply on configuration:

  It is 24 Gig.
  effective_cache_size - 12000MB
  shared_buffers - 1024MB
  random_page_cost - is commented out
  cpu_tuple_cost - commented out
  work_mem - commented out

I assume you guys already know the default values for those last 3 on a 9.0.x server...
Re: [GENERAL] nested query vs left join: query planner very confused
On Wed, 27 Nov 2013 11:21:09 -0500, Tom Lane wrote:
> DBMS-agnostic frequently means "sucks equally badly on all DBMSes" :-(

We've generally been OK (cf the ~50ms runtime for the same query at our site), but we also notice problems sooner than our client sometimes does and can make algorithm improvements where we don't know how to make DB ones.

> This query is hard to optimize because of the weird behavior of NOT IN
> when nulls are involved. Since you aren't complaining that the query
> fails entirely, I'm supposing that lp.Sigs.signum contains no nulls,
> but the planner doesn't know that. If you can transform it to a NOT
> EXISTS, you'll likely get a much better plan:
>
> select signum from lp.Mags where signum is not null and
> not exists (select 1 from lp.Sigs where lp.Sigs.signum = lp.Mags.signum)

We've already shipped to the client, but I'm looking at how extensive a patch would have to be. Very surprising we haven't hit this issue before.

> What you want is an "anti join" plan, or at least a plan that mentions
> a "hashed subplan". Plain subplans are death performance-wise, because
> they amount to being nestloop joins rather than anything smarter. (In
> this case it's likely not choosing a hashed subplan because work_mem is
> too small to allow that.)

I've got the client looking for this and other values already. We'll soon know...
[GENERAL] nested query vs left join: query planner very confused
I've got two tables, sigs and mags. It's a one-to-one relationship; mags is just split out because we store a big, less-often-used field there. "signum" is the key field.

Sometimes I want to know if I have any orphans in mags, so I do a query like this:

  select signum from lp.Mags where signum is not null and
    signum not in (select lp.Sigs.signum from lp.Sigs)

(I do this as a subquery because we originally had an old Sybase DB where outer joins were a non-standard pain and this way works the same and is DBMS-agnostic.)

At my location, this query runs very fast (~50ms on a ~100k row table) and 'explain' shows a plan with this structure:

  Seq scan on mags
    Filter:
    SubPlan 1
      Seq scan on sigs

At my client's location, the query is very slow (same table size, similar hardware/config, although they are running 9.0.x and I'm on 9.2.x). By "slow" I mean like an *hour*. 'explain' has this structure:

  Seq scan on mags
    Filter:
    SubPlan 1
      Materialize
        Seq scan on sigs

I'd never heard of Materialize before, so I looked into it. Seems to make a virtual table of the subquery so repetitions of the parent query don't have to re-do the work. Sounds like it should only help, right? The client's 'explain analyze' shows this:

  Seq Scan on mags  (cost=0.00..187700750.56 rows=47476 width=4) (actual time=3004851.889..3004851.889 rows=0 loops=1)
    Filter: ((signum IS NOT NULL) AND (NOT (SubPlan 1)))
    SubPlan 1
      ->  Materialize  (cost=0.00..3713.93 rows=95862 width=4) (actual time=0.011..16.145 rows=48139 loops=94951)
            ->  Seq Scan on sigs  (cost=0.00..2906.62 rows=95862 width=4) (actual time=0.010..674.201 rows=95862 loops=1)
  Total runtime: 3004852.005 ms

If I'm reading this correctly, the Materialize is running ~95k times, taking the majority of the time. Why? The only thing I can think of is this scenario:

1) server thinks it has a LOT of RAM
2) decides to Materialize subquery to take advantage
3) machine does not actually have that RAM, so it gets swapped
4) server notices it was swapped and decides to re-run rather than unswap
5) goto 2

I don't know if that's a realistic scenario, but it's all I got. I'm already well into unknown territory, performance-tuning-wise.

I also decided to try doing the query a different way:

  select lp.mags.signum from lp.mags left join lp.sigs on lp.mags.signum = lp.sigs.signum
    where lp.mags.signum is not null and lp.sigs.signum is null;

This one runs fast for both of us. So I guess my second question is: why can't the query planner tell these are the same query?
Re: [GENERAL] having difficulty with explain analyze output
On Tue, 26 Nov 2013 14:51:22 -0500, Martijn van Oosterhout wrote:
> The Seq Scan took 674ms and was run once (loops=1)
>
> The Materialise was run 94951 times and took, on average, 0.011ms to
> return the first row and 16ms to complete.
>
> 16.145 * 94951 = 1532983.895

OK, this is helpful. But why would Materialize run 94k times? I thought the whole point of Materialize was to make a "virtual table" that could speed up "parent" queries. If it has to recreate the subquery, why would the planner choose to Materialize?
[GENERAL] having difficulty with explain analyze output
I'm not really looking for information on how to speed this query up. I'm just trying to interpret the output enough to tell me which step is slow:

  Seq Scan on mags  (cost=0.00..187700750.56 rows=47476 width=4) (actual time=3004851.889..3004851.889 rows=0 loops=1)
    Filter: ((signum IS NOT NULL) AND (NOT (SubPlan 1)))
    SubPlan 1
      ->  Materialize  (cost=0.00..3713.93 rows=95862 width=4) (actual time=0.011..16.145 rows=48139 loops=94951)
            ->  Seq Scan on sigs  (cost=0.00..2906.62 rows=95862 width=4) (actual time=0.010..674.201 rows=95862 loops=1)
  Total runtime: 3004852.005 ms

It looks like the inner seq scan takes 674ms, then the materialize takes an additional 16ms? Or is that 16ms * 94951? Or 674 * 94951? And the outer seq scan takes 3004851 - 3004851 = 0ms?
Re: [GENERAL] partitioning for speed, but query planner ignores
On Wed, 2 Oct 2013 11:19:58 -0400, Kevin Grittner wrote:
> David Rysdam wrote:
>
>> We have a by-our-standards large table (about 40e6 rows). Since it is
>> the bottleneck in some places, I thought I'd experiment with
>> partitioning.
>
> In my personal experience I have gone into hundreds of millions of
> rows with good performance without partitioning. It's all about
> designing good indexes for the workload.

Well, our performance is still good. Certainly better than a lot of projects I've seen even with less data. But it's still our "worst" table and I have some free time to experiment...

>> Because the object number figures so prominently, I thought I'd partition
>> on that. To me, it makes the most sense from a load-balancing perspective
>
> Load balancing? Hitting a single partition more heavily improves
> your cache hit ratio. What sort of benefit are you expecting from
> spreading the reads across all the partitions? *Maybe* that could
> help if you carefully placed each partition table on a separate set
> of spindles, but usually you are better off having one big RAID so
> that every partition is spread across all the spindles
> automatically.

Now that you spell it out, I guess that does make more sense. I had some vague notion of tables "doing work", but really if it can load one partition into RAM and get most of my hits from there, it'd be a big win.

>> Lower numbers are going to be queried much less often than higher
>> numbers.
>
> This suggests to me that you *might* get a performance boost if you
> define partitions on object number *ranges*. It still seems a bit
> dubious, but it has a chance.

Would the planner be smart enough to figure out ranges without me having to "hint" my queries?

In any case, my speed tests are coming out the opposite of what I expected. Within-partition queries are taking longer than the whole table did, while across-partition queries are faster. I'll have to do more thinking on that.
Re: [GENERAL] partitioning for speed, but query planner ignores
On Wed, 2 Oct 2013 09:12:02 -0400, Bill Moran wrote:
> Last I looked, the partitioning mechanism isn't _quite_ as smart as could
> be desired. For example:
>
>   SELECT * FROM table WHERE objnum = 5; -- will not take advantage of partition
>
> You have to give the planner a little more hint as to the fact that it can
> take advantage of the partition:
>
>   SELECT * FROM table WHERE (objnum % 2) = 1 AND objnum = 5;
>
> As silly as it seems, this is enough information for the planner to know
> that it only needs to scan one partition.

This seemed ridiculously silly until I thought about it. I guess it has no way of "unwrapping" my constraint and figuring out what to do. Would this also apply if I did ranges, or is that a common enough constraint that it *can* figure it out without me having to modify all my queries?
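For comparison, range-style CHECK constraints are something constraint exclusion can reason about from a plain equality predicate, so the queries don't need the extra qualifier. A sketch, with illustrative table names standing in for the real ones:

  CREATE TABLE objhist_p0 (CHECK (objnum >= 0       AND objnum < 1000000)) INHERITS (objhist);
  CREATE TABLE objhist_p1 (CHECK (objnum >= 1000000 AND objnum < 2000000)) INHERITS (objhist);

  -- with constraint_exclusion = partition (or on):
  EXPLAIN SELECT * FROM objhist WHERE objnum = 1500123;
  -- only the parent and objhist_p1 should appear in the plan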
Re: [GENERAL] partitioning for speed, but query planner ignores
On Wed, 2 Oct 2013 08:34:44 -0400, David Rysdam wrote:
> However, when I run an explain or an explain analyze, I'm still seeing it
> checking both partitions. Is this because the query planner doesn't want
> to do a mod? Should I go with simple ranges, even though this adds a
> maintenance task?

I guess I should give some administrivia as well: the server is 9.2.1 running on Linux. The configuration is otherwise pretty vanilla, with only minor, and poorly-understood, conf changes.
[GENERAL] partitioning for speed, but query planner ignores
We have a by-our-standards large table (about 40e6 rows). Since it is the bottleneck in some places, I thought I'd experiment with partitioning. I'm following the instructions here:

  http://www.postgresql.org/docs/current/static/ddl-partitioning.html

The table holds data about certain objects, each of which has an object number and some number of historical entries (like account activity at a bank, say). The typical usage pattern is: relatively rare inserts that happen in the background via an automated process (meaning I don't care if they take a little longer) and frequent querying, including some where a human is sitting in front of it (i.e. I'd like it to be a lot faster). Our most frequent queries either select "all history for object N" or "most recent item for some subset of objects".

Because the object number figures so prominently, I thought I'd partition on that. To me, it makes the most sense from a load-balancing perspective to partition on the mod of the object number (for this test, evens vs odds, but planning to go up to mod 10 or even mod 100). Lower numbers are going to be queried much less often than higher numbers. This scheme also means I never have to add partitions in the future.

I set up my check constraints ((objnum % 2) = 0 and (objnum % 2) = 1 on the relevant tables) and turned constraint_exclusion to 'partition' in postgresql.conf. I also turned it to 'on' in my psql interface. However, when I run an explain or an explain analyze, I'm still seeing it checking both partitions. Is this because the query planner doesn't want to do a mod? Should I go with simple ranges, even though this adds a maintenance task?
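Concretely, the setup described above looks something like this (table and column names are illustrative stand-ins for the real ones):

  CREATE TABLE objhist_even (CHECK ((objnum % 2) = 0)) INHERITS (objhist);
  CREATE TABLE objhist_odd  (CHECK ((objnum % 2) = 1)) INHERITS (objhist);

  -- postgresql.conf: constraint_exclusion = partition
  EXPLAIN SELECT * FROM objhist WHERE objnum = 12345;
  -- both partitions still show up, since the planner does not derive
  -- (objnum % 2) = 1 from objnum = 12345 on its own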
Re: [GENERAL] prepared statement results don't clear?
Michael Fuhr wrote:
> On Tue, Jan 17, 2006 at 03:37:14PM -0500, David Rysdam wrote:
>> I have a Tcl function that does this:
>>
>> 1) create prepared statement for binary insertion via pg_exec (and releases the result handle)
>> 2) run statement with binary data via pg_exec_prepared (and releases the result handle)
>> 3) deallocate statement via pg_exec (and releases the result handle)
>>
>> When I try to run this function a couple hundred times, I get "hard
>> limit on result handles reached" after 128 successes. It seems
>> something is not being released.
>
> Can you post a simple test case? I can provoke "hard limit on result
> handles reached" by not clearing results, but so far I haven't seen
> that error if I do clear all results.

Sorry, I was wrong. I had a connection leak that was hidden because my accounting procedure and the actual code made different assumptions.
[GENERAL] prepared statement results don't clear?
I have a Tcl function that does this:

1) create prepared statement for binary insertion via pg_exec (and releases the result handle)
2) run statement with binary data via pg_exec_prepared (and releases the result handle)
3) deallocate statement via pg_exec (and releases the result handle)

When I try to run this function a couple hundred times, I get "hard limit on result handles reached" after 128 successes. It seems something is not being released.

To make absolutely sure it's not me that's leaving something out there, I output a line each time I either create or destroy a result handle and they add up perfectly. Furthermore, all the pg_execs go through another function that has been well-exercised, so I don't think the problem is there.

The only thing I can think of is that a prepared statement (or the Tcl-specific pg_exec_prepared) has, like, a "double" result handle (one for the statement itself and one for the exec'ing thereof). Kind of a half-assed theory, but necessity is the mother of such invention. Does anyone else have any better ideas for locating the result handle leak?
[GENERAL] EXPLAIN ANALYZE output
I'm afraid I still don't understand the output of EXPLAIN ANALYZE. The output I get is like this:

  blah blah blah (cost=A..B blah blah) (actual time=C..D blah blah)

According to what I've been able to find:

  A = predicted time to first returned row in milliseconds
  B = total cost in arbitrary units
  C = actual time to first returned row in milliseconds
  D = actual total time in milliseconds

1) Is this correct?

I had a query that was taking a long time and I spent the last day and a half trying to reduce D. I probably should have been working on reducing C, but that's neither here nor there. Nothing I've done has had any appreciable effect, with both C and D staying around 17-20 (~= 3 minutes), including dropping the sort condition. After all this head-on-wall banging, I went back and tried doing the query without EXPLAIN ANALYZE. It takes under a minute, even with the sort turned on.

2) Does EXPLAIN ANALYZE add a lot of overhead that is being measured here?

3) Even if so, why has the output of EXPLAIN ANALYZE not dropped even though the query itself is now faster (due to something I did while working on C/D?)?

There's been plenty of vacuuming and analyzing on these tables, so outdated stats are not the answer.
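Regarding question 2, one way to see the instrumentation overhead directly is to time the statement both ways from psql; a sketch, where "mytable" stands in for the real query:

  \timing
  SELECT count(*) FROM mytable;                  -- plain execution
  EXPLAIN ANALYZE SELECT count(*) FROM mytable;  -- same plan, plus per-row timing calls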
Re: [GENERAL] reading EXPLAIN output
Tom Lane wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> merge join (cost=0.00..348650.65 rows=901849 width=12)
>>   merge cond {blah}
>>   join filter {blah}
>>   index scan using {blah index on blah} (cost=0.00..289740.65 rows=11259514 width=8)
>>   index scan using {blah index on blah} (cost=0.00..17229.93 rows=902085 width=8)
>>
>> This query takes about 3 minutes to run and I'm trying to figure out
>> why. From a tutorial and the docs, I gather that the "..largenum" part
>> is the number of page reads required, so I understand where 289740 and
>> 17229 come from. But what about 348650 page reads for the "merge join"?
>
> You're misreading it. An upper node's cost includes the cost of its
> children. So the actual cost estimate for the join step is 41680.07.
>
>> When I do EXPLAIN ANALYZE, the actual values come out like this:
>> merge join: (actual time=170029.404..170029.404)
>
> That seems a bit odd ... is there only one row produced? Could you show
> us the entire EXPLAIN ANALYZE output, rather than your assumptions about
> what's important?
>
> Increasing work_mem won't help a merge join, but if you can get it large
> enough to allow a hash join to be used instead, that might be a win.
>
> regards, tom lane

I'm looking for certain anomalies, so the end result should be zero rows.

  merge join (cost=0.00..348650.65 rows=901849 width=12) (actual time=170029.404..170029.404 rows=0 loops=1)
    merge cond {blah}
    join filter {blah}
    index scan using {blah index on blah} (cost=0.00..289740.65 rows=11259514 width=8) (actual time=29.227..85932.426 rows=11256725 loops=1)
    index scan using {blah index on blah} (cost=0.00..17229.93 rows=902085 width=8) (actual time=39.896..6766.755 rows=902236 loops=1)
  Total runtime: 172469.209 ms
[GENERAL] reading EXPLAIN output
  merge join (cost=0.00..348650.65 rows=901849 width=12)
    merge cond {blah}
    join filter {blah}
    index scan using {blah index on blah} (cost=0.00..289740.65 rows=11259514 width=8)
    index scan using {blah index on blah} (cost=0.00..17229.93 rows=902085 width=8)

This query takes about 3 minutes to run and I'm trying to figure out why. From a tutorial and the docs, I gather that the "..largenum" part is the number of page reads required, so I understand where 289740 and 17229 come from. But what about 348650 page reads for the "merge join"?

My conjecture is that the joined keys are being stored on disk (if that's how the internals of postgresql works) and have to be re-read for the rest of the query. Is that right? Does that mean I could speed this up by giving more RAM to store it in?

When I do EXPLAIN ANALYZE, the actual values come out like this:

  merge join: (actual time=170029.404..170029.404)
  index scan: (actual time=27.653..84373.805)
  index scan: (actual time=45.681..7026.928)

This seems to confirm that it's the final "merge join" that takes forever. Because it is writing to and reading from disk?
Re: [GENERAL] "incomplete startup packet" on SGI
David Rysdam wrote:
> David Rysdam wrote:
>> Tom Lane wrote:
>>> David Rysdam <[EMAIL PROTECTED]> writes:
>>>> Just finished building and installing on *Sun* (also
>>>> "--without-readline", not that I think that could be the issue):
>>>> Works fine. So it's something to do with the SGI build in particular.
>>>
>>> More likely it's something to do with weird behavior of the SGI
>>> kernel's TCP stack. I did a little googling for "transport endpoint
>>> is not connected" without turning up anything obviously related, but
>>> that or ENOTCONN is probably what you need to search on.
>>>
>>> regards, tom lane
>>
>> It's acting like a race condition or pointer problem. When I add random
>> debug printfs/PQflushs to libpq it sometimes works.
>
> Not a race condition: No threads.
>
> Not a memory leak: Electric fence says nothing. And it works when
> electric fence is running, whereas a binary that uses the same libpq
> without linking efence does not work.

I know nobody is interested in this, but I think I should document the "solution" for anyone who finds this thread in the archives:

My theory is that Irix is unable to keep up with how fast the postgresql client is going and that the debug statements/efence stuff are slowing it down enough that Irix can catch up and make sure the socket really is there, connected and working. To that end, I inserted a sleep(1) in fe-connect.c just before the pqPacketSend(...startpacket...) stuff. It's stupid and hacky, but gets me where I need to be, and maybe this hint will inspire somebody who knows (and cares) about Irix to find a real fix.
Re: [GENERAL] "incomplete startup packet" on SGI
David Rysdam wrote:
> Tom Lane wrote:
>> David Rysdam <[EMAIL PROTECTED]> writes:
>>> Just finished building and installing on *Sun* (also
>>> "--without-readline", not that I think that could be the issue):
>>> Works fine. So it's something to do with the SGI build in particular.
>>
>> More likely it's something to do with weird behavior of the SGI
>> kernel's TCP stack. I did a little googling for "transport endpoint is
>> not connected" without turning up anything obviously related, but that
>> or ENOTCONN is probably what you need to search on.
>>
>> regards, tom lane
>
> It's acting like a race condition or pointer problem. When I add random
> debug printfs/PQflushs to libpq it sometimes works.

Not a race condition: No threads.

Not a memory leak: Electric fence says nothing. And it works when electric fence is running, whereas a binary that uses the same libpq without linking efence does not work.
Re: [GENERAL] "incomplete startup packet" on SGI
Tom Lane wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> Just finished building and installing on *Sun* (also
>> "--without-readline", not that I think that could be the issue):
>> Works fine. So it's something to do with the SGI build in particular.
>
> More likely it's something to do with weird behavior of the SGI
> kernel's TCP stack. I did a little googling for "transport endpoint is
> not connected" without turning up anything obviously related, but that
> or ENOTCONN is probably what you need to search on.
>
> regards, tom lane

It's acting like a race condition or pointer problem. When I add random debug printfs/PQflushs to libpq it sometimes works.
Re: [GENERAL] "incomplete startup packet" on SGI
Just finished building and installing on *Sun* (also "--without-readline", not that I think that could be the issue): Works fine. So it's something to do with the SGI build in particular.

David Rysdam wrote:
> I have a working 8.1 server running on Linux and I can connect to it
> from other Linux clients. I built postgresql 8.1 on an SGI (using
> --without-readline but otherwise stock) and it compiled OK and
> installed fine. But when I try to connect to the Linux server I get
> "could not send startup packet: transport endpoint is not connected" on
> the client end and "incomplete startup packet" on the server end.
> Connectivity between the two machines is working.
>
> I could find basically no useful references to the former and the only
> references to the latter were portscans and the like. Browsing the
> source, I see a couple of places that message could come from. One
> relates to SSL, which the output from configure says is turned off on
> both client and server. The other is just a generic comm error--but
> what could cause a partial failure like that?
[GENERAL] "incomplete startup packet" on SGI
I have a working 8.1 server running on Linux and I can connect to it from other Linux clients. I built postgresql 8.1 on an SGI (using --without-readline but otherwise stock) and it compiled OK and installed fine. But when I try to connect to the Linux server I get "could not send startup packet: transport endpoint is not connected" on the client end and "incomplete startup packet" on the server end. Connectivity between the two machines is working.

I could find basically no useful references to the former and the only references to the latter were portscans and the like. Browsing the source, I see a couple of places that message could come from. One relates to SSL, which the output from configure says is turned off on both client and server. The other is just a generic comm error--but what could cause a partial failure like that?
Re: [GENERAL] missing something obvious about intervals?
I knew it had to be something like this, but the search space was just too big. Thanks!

Jim Buttafuoco wrote:
> try
>
>   select '2005-12-01'::date + (456.5::float || ' seconds')::interval;
>          ?column?
>   ------------------------
>    2005-12-01 00:07:36.50
>   (1 row)
>
> -- Original Message -------
> From: David Rysdam <[EMAIL PROTECTED]>
> To: "pg >> Postgres General"
> Sent: Mon, 12 Dec 2005 13:00:13 -0500
> Subject: [GENERAL] missing something obvious about intervals?
>
> I have a table that has a date field and a "real" field that represents
> a number of seconds. I want to select the date field + the seconds
> field. Just adding them doesn't work. Casting to interval doesn't work.
> to_date/to_timestamp don't work. How do I do this?
[GENERAL] missing something obvious about intervals?
I have a table that has a date field and a "real" field that represents a number of seconds. I want to select the date field + the seconds field. Just adding them doesn't work. Casting to interval doesn't work. to_date/to_timestamp don't work. How do I do this?
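The trick shown in Jim Buttafuoco's reply above generalizes to columns; a sketch with made-up column and table names:

  -- datecol is a date, seccol is a real; names are illustrative
  SELECT datecol + (seccol::float8 || ' seconds')::interval FROM mytable;
  -- or, equivalently, using the numeric-times-interval operator:
  SELECT datecol + seccol * interval '1 second' FROM mytable;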
Re: [GENERAL] date format
David Rysdam wrote:
> David Rysdam wrote:
>> I could swear (but I don't think I can prove at this point) that 8.0
>> beta3 returned timestamps with milliseconds, like this:
>>
>>   YYYY-MM-DD HH24:MI:SS.MS
>>
>> But 8.1 isn't doing that. I see functions to format the date, but that
>> would require me to change all my existing SQL to specifically ask for
>> the milliseconds. There's also the "set datestyle" thing, but the
>> granularity doesn't exist to specify an exact format such as the above.
>> Is there a way to tell postgres *exactly* what I want the *default*
>> date output format to be?
>
> Nevermind, the documentation just seems to be unclear. I do get .MS
> back in ISO format (despite that not being shown as part of the ISO
> format). It's just that none of my dates have milliseconds on them,
> which is simply a bug I'll have to find.

OK, sorry to flood on this, but there is still a problem. If I have no milliseconds, I'd like to get back ".000" at the end of the output. Since MS are indeed part of the ISO format, why don't I get zeros? Alternatively, if MS aren't part of the ISO format, why do I get non-zeros?
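For what it's worth, one way to get a fixed ".000"-style output is to format explicitly rather than rely on DateStyle; a sketch, where tscol and mytable are placeholders:

  SELECT to_char(tscol, 'YYYY-MM-DD HH24:MI:SS.MS') FROM mytable;
  -- to_char always emits three digits for MS, even when they are zero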
Re: [GENERAL] date format
David Rysdam wrote:
> I could swear (but I don't think I can prove at this point) that 8.0
> beta3 returned timestamps with milliseconds, like this:
>
>   YYYY-MM-DD HH24:MI:SS.MS
>
> But 8.1 isn't doing that. I see functions to format the date, but that
> would require me to change all my existing SQL to specifically ask for
> the milliseconds. There's also the "set datestyle" thing, but the
> granularity doesn't exist to specify an exact format such as the above.
> Is there a way to tell postgres *exactly* what I want the *default*
> date output format to be?

Nevermind, the documentation just seems to be unclear. I do get .MS back in ISO format (despite that not being shown as part of the ISO format). It's just that none of my dates have milliseconds on them, which is simply a bug I'll have to find.
[GENERAL] date format
I could swear (but I don't think I can prove at this point) that 8.0 beta3 returned timestamps with milliseconds, like this:

  YYYY-MM-DD HH24:MI:SS.MS

But 8.1 isn't doing that. I see functions to format the date, but that would require me to change all my existing SQL to specifically ask for the milliseconds. There's also the "set datestyle" thing, but the granularity doesn't exist to specify an exact format such as the above. Is there a way to tell postgres *exactly* what I want the *default* date output format to be?
[GENERAL] to Jerry LeVan
I got your email about pgbrowse and it has guided me well to using cursors from pgtcl. Thanks! I might have a question or two--do you have an email address that DNS can resolve?
Re: [GENERAL] Very slow queries on 8.1
Michael Fuhr wrote:
> On Thu, Nov 17, 2005 at 10:38:50AM -0500, David Rysdam wrote:
>> I'm porting an application from Sybase and I've noticed that similar
>> application functions take 2 to 3 times longer on postgres than they
>> used to on the same machine running under Sybase. I've tried changing
>> various "performance tuning" parameters, such as shared_buffers,
>> effective_cache_size, etc but there's little or no effect.
>
> What's your OS and hardware environment and what are your non-default
> PostgreSQL settings?

Fedora Core 2, dual 2.8 GHz, 2 GB ram.

  shared_buffers = 1
  effective_cache_size = 10

>> Right now, I'm working on a test case that involves a table with ~360k
>> rows called "nb.sigs". My sample query is:
>>
>> select * from nb.sigs where signum > 25
>>
>> With no index, explain says this query costs 11341. After CREATE INDEX
>> on the signum field, along with an ANALYZE for nb.sigs, the query costs
>> 3456 and takes around 4 seconds to return the first row.
>
> Please post the EXPLAIN ANALYZE output for the query -- that'll show us
> the query plan, the accuracy of the planner's row count estimate, and
> how long the query takes to execute on the server. It might also be
> useful to see the table definition and the output of the following
> query:
>
> SELECT null_frac, n_distinct, correlation
> FROM pg_stats
> WHERE schemaname = 'nb' AND tablename = 'sigs' AND attname = 'signum';

The first things would be problematic to supply, since they are actually on a computer that doesn't have access to the Internet or to the machine I'm writing this on. As for the query:

  Row  null_frac  n_distinct  correlation
  1    0          -1          1

> What client interface are you using? If the query returns a lot of rows
> then you might benefit from using a cursor to fetch the result set in
> chunks; otherwise the client library is probably waiting for the entire
> result set to arrive before returning any rows to you.

That does seem to be the problem. I've never worked with cursors, so I'll have to see if I can fit our DB module into that mold.
Re: [GENERAL] Very slow queries on 8.1
Martijn van Oosterhout wrote:
> On Thu, Nov 17, 2005 at 11:31:27AM -0500, David Rysdam wrote:
>> Right, it's about 100k rows and it is through libpq (pgadmin in this
>> case, but my app uses libpq from pgtcl). Is there a way to tell libpq
>> to not do what it "likes" and do what I need instead? I didn't see
>> anything in the docs, but I didn't look very hard.
>
> Use the async interface. There you submit the query and retrieve rows
> as they come in. It's a bit trickier to program but it can be done.
> psql doesn't do this though, it's not clear how it could anyway, given
> the way it formats.
>
> Have a nice day,

I'm experimenting with the async interface right now. Hopefully it will fit in well. It's OK if psql/pgadmin don't do it, as long as the app does.
Re: [GENERAL] Very slow queries on 8.1
Bruno Wolff III wrote:
> On Thu, Nov 17, 2005 at 11:31:27 -0500, David Rysdam <[EMAIL PROTECTED]> wrote:
>> Right, it's about 100k rows and it is through libpq (pgadmin in this
>> case, but my app uses libpq from pgtcl). Is there a way to tell libpq
>> to not do what it "likes" and do what I need instead? I didn't see
>> anything in the docs, but I didn't look very hard.
>
> You could use a cursor. That will bias the plan toward fast start plans
> which might give you lower throughput if you are normally planning to
> fetch all of the rows, but will give you quicker access to the first
> row.

That is exactly what I want, but is it possible to use a cursor from pgtcl?
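Whatever the client library, the cursor itself is plain SQL issued inside a transaction; a sketch against the table from this thread:

  BEGIN;
  DECLARE sigcur CURSOR FOR SELECT * FROM nb.sigs WHERE signum > 25;
  FETCH 1000 FROM sigcur;   -- first batch arrives without waiting for the full result set
  -- keep issuing FETCH until it returns no rows
  CLOSE sigcur;
  COMMIT;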
Re: [GENERAL] Very slow queries on 8.1
Tom Lane wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> Right now, I'm working on a test case that involves a table with ~360k
>> rows called "nb.sigs". My sample query is:
>>
>> select * from nb.sigs where signum > 25
>>
>> With no index, explain says this query costs 11341. After CREATE INDEX
>> on the signum field, along with an ANALYZE for nb.sigs, the query costs
>> 3456 and takes around 4 seconds to return the first row. This seems
>> extremely slow to me, but I can't figure out what I might be doing
>> wrong. Any ideas?
>
> How many rows does that actually return, and what client interface are
> you fetching it with? libpq, at least, likes to fetch the entire query
> result before it gives it to you --- so you're talking about 4 sec to
> get all the rows, not only the first one. That might be reasonable if
> you're fetching 100k rows via an indexscan...
>
> regards, tom lane

Right, it's about 100k rows and it is through libpq (pgadmin in this case, but my app uses libpq from pgtcl). Is there a way to tell libpq to not do what it "likes" and do what I need instead? I didn't see anything in the docs, but I didn't look very hard.
Re: [GENERAL] Very slow queries on 8.1
int4, not null, and the index is unique. I even tried clustering on it, to no avail.

codeWarrior wrote:
> What is the data type for "signum" ???
>
> "David Rysdam" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
>> I'm porting an application from Sybase and I've noticed that similar
>> application functions take 2 to 3 times longer on postgres than they
>> used to on the same machine running under Sybase. I've tried changing
>> various "performance tuning" parameters, such as shared_buffers,
>> effective_cache_size, etc but there's little or no effect. I'm
>> beginning to think there's a deeper root cause to the slowness.
>>
>> Right now, I'm working on a test case that involves a table with ~360k
>> rows called "nb.sigs". My sample query is:
>>
>> select * from nb.sigs where signum > 25
>>
>> With no index, explain says this query costs 11341. After CREATE INDEX
>> on the signum field, along with an ANALYZE for nb.sigs, the query costs
>> 3456 and takes around 4 seconds to return the first row. This seems
>> extremely slow to me, but I can't figure out what I might be doing
>> wrong. Any ideas?
>>
>> (If necessary, I can write an entire script that creates and populates
>> a table and then give my performance on that sample for someone else
>> to check against.)
[GENERAL] Very slow queries on 8.1
I'm porting an application from Sybase and I've noticed that similar application functions take 2 to 3 times longer on postgres than they used to on the same machine running under Sybase. I've tried changing various "performance tuning" parameters, such as shared_buffers, effective_cache_size, etc, but there's little or no effect. I'm beginning to think there's a deeper root cause to the slowness.

Right now, I'm working on a test case that involves a table with ~360k rows called "nb.sigs". My sample query is:

  select * from nb.sigs where signum > 25

With no index, explain says this query costs 11341. After CREATE INDEX on the signum field, along with an ANALYZE for nb.sigs, the query costs 3456 and takes around 4 seconds to return the first row. This seems extremely slow to me, but I can't figure out what I might be doing wrong. Any ideas?

(If necessary, I can write an entire script that creates and populates a table and then give my performance on that sample for someone else to check against.)
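The index/ANALYZE step described above, spelled out (the index name is arbitrary; this is just the sequence of commands the post refers to):

  CREATE INDEX sigs_signum_idx ON nb.sigs (signum);
  ANALYZE nb.sigs;
  EXPLAIN ANALYZE SELECT * FROM nb.sigs WHERE signum > 25;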
[GENERAL] I must not understand the permissions system
From both User A and User B, this query returns rows:

  select * from pg_trigger where tgname = ''

but User A gets 2 rows (because I have the same trigger name on two different tables) while User B only sees one of them. Is it permissions on pg_trigger or on the original table (or some other thing) that is preventing B from seeing that row?
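One way to see which table each copy of the trigger hangs off of is to join through the catalogs; a sketch, where 'mytrigger' is a placeholder for the real trigger name:

  SELECT t.tgname, n.nspname, c.relname
  FROM pg_trigger t
       JOIN pg_class c ON c.oid = t.tgrelid
       JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE t.tgname = 'mytrigger';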
[GENERAL] [Fwd: I must not understand the permissions system]
Wow, nevermind. What I must not understand is my own code. Yikes, what a bonehead.

--- Begin Message ---

From both User A and User B, this query returns rows:

  select * from pg_trigger where tgname = ''

but User A gets 2 rows (because I have the same trigger name on two different tables) while User B only sees one of them. Is it permissions on pg_trigger or on the original table (or some other thing) that is preventing B from seeing that row?

--- End Message ---
Re: [GENERAL] Check for existence of index
Tom Lane wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> Why can't I have the same index name be on different tables?
>
> You can ... if they are in different schemas. Indexes and tables share
> the same namespace, ie, they must be unique within a schema.
>
> As for your original question, you probably want something like
>
> SELECT ... FROM pg_class c, pg_namespace n
> WHERE c.relnamespace = n.oid
>   AND relname = 'indexname' AND nspname = 'schemaname'
>   AND relkind = 'i';
>
> If you actually want to verify that this index is on a specific table,
> you'll need a more complicated join involving pg_index and a second
> scan of pg_class. See
> http://www.postgresql.org/docs/8.0/static/catalogs.html
>
> regards, tom lane

Well, since I can't have more than one index of a given name in a schema anyway, I'll have to name them "$tablename_$indexname" or something, which means I won't have to verify they are on a particular table. Anyway, this query looks good. I was getting lost in all the terminology ("namespace" vs "schema") and the data distributed all over (some stuff in pg_index, some in pg_class, etc).
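The "more complicated join" Tom mentions comes out something like this (a sketch, with placeholder schema/table/index names):

  SELECT i.relname AS index_name
  FROM pg_index x
       JOIN pg_class i ON i.oid = x.indexrelid
       JOIN pg_class t ON t.oid = x.indrelid
       JOIN pg_namespace n ON n.oid = t.relnamespace
  WHERE n.nspname = 'schemaname'
    AND t.relname = 'tablename'
    AND i.relname = 'indexname';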
Re: [GENERAL] Check for existence of index
And another thing: can't I do this?

  create table s.a (blah);
  create table s.b (blah);
  create index myindex on s.a(blah);
  create index myindex on s.b(blah);

When I drop them I have to specify the schema name, so presumably it tracks them that way. Why can't I have the same index name be on different tables?

David Rysdam wrote:
> I have a script that automatically creates my database objects. In
> order to automatically create indexes, it needs to first make sure they
> don't exist. For things like tables, this is easy:
>
>   select * from information_schema.tables where table_schema = ""
>     and table_name = ""
>
> But for indexes it is hard for some reason. There's a catalog table
> "pg_index", but it doesn't have index, schema or table names. I
> eventually found them in pg_class, but the table and schema names
> aren't there. After some searching around, I came across this very
> strange (to me, anyway) "::regclass" thing that let me do this:
>
>   select * from pg_catalog.pg_index where indexrelid = 'schema.index'::regclass
>
> I'm not really clear what that's doing, but in any case it still isn't
> what I want. That query returns information when the index exists but
> errors out when the index doesn't exist.
>
> Is there a way I can get a non-erroring query on either condition that
> will tell me if an index exists on a given table in a given schema?
[GENERAL] Check for existence of index
I have a script that automatically creates my database objects. In order to automatically create indexes, it needs to first make sure they don't exist. For things like tables, this is easy:

  select * from information_schema.tables where table_schema = ""
    and table_name = ""

But for indexes it is hard for some reason. There's a catalog table "pg_index", but it doesn't have index, schema or table names. I eventually found them in pg_class, but the table and schema names aren't there. After some searching around, I came across this very strange (to me, anyway) "::regclass" thing that let me do this:

  select * from pg_catalog.pg_index where indexrelid = 'schema.index'::regclass

I'm not really clear what that's doing, but in any case it still isn't what I want. That query returns information when the index exists but errors out when the index doesn't exist.

Is there a way I can get a non-erroring query on either condition that will tell me if an index exists on a given table in a given schema?
Re: [GENERAL] psql : how to make it more silent....
Tom Lane wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> ... I would like to have psql (optionally?) not even send me NOTICE
>> messages.
>
> Have you looked at client_min_messages?
>
> regards, tom lane

I had not, because I'd never heard of it. :) Looks like exactly what I want and what I was suspecting existed, thanks.
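For the archives, the setting in question looks like this (per session, or the same line can go in postgresql.conf):

  SET client_min_messages = warning;  -- suppresses NOTICE, keeps WARNING and ERROR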
Re: [GENERAL] tcl bindings for 8.0
David Rysdam wrote:
> The README from 8.0-beta3 says "This distribution also contains several
> language bindings, including C and Tcl" but I'm not finding libpgtcl
> being built, nor can I find a way to tell it to. I see pgtcl is on
> http://gborg.postgresql.org, so does this mean that the README is out
> of date and I need to download a separate piece? Does it currently work
> with beta3, given that the project says "Enhancements will include
> converting all commands to use Tcl 8-style objects"?

Oh, *Tcl* 8 objects. I was reading that as PostgreSQL 8 at first. Still, my question stands. Is libpgtcl in 8.0-beta3 and I'm missing it, or do I need to download it?
[GENERAL] tcl bindings for 8.0
The README from 8.0-beta3 says "This distribution also contains several language bindings, including C and Tcl" but I'm not finding libpgtcl being built, nor can I find a way to tell it to. I see pgtcl is on http://gborg.postgresql.org, so does this mean that the README is out of date and I need to download a separate piece? Does it currently work with beta3, given that the project says "Enhancements will include converting all commands to use Tcl 8-style objects"?
Re: [GENERAL] psql : how to make it more silent....
Gaetano Mendola wrote:
> Patrick Fiche wrote:
>> Hi,
>>
>> When I execute a function, I would like psql to show me only RAISE
>> NOTICE messages but not all function calls. Indeed, I currently get
>> some messages that I don't care about:
>>
>>   * PL/pgSQL function "adm_user" line 321..
>>   * CONTEXT: SQL query "SELECT."
>>
>> Is there a way to get rid of these messages?
>
> modify your log_error_verbosity to "terse"

I can't find anything else on that beyond

  http://search.postgresql.org/www.search?ul=http%3A%2F%2Fwww.postgresql.org%2Fdocs%2F7.4%2Fstatic%2F%25&q=log_error_verbosity

and I would like to have psql (optionally?) not even send me NOTICE messages.
Re: [GENERAL] Date format for bulk copy
Michael Fuhr wrote:
> On Wed, Oct 13, 2004 at 01:32:01PM -0400, David Rysdam wrote:
>> Michael Fuhr wrote:
>>> You could filter the data through a script that reformats certain fields, then feed the reformatted data to PostgreSQL. This is usually a trivial task for Perl, awk, sed, or the like.
>> Right, I *can* do this. But then I have to build knowledge into that script so it can find each of these date fields (there's like 20 of them across 10 different files) and then update that knowledge each time it changes.
> In your case that's a reasonable argument against filtering the data with a script. Using a regular expression in the script might reduce or eliminate the need for some of the logic, but then you'd run the risk of reformatting data that shouldn't have been touched.
>
>> I'm still leaning towards just making postgres accept a ':' delimiter for milliseconds.
> Based on your requirements, that might indeed be a better solution. I'd probably choose to extend PostgreSQL rather than hack what already exists, though. Doing the latter might break something else, and you have to remember to add the hack every time you upgrade the server software. That can cause headaches for whoever inherits the system from you unless it's well-documented.

By "extend PostgreSQL" do you mean create a custom input_function for timestamp? Are there docs that give hints for replacing the input function of an existing type? Someone else replied similarly, but I'm afraid I'm not familiar enough with PG to decipher it all.

>> Why not the user-defined type with associated user-defined input function?
> If filtering the data is awkward, then that might be a better way to go.

I think I will, when I get to that point.
Re: [GENERAL] Date format for bulk copy
Greg Stark wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> In my brute force port, I just bulk copied the date fields into temporary tables and then did a to_timestamp(field, 'Mon DD YYYY HH:MI:SS:MSAM'). Again, I created a temporary table and did a decode(field, 'hex') to the real table.
> This is the standard approach. You're rather lucky these are the only data representation changes you've had to do so far. I fear you'll run into more and more complex changes over time, and trying to avoid the temporary table will get harder and harder.

No, I think I'm OK there. These are programmatically-generated values and I've already been through them all once. Just the millisecond issue and the hex binary issue, AFAIK.

> If it were me I would consider processing the files in perl. It should be pretty easy to do both of these modifications very quickly.

Very quick and easy to do one time. A little trickier to handle in an elegant, maintainable way for the dozens of data reloads I do every month, for GBs of data, onto two different server types.

> If you really want to go with custom C code then you might be able to just grab the byteain/byteaout functions from src/backend/utils/adt/varlena.c into a separate module and create new functions with modified names. Load it with
>
>   CREATE FUNCTION byteain ... AS 'my_bytea_funcs.so', 'my_byteain';
>
> Or maybe create the function as my_byteain in postgres and then update the catalog entries somehow. I'm not sure how to do that, but it shouldn't be too hard. And it might make it easier to do the substitution for the data load and then undo the change afterwards.

Why not create a type and then define the load function to be the equivalent of "decode('hex')"?

> Doing the same for timestamp is a bit trickier, but you could copy ParseDateTime from datetime.c as a static function for your module. Be careful, though: test this out thoroughly on a test database. I'm not sure of all the impacts of altering the in/out functions for data types. I expect it would break pg_dump, for example. And I would worry about the statistics tables too.

This is kind of a hybrid of my suggestions, and the problems are a hybrid as well. :)

1) Just change the timestamp type so that it allows a ':' delimiter for milliseconds. Potential problems: other parts of the code won't expect it; people don't want that.

2) Create a new type. Potential problem: things like date ranges probably wouldn't work anymore, since the server wouldn't know it's a date now.
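A hedged sketch of the SQL half of Greg's suggestion only; it assumes a compiled shared library (the path, library name, and exported symbol below are hypothetical) containing a copy of byteain modified to accept the Sybase-style hex text:

  CREATE FUNCTION my_byteain(cstring) RETURNS bytea
      AS '/usr/local/pgsql/lib/my_bytea_funcs.so', 'my_byteain'
      LANGUAGE C STRICT;

The C module itself, and any catalog surgery to swap it in during loads, is the part Greg warns needs careful testing.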
Re: [GENERAL] Date format for bulk copy
Michael Fuhr wrote:
> On Wed, Oct 13, 2004 at 10:06:58AM -0400, David Rysdam wrote:
>> Sybase bulk copies the date fields out in this format:
>>
>>   Mar 4 1973 10:28:00:000AM
>>
>> Postgresql's COPY (or psql \copy) doesn't like that format.
> You could filter the data through a script that reformats certain fields, then feed the reformatted data to PostgreSQL. This is usually a trivial task for Perl, awk, sed, or the like.

Right, I *can* do this. But then I have to build knowledge into that script so it can find each of these date fields (there's like 20 of them across 10 different files) and then update that knowledge each time it changes. I'm still leaning towards just making postgres accept a ':' delimiter for milliseconds.

Also, how much would a secondary script slow down the bulk copy, if any?

>> I have a similarish problem with another field type. In Sybase it's a binary format. In postgres it is a binary format (bytea). But Sybase bcps the data out in ASCII. Sybase recognizes that it is a binary field and auto-converts the ASCII back to binary. Postgres doesn't. Again, I created a temporary table and did a decode(field, 'hex') to the real table.
> Sounds like Sybase is dumping in hex, whereas PostgreSQL expects octal. If you can't change the dump format, then again, filtering the data through a script might work.

Oh, so I can load binary data into PG if it's ASCII-encoded octal?

Why not the user-defined type with associated user-defined input function?
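For the hex case, a hedged sketch of the staging-table conversion already described in the thread (table and column names are placeholders): load the hex text as-is, then convert with decode():

  -- staging table holds the hex string exactly as Sybase bcp wrote it
  CREATE TEMP TABLE bin_stage (key integer, rawhex text);
  -- in psql:  \copy bin_stage from 'datafile'

  INSERT INTO realtable (key, binfield)
  SELECT key, decode(rawhex, 'hex')
    FROM bin_stage;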
[GENERAL] Date format for bulk copy
I have a large amount of data that I copy in and out of Sybase very often. Now I also want to copy this data in and out of postgres. I have an existing script that creates the entire database(s) from scratch in Sybase and then uses the Sybase bulk copy tool "bcp" to copy the data in. I already did a brute-force port of this script to postgres once, but I'm trying to do it more elegantly now that I know what issues I'm going to run into.

One of them is date formats in the bcp files. Sybase bulk copies the date fields out in this format:

  Mar 4 1973 10:28:00:000AM

Postgresql's COPY (or psql \copy) doesn't like that format. In particular, it doesn't like the millisecond field at the end. If I understand the docs correctly, postgres wants the millisecond field to be preceded by a decimal point instead of a colon.

In my brute-force port, I just bulk copied the date fields into temporary tables and then did a to_timestamp(field, 'Mon DD YYYY HH:MI:SS:MSAM'). That worked, but required a lot of additional logic in my script to handle the temp tables and conversions. I'd hate to have to keep all that overhead in there basically to handle the conversion of a colon to a decimal point. So my questions are these:

0) I thought of creating a user-defined data type for this, but it seems like overkill, especially if I'd have to provide all kinds of helper functions for things like date incrementing or comparison or whatever. Am I off track?

1) Are there any tools out there that allow for specifying the field format of a COPY?

2) If not, is it reasonable or unreasonable to modify the postgresql source (I'm running Beta 3) to handle a colon as a millisecond delimiter? (If so, where do I look?)

3) If I did create such a patch, would the postgresql project accept it into the tree?

I have a similarish problem with another field type. In Sybase it's a binary format; in postgres it's a binary format (bytea). But Sybase bcps the data out in ASCII. Sybase recognizes that it is a binary field and auto-converts the ASCII back to binary. Postgres doesn't. Again, I created a temporary table and did a decode(field, 'hex') to the real table. It seems reasonable to expect to be able to bulk copy ASCII-encoded binary values into binary fields. Is this field perhaps best described by a user-defined type?
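For concreteness, a hedged sketch of that temp-table round trip with placeholder names; it spells out the YYYY field and relies on to_timestamp treating the ':' before the milliseconds as a literal separator:

  -- raw bcp strings land in a text column first
  CREATE TEMP TABLE date_stage (key integer, ts_raw text);
  -- in psql:  \copy date_stage from 'datafile'

  INSERT INTO realtable (key, ts)
  SELECT key, to_timestamp(ts_raw, 'Mon DD YYYY HH:MI:SS:MSAM')
    FROM date_stage;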
Re: [GENERAL] Any recommended forums/wiki/blog s/w that uses
Devrim GUNDUZ wrote:
> Hi,
> On Tue, 17 Aug 2004, Shridhar Daithankar wrote:
>> Anyone have any suggestions?
> drupal? Check out at http://www.drupal.org/
> Drupal needs some hacking since it uses LIMIT #,# queries in database-pear.php. Just a FYI.

I tried and really liked MoinMoin: http://moinmoin.wikiwikiweb.de/
Re: [GENERAL] Simplfied Bytea input/output?
Incredibly, I was just sitting down to do something similar for a problem I have when I read this email. I'm going to do a temp table too, but I did think of another solution. It would work for me, but it's a little complex for my stage of PG expertise: create a user-defined type for "pic" and define an input/output function for it. Someone can correct me, but my understanding says that probably won't be as fast as a direct bulk copy; still, it can't be slower than the temp table method and it is certainly simpler.

Jerry LeVan wrote:
> Hi,
>
> I have been looking for a fairly simple way to upload data into a bytea field without having to write custom C code for each table that contains a bytea field. With some good advice from Daniel Verite and reading the fine manual, here is my procedure for uploading files to bytea fields.
>
> 1) Create an "upload table" that might look like:
>
>    \d pics
>             Table "public.pics"
>     Column |  Type   |                        Modifiers
>    --------+---------+----------------------------------------------------------
>     info   | text    |
>     image  | bytea   |
>     ident  | integer | not null default nextval('public.pics_ident_seq'::text)
>
> 2) I wrote a single C procedure that would upload to this table. Here is a fragment of the code:
>
>    int usage()
>    {
>        fprintf(stderr, "loadBytea \n");
>        fprintf(stderr, " This will insert the comment (a string) and the contents of file\n");
>        fprintf(stderr, " into the first two columns of the specified table.\n");
>        exit(0);
>    }
>
> The core of the program is a wrapper around the PQexecParams routine. The source code for the program is located here:
>
> http://homepage.mac.levanj/Cocoa/programs/loadBytea.c
>
> 3) Once the file is uploaded (say to table pics), I can use the update command to load the bytea field into the desired table, perhaps something like:
>
>    update person set picture = pics.image from pics
>     where pics.ident=15 and person.first_name='Marijo'
>
> Once the bytea field has been loaded into the target, it can be deleted from the upload table (or a reference could be placed in the "person" table to the appropriate picture, and all of the pictures kept in the upload table).
>
> This method does not scale well to bulk input, but I think it would not be difficult to rewrite the loadBytea.c program as needed.
>
> Jerry
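A hedged sketch of step 3 plus the cleanup Jerry describes, reusing his example names (pics, person, ident 15); wrapping it in an explicit transaction is just an added assumption for tidiness:

  BEGIN;

  -- copy the uploaded image into the target row
  UPDATE person
     SET picture = pics.image
    FROM pics
   WHERE pics.ident = 15
     AND person.first_name = 'Marijo';

  -- then drop the staging row, as described above
  DELETE FROM pics WHERE ident = 15;

  COMMIT;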
[GENERAL] COPY not handling BLOBs
I have a bunch of data in Sybase, and some of it is in image fields. We use bcp on this data transparently all the time, without major issues, in character mode. Is there a fundamental technical reason that BLOBs can't be COPY'd in postgresql, or is it just that nobody has ever wanted to before?

(If I was starting from scratch I'd probably be happy with pg_dump/pg_restore, but I'd like to migrate over from Sybase in easy stages, so I'd like the bcp files to be interchangeable.)

Right now I'm having to write a program to create all the large objects up front and record the OIDs in a file, which I will then COPY in with the rest of my data.
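Not the program itself, just a hedged sketch of one way the OID-recording step could be done in plain SQL, assuming the files are readable by the server and the session has sufficient privileges (paths and table names are placeholders):

  -- map each data file to a freshly created large object
  CREATE TABLE blob_map (key integer, blob_oid oid);

  -- lo_import() runs server-side and returns the new large object's OID
  INSERT INTO blob_map VALUES (1, lo_import('/data/blobs/row1.img'));
  INSERT INTO blob_map VALUES (2, lo_import('/data/blobs/row2.img'));

The recorded OIDs can then be copied out alongside the rest of the data and referenced from the main tables.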