Re: [GENERAL] I want the stupidest possible binary export
Jov writes:
> psql can only input/output text string, which can not be binary content. With
> 9.2, you can encode bytea to base64, save to file, then use shell command to
> decode the file.

This worked, btw. Encoded to base64, piped to sed to fix the newlines, piped to 'base64 -id' and then to file.
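For the archives, a sketch of an equivalent pipeline (using tr instead of sed for the whitespace; the table, column, and key names are the ones from the original post, and GNU coreutils base64 is assumed):

  psql -t -A -c "select encode(mybinaryfield, 'base64') from mytable where key = 1" \
      | tr -d '[:space:]' \
      | base64 -d > file.gz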
Re: [GENERAL] I want the stupidest possible binary export
Jov writes:
> psql can only input/output text string, which can not be binary content. With
> 9.2, you can encode bytea to base64, save to file, then use shell command to
> decode the file.
> google “amutu.com pg bytea” can get a blog post.

I wondered if I could do that. OK, will try it, thanks.
Re: [GENERAL] I want the stupidest possible binary export
Adrian Klaver writes:
>> psql -t -c "\copy (select mybinaryfield from mytable where key = 1) to
>> 'file'" with format binary
>
> From here:
>
> http://www.postgresql.org/docs/9.2/static/app-psql.html
>
> the above should be:
>
> psql -t -c "\copy (select mybinaryfield from mytable where key = 1) to
> 'file'" with binary

OK, good, that works. However, now I have the problem I suspected I would: I get more than just the bytes that were in my field. 27 bytes more, to be exact. I already did -t for tuples-only, and I'm only doing one row and one column, so there are no separators there. Since it's a .gz file and 'file' no longer recognizes it, I deduce that the magic cookie isn't at the beginning of the file anymore and some of the 27 bytes must be at the beginning of the file instead. Is this a PG-specific binary format with a wrapper around my data, or just my data only?
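(For anyone finding this in the archives: the extra 27 bytes match the documented COPY BINARY wrapper for a single non-null field: an 11-byte signature plus two 32-bit header words (19 bytes), a 16-bit per-tuple field count (2), a 32-bit field length (4), and a 16-bit end-of-data marker (2). Assuming GNU head/tail, no header extension, and exactly one row and column, something like the following should recover the raw payload; this is a sketch, not something from the original thread:)

  tail -c +26 file | head -c -2 > myfield.gz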
[GENERAL] I want the stupidest possible binary export
I've got some tables with bytea fields from which I want to export only the binary data to files. (Each field has a gzipped data file.) I really want to avoid adding overhead to my project by writing a special program to do this, so I'm trying to do it from psql. Omitting the obvious switches for username, etc, here's what I'm doing:

  psql -t -c "\copy (select mybinaryfield from mytable where key = 1) to 'file'"

That works, but I get escaped bytes. I want actual binary directly out of the DB. Another option might be:

  psql -t -c "\copy (select mybinaryfield from mytable where key = 1) to 'file'" with format binary

However, there are two problems. First, I get a syntax error "at or near 'format'". (Running 9.2 client and server.) And second, I suspect that'll be some "proprietary" PG format, not the actual bytes from just my field.

What option am I missing?
Re: [GENERAL] Arduino SQL Connector
Steve Spence writes:
> It's actually quite freeing, not complicating. I can put the values
> right into the fields I need them to be in (or get values from the
> database I need to control the Arduino), without going through an
> intermediate process. If you have a serial process I can look at that
> works with 1000 or more remote sensors all over the world to a hosted
> database server, I'd love to look at it. Right now what I have works,
> but I have no GIS functionality.

I'm not going to claim this is a great design, but it's at least an order of magnitude easier than your proposal: Set up the simplest web server that will run PHP. Program the Arduinos to submit POST requests to it (which is just networking you say you can already handle), by which they submit their SQL strings. Write a PHP script that passes those to PostgreSQL.

The webserver should be able to handle at least as much as what you were going to throw at the DB server, so there's no load problem.
Re: [GENERAL] Arduino SQL Connector
Steve Spence writes:
> The Arduino is very good at compiling includes written in C/C++. Just
> need a .h and .cpp file with the correct syntax, but very compact. It
> used to be part of the fun making programs fit in 4-16k back in the
> day.

And what happened when you tried the suggestions you got to use the existing code?
Re: [GENERAL] Arduino SQL Connector
Steve Spence writes:
> Need a Team lead on this, and I'll collaborate as much as I can on the
> Arduino / Networking side.

I don't understand why there is so much blowback to the idea that you can just use the existing code. Why exactly wouldn't it work? It compiles on that architecture already. The only possible issue is size.

Take the libpq source and cross-compile it for the Arduino. I did this back in 2008 or so for a stepper motor driver that didn't work how I wanted. I wouldn't be surprised if the lame Arduino *IDE* won't do this, but what does that matter? You can make the library by cross-compiling, then include it in your Arduino sketch.

That said, I agree with the other poster who questioned why you'd put the DB client on the device anyway. Seems needlessly difficult and limiting. Just put a simple serial writer on that end and have a tiny serial->DB translator on the same machine you were going to put the DB on. Done in about an hour and you can switch DBMSs without having to touch the device.
Re: [GENERAL] Arduino SQL Connector
Steve Spence writes:
> no, you can't run arm / debian on an arduino UNO. it's all c++
> compiled to machine code then uploaded.

This is how all executables work.
Re: [GENERAL] Arduino SQL Connector
Bruce Momjian writes:
> On Thu, Apr 17, 2014 at 10:44:36AM -0400, David Rysdam wrote:
>> Maybe I'm being naive, but isn't libpq already being compiled for ARM by
>> Debian? As long as it fits, you should be good. If it doesn't, you'll
>> need to strip some stuff out.
>
> Oh, can you run Debian ARM code on Arduino? If so, Postgres's libpq
> could be used directly, though it is probably too big, as you mentioned.
> The MySQL driver is C++, which surprised me.

Well, I don't know about just picking it up and moving it over. But back in the day I wrote a C library which I cross-compiled for ARM on my Intel Linux box and used with my Arduino. At least, I think that's what I did. This was 6 years ago.

The fact that it (presumably) compiles on Debian under ARM should mean it would Just Cross-Compile for Arduino. Whether it would fit is another question.
Re: [GENERAL] Arduino SQL Connector
Bruce Momjian writes:
> On Thu, Apr 17, 2014 at 09:39:55AM -0400, Steve Spence wrote:
>> So, who wants to work on this with me? I'm a fair arduino programmer,
>> but know nothing about postgres.
>
> I would look at the MySQL one as a first step to see how that was done.
> You are basically going to need to duplicate libpq, which is a major
> undertaking.

Maybe I'm being naive, but isn't libpq already being compiled for ARM by Debian? As long as it fits, you should be good. If it doesn't, you'll need to strip some stuff out.
Re: [GENERAL] nested query vs left join: query planner very confused
On Wed, 27 Nov 2013 13:04:54 -0500, David Rysdam wrote:
> We deliberately try to keep our queries fairly simple for several
> reasons. This isn't the most complicated, but they don't get much more
> than this. I'll have them start with 10MB and see what they get.

10MB was enough to get that query to come back instantly. The same query on some larger tables was still slow, so we ended up bumping up to 50MB to get the entire job done. That probably sounds like a lot to you guys, but now that we know the behavior and what kind of queries we have, I think we're OK.

(And if you want a shocker, when I looked at our own DB, we've had work_mem set to 250MB on a lot of our servers and 1GB on our main production machine. Heh. I've got some PG tuning books here next to me now...)

Thanks!
Re: [GENERAL] nested query vs left join: query planner very confused
On Wed, 27 Nov 2013 13:02:20 -0500, Tom Lane wrote:
> David Rysdam writes:
>> effective_cache_size - 12000MB
>> shared_buffers - 1024MB
>> random_page_cost - is commented out
>> cpu_tuple_cost - commented out
>> work_mem - commented out
>
>> I assume you guys already know the default values for those last 3 on a
>> 9.0.x server...
>
> Default work_mem is only 1MB, so that probably explains why you're not
> getting a hashed subplan here. Have them knock it up some, say on the
> order of 10MB. (If none of your queries are any more complicated than
> this one, you could go higher. But keep in mind that a backend can use
> work_mem per sort/hash/materialize step, not per query --- so complex
> queries can use many times work_mem. Multiply that by the number of
> backends, and you can end up in swap hell pretty quickly with an over
> optimistic value.)

We deliberately try to keep our queries fairly simple for several reasons. This isn't the most complicated, but they don't get much more than this. I'll have them start with 10MB and see what they get.
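A sketch of the session-level test, using the tables from this thread (the hope, per Tom's note, is that the subplan shows up as hashed once work_mem is big enough; no config file change needed for the experiment):

  SET work_mem = '10MB';
  EXPLAIN ANALYZE
    SELECT signum FROM lp.Mags
    WHERE signum IS NOT NULL
      AND signum NOT IN (SELECT signum FROM lp.Sigs);
  -- look for a hashed SubPlan in the Filter line instead of Materialize rescans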
Re: [GENERAL] nested query vs left join: query planner very confused
On Wed, 27 Nov 2013 11:06:51 -0500, bricklen wrote:
> Has the client ANALYZEd recently? What happens if the client issues
> the following commands before executing the query?
> VACUUM ANALYZE lp.sigs;
> VACUUM ANALYZE lp.mags;
>
> If that doesn't change the plan, could you post the values for
> effective_cache_size, shared_buffers, random_page_cost,
> cpu_tuple_cost, work_mem and how much RAM is in the client machine?

Yes, I did have them do a vacuum analyze with no result. Here's their reply on configuration:

  It is 24 Gig.
  effective_cache_size - 12000MB
  shared_buffers - 1024MB
  random_page_cost - is commented out
  cpu_tuple_cost - commented out
  work_mem - commented out

I assume you guys already know the default values for those last 3 on a 9.0.x server...
Re: [GENERAL] nested query vs left join: query planner very confused
On Wed, 27 Nov 2013 11:21:09 -0500, Tom Lane wrote:
> DBMS-agnostic frequently means "sucks equally badly on all DBMSes" :-(

We've generally been OK (cf the ~50ms runtime for the same query at our site), but we also notice problems sooner than our client sometimes does and can make algorithm improvements where we don't know how to make DB ones.

> This query is hard to optimize because of the weird behavior of NOT IN
> when nulls are involved. Since you aren't complaining that the query
> fails entirely, I'm supposing that lp.Sigs.signum contains no nulls,
> but the planner doesn't know that. If you can transform it to a NOT
> EXISTS, you'll likely get a much better plan:
>
> select signum from lp.Mags where signum is not null and
> not exists (select 1 from lp.Sigs where lp.Sigs.signum = lp.Mags.signum)

We've already shipped to the client, but I'm looking at how extensive a patch would have to be. Very surprising we haven't hit this issue before.

> What you want is an "anti join" plan, or at least a plan that mentions
> a "hashed subplan". Plain subplans are death performance-wise, because
> they amount to being nestloop joins rather than anything smarter. (In
> this case it's likely not choosing a hashed subplan because work_mem is
> too small to allow that.)

I've got the client looking for this and other values already. We'll soon know...
[GENERAL] nested query vs left join: query planner very confused
I've got two tables, sigs and mags. It's a one-to-one relationship; mags is just split out because we store a big, less-often-used field there. "signum" is the key field.

Sometimes I want to know if I have any orphans in mags, so I do a query like this:

  select signum from lp.Mags where signum is not null and
    signum not in (select lp.Sigs.signum from lp.Sigs)

(I do this as a subquery because we originally had an old Sybase DB where outer joins were a non-standard pain and this way works the same and is DBMS-agnostic.)

At my location, this query runs very fast (~50ms on a ~100k row table) and 'explain' shows a plan with this structure:

  Seq scan on mags
    Filter:
    SubPlan 1
      Seq scan on sigs

At my client's location, the query is very slow (same table size, similar hardware/config, although they are running 9.0.x and I'm on 9.2.x). By "slow" I mean like an *hour*. 'explain' has this structure:

  Seq scan on mags
    Filter:
    SubPlan 1
      Materialize
        Seq scan on sigs

I'd never heard of Materialize before, so I looked into it. Seems to make a virtual table of the subquery so repetitions of the parent query don't have to re-do the work. Sounds like it should only help, right? The client's 'explain analyze' shows this:

  Seq Scan on mags  (cost=0.00..187700750.56 rows=47476 width=4) (actual time=3004851.889..3004851.889 rows=0 loops=1)
    Filter: ((signum IS NOT NULL) AND (NOT (SubPlan 1)))
    SubPlan 1
      ->  Materialize  (cost=0.00..3713.93 rows=95862 width=4) (actual time=0.011..16.145 rows=48139 loops=94951)
            ->  Seq Scan on sigs  (cost=0.00..2906.62 rows=95862 width=4) (actual time=0.010..674.201 rows=95862 loops=1)
  Total runtime: 3004852.005 ms

If I'm reading this correctly, the Materialize is running ~95k times, taking the majority of the time. Why? The only thing I can think of is this scenario:

1) server thinks it has a LOT of RAM
2) decides to Materialize subquery to take advantage
3) machine does not actually have that RAM, so it gets swapped
4) server notices it was swapped and decides to re-run rather than unswap
5) goto 2

I don't know if that's a realistic scenario, but it's all I got. I'm already well into unknown territory, performance-tuning-wise.

I also decided to try doing the query a different way:

  select lp.mags.signum from lp.mags left join lp.sigs on lp.mags.signum = lp.sigs.signum
    where lp.mags.signum is not null and lp.sigs.signum is null;

This one runs fast for both of us. So I guess my second question is: why can't the query planner tell these are the same query?
Re: [GENERAL] having difficulty with explain analyze output
On Tue, 26 Nov 2013 14:51:22 -0500, Martijn van Oosterhout wrote:
> The Seq Scan took 674ms and was run once (loops=1)
>
> The Materialise was run 94951 times and took, on average, 0.011ms to
> return the first row and 16ms to complete.
>
> 16.145 * 94951 = 1532983.895

OK, this is helpful. But why would Materialize run 94k times? I thought the whole point of Materialize was to make a "virtual table" that could speed up "parent" queries. If it has to recreate the subquery, why would the planner choose to Materialize?
[GENERAL] having difficulty with explain analyze output
I'm not really looking for information on how to speed this query up. I'm just trying to interpret the output enough to tell me which step is slow:

  Seq Scan on mags  (cost=0.00..187700750.56 rows=47476 width=4) (actual time=3004851.889..3004851.889 rows=0 loops=1)
    Filter: ((signum IS NOT NULL) AND (NOT (SubPlan 1)))
    SubPlan 1
      ->  Materialize  (cost=0.00..3713.93 rows=95862 width=4) (actual time=0.011..16.145 rows=48139 loops=94951)
            ->  Seq Scan on sigs  (cost=0.00..2906.62 rows=95862 width=4) (actual time=0.010..674.201 rows=95862 loops=1)
  Total runtime: 3004852.005 ms

It looks like the inner seq scan takes 674ms, then the materialize takes an additional 16ms? Or is that 16ms * 94951? Or 674 * 94951? And the outer seq scan takes 3004851 - 3004851 = 0ms?
Re: [GENERAL] partitioning for speed, but query planner ignores
On Wed, 2 Oct 2013 11:19:58 -0400, Kevin Grittner wrote:
> David Rysdam wrote:
>
>> We have a by-our-standards large table (about 40e6 rows). Since it is
>> the bottleneck in some places, I thought I'd experiment with
>> partitioning.
>
> In my personal experience I have gone into hundreds of millions of
> rows with good performance without partitioning. It's all about
> designing good indexes for the workload.

Well, our performance is still good. Certainly better than a lot of projects I've seen even with less data. But it's still our "worst" table and I have some free time to experiment...

>> Because the object number figures so prominently, I thought I'd partition
>> on that. To me, it makes the most sense from a load-balancing perspective
>
> Load balancing? Hitting a single partition more heavily improves
> your cache hit ratio. What sort of benefit are you expecting from
> spreading the reads across all the partitions? *Maybe* that could
> help if you carefully placed each partition table on a separate set
> of spindles, but usually you are better off having one big RAID so
> that every partition is spread across all the spindles
> automatically.

Now that you spell it out, I guess that does make more sense. I had some vague notion of tables "doing work", but really if it can load one partition into RAM and get most of my hits from there, it'd be a big win.

>> Lower numbers are going to be queried much less often than higher
>> numbers.
>
> This suggests to me that you *might* get a performance boost if you
> define partitions on object number *ranges*. It still seems a bit
> dubious, but it has a chance.

Would the planner be smart enough to figure out ranges without me having to "hint" my queries?

In any case, my speed tests are coming out the opposite of what I expected. Within-partition queries are taking longer than the whole table did, while across-partition queries are faster. I'll have to do more thinking on that.
Re: [GENERAL] partitioning for speed, but query planner ignores
On Wed, 2 Oct 2013 09:12:02 -0400, Bill Moran wrote:
> Last I looked, the partitioning mechanism isn't _quite_ as smart as could
> be desired. For example:
>
>   SELECT * FROM table WHERE objnum = 5; -- will not take advantage of partition
>
> You have to give the planner a little more hint as to the fact that it can
> take advantage of the partition:
>
>   SELECT * FROM table WHERE (objnum % 2) = 1 AND objnum = 5;
>
> As silly as it seems, this is enough information for the planner to know
> that it only needs to scan one partition.

This seemed ridiculously silly until I thought about it. I guess it has no way of "unwrapping" my constraint and figuring out what to do. Would this also apply if I did ranges, or is that a common enough constraint that it *can* figure it out without me having to modify all my queries?
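For comparison, range-style CHECK constraints are something constraint exclusion can reason about from a plain equality predicate, so the queries don't need the extra qualifier. A sketch, with illustrative table names standing in for the real ones:

  CREATE TABLE objhist_p0 (CHECK (objnum >= 0       AND objnum < 1000000)) INHERITS (objhist);
  CREATE TABLE objhist_p1 (CHECK (objnum >= 1000000 AND objnum < 2000000)) INHERITS (objhist);

  -- with constraint_exclusion = partition (or on):
  EXPLAIN SELECT * FROM objhist WHERE objnum = 1500123;
  -- only the parent and objhist_p1 should appear in the plan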
Re: [GENERAL] partitioning for speed, but query planner ignores
On Wed, 2 Oct 2013 08:34:44 -0400, David Rysdam wrote:
> However, when I run an explain or an explain analyze, I'm still seeing it
> checking both partitions. Is this because the query planner doesn't want
> to do a mod? Should I go with simple ranges, even though this adds a
> maintenance task?

I guess I should give some administrivia as well: the server is 9.2.1 running on Linux. The configuration is otherwise pretty vanilla, with only minor, and poorly-understood, conf changes.
[GENERAL] partitioning for speed, but query planner ignores
We have a by-our-standards large table (about 40e6 rows). Since it is the bottleneck in some places, I thought I'd experiment with partitioning. I'm following the instructions here:

  http://www.postgresql.org/docs/current/static/ddl-partitioning.html

The table holds data about certain objects, each of which has an object number and some number of historical entries (like account activity at a bank, say). The typical usage pattern is: relatively rare inserts that happen in the background via an automated process (meaning I don't care if they take a little longer) and frequent querying, including some where a human is sitting in front of it (i.e. I'd like it to be a lot faster). Our most frequent queries either select "all history for object N" or "most recent item for some subset of objects".

Because the object number figures so prominently, I thought I'd partition on that. To me, it makes the most sense from a load-balancing perspective to partition on the mod of the object number (for this test, evens vs odds, but planning to go up to mod 10 or even mod 100). Lower numbers are going to be queried much less often than higher numbers. This scheme also means I never have to add partitions in the future.

I set up my check constraints ((objnum % 2) = 0 and (objnum % 2) = 1 on the relevant tables) and turned constraint_exclusion to 'partition' in postgresql.conf. I also turned it to 'on' in my psql interface. However, when I run an explain or an explain analyze, I'm still seeing it checking both partitions. Is this because the query planner doesn't want to do a mod? Should I go with simple ranges, even though this adds a maintenance task?
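Concretely, the setup described above looks something like this (table and column names are illustrative stand-ins for the real ones):

  CREATE TABLE objhist_even (CHECK ((objnum % 2) = 0)) INHERITS (objhist);
  CREATE TABLE objhist_odd  (CHECK ((objnum % 2) = 1)) INHERITS (objhist);

  -- postgresql.conf: constraint_exclusion = partition
  EXPLAIN SELECT * FROM objhist WHERE objnum = 12345;
  -- both partitions still show up, since the planner does not derive
  -- (objnum % 2) = 1 from objnum = 12345 on its own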
Re: [GENERAL] prepared statement results don't clear?
Michael Fuhr wrote:
> On Tue, Jan 17, 2006 at 03:37:14PM -0500, David Rysdam wrote:
>> I have a Tcl function that does this:
>>
>> 1) create prepared statement for binary insertion via pg_exec (and releases the result handle)
>> 2) run statement with binary data via pg_exec_prepared (and releases the result handle)
>> 3) deallocate statement via pg_exec (and releases the result handle)
>>
>> When I try to run this function a couple hundred times, I get "hard
>> limit on result handles reached" after 128 successes. It seems
>> something is not being released.
>
> Can you post a simple test case? I can provoke "hard limit on result
> handles reached" by not clearing results, but so far I haven't seen
> that error if I do clear all results.

Sorry, I was wrong. I had a connection leak that was hidden because my accounting procedure and the actual code made different assumptions.
[GENERAL] prepared statement results don't clear?
I have a Tcl function that does this:

1) create prepared statement for binary insertion via pg_exec (and releases the result handle)
2) run statement with binary data via pg_exec_prepared (and releases the result handle)
3) deallocate statement via pg_exec (and releases the result handle)

When I try to run this function a couple hundred times, I get "hard limit on result handles reached" after 128 successes. It seems something is not being released.

To make absolutely sure it's not me that's leaving something out there, I output a line each time I either create or destroy a result handle and they add up perfectly. Furthermore, all the pg_execs go through another function that has been well-exercised, so I don't think the problem is there.

The only thing I can think of is that a prepared statement (or the Tcl-specific pg_exec_prepared) has, like, a "double" result handle (one for the statement itself and one for the exec'ing thereof). Kind of a half-assed theory, but necessity is the mother of such invention. Does anyone else have any better ideas for locating the result handle leak?
[GENERAL] EXPLAIN ANALYZE output
I'm afraid I still don't understand the output of EXPLAIN ANALYZE. The output I get is like this:

  blah blah blah (cost=A..B blah blah) (actual time=C..D blah blah)

According to what I've been able to find:

  A = predicted time to first returned row in milliseconds
  B = total cost in arbitrary units
  C = actual time to first returned row in milliseconds
  D = actual total time in milliseconds

1) Is this correct?

I had a query that was taking a long time and I spent the last day and a half trying to reduce D. I probably should have been working on reducing C, but that's neither here nor there. Nothing I've done has had any appreciable effect, with both C and D staying around 17-20 (~= 3 minutes), including dropping the sort condition. After all this head-on-wall banging, I went back and tried doing the query without EXPLAIN ANALYZE. It takes under a minute, even with the sort turned on.

2) Does EXPLAIN ANALYZE add a lot of overhead that is being measured here?

3) Even if so, why has the output of EXPLAIN ANALYZE not dropped even though the query itself is now faster (due to something I did while working on C/D?)?

There's been plenty of vacuuming and analyzing on these tables, so outdated stats are not the answer.
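Regarding question 2, one way to see the instrumentation overhead directly is to time the statement both ways from psql; a sketch, where "mytable" stands in for the real query:

  \timing
  SELECT count(*) FROM mytable;                  -- plain execution
  EXPLAIN ANALYZE SELECT count(*) FROM mytable;  -- same plan, plus per-row timing calls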
Re: [GENERAL] reading EXPLAIN output
Tom Lane wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> merge join (cost=0.00..348650.65 rows=901849 width=12)
>>   merge cond {blah}
>>   join filter {blah}
>>   index scan using {blah index on blah} (cost=0.00..289740.65 rows=11259514 width=8)
>>   index scan using {blah index on blah} (cost=0.00..17229.93 rows=902085 width=8)
>>
>> This query takes about 3 minutes to run and I'm trying to figure out
>> why. From a tutorial and the docs, I gather that the "..largenum" part
>> is the number of page reads required, so I understand where 289740 and
>> 17229 come from. But what about 348650 page reads for the "merge join"?
>
> You're misreading it. An upper node's cost includes the cost of its
> children. So the actual cost estimate for the join step is 41680.07.
>
>> When I do EXPLAIN ANALYZE, the actual values come out like this:
>> merge join: (actual time=170029.404..170029.404)
>
> That seems a bit odd ... is there only one row produced? Could you show
> us the entire EXPLAIN ANALYZE output, rather than your assumptions about
> what's important?
>
> Increasing work_mem won't help a merge join, but if you can get it large
> enough to allow a hash join to be used instead, that might be a win.
>
> regards, tom lane

I'm looking for certain anomalies, so the end result should be zero rows.

  merge join (cost=0.00..348650.65 rows=901849 width=12) (actual time=170029.404..170029.404 rows=0 loops=1)
    merge cond {blah}
    join filter {blah}
    index scan using {blah index on blah} (cost=0.00..289740.65 rows=11259514 width=8) (actual time=29.227..85932.426 rows=11256725 loops=1)
    index scan using {blah index on blah} (cost=0.00..17229.93 rows=902085 width=8) (actual time=39.896..6766.755 rows=902236 loops=1)
  Total runtime: 172469.209 ms
[GENERAL] reading EXPLAIN output
  merge join (cost=0.00..348650.65 rows=901849 width=12)
    merge cond {blah}
    join filter {blah}
    index scan using {blah index on blah} (cost=0.00..289740.65 rows=11259514 width=8)
    index scan using {blah index on blah} (cost=0.00..17229.93 rows=902085 width=8)

This query takes about 3 minutes to run and I'm trying to figure out why. From a tutorial and the docs, I gather that the "..largenum" part is the number of page reads required, so I understand where 289740 and 17229 come from. But what about 348650 page reads for the "merge join"?

My conjecture is that the joined keys are being stored on disk (if that's how the internals of postgresql works) and have to be re-read for the rest of the query. Is that right? Does that mean I could speed this up by giving more RAM to store it in?

When I do EXPLAIN ANALYZE, the actual values come out like this:

  merge join: (actual time=170029.404..170029.404)
  index scan: (actual time=27.653..84373.805)
  index scan: (actual time=45.681..7026.928)

This seems to confirm that it's the final "merge join" that takes forever. Because it is writing to and reading from disk?
Re: [GENERAL] "incomplete startup packet" on SGI
David Rysdam wrote:
> David Rysdam wrote:
>> Tom Lane wrote:
>>> David Rysdam <[EMAIL PROTECTED]> writes:
>>>> Just finished building and installing on *Sun* (also
>>>> "--without-readline", not that I think that could be the issue):
>>>> Works fine. So it's something to do with the SGI build in particular.
>>>
>>> More likely it's something to do with weird behavior of the SGI
>>> kernel's TCP stack. I did a little googling for "transport endpoint
>>> is not connected" without turning up anything obviously related, but
>>> that or ENOTCONN is probably what you need to search on.
>>>
>>> regards, tom lane
>>
>> It's acting like a race condition or pointer problem. When I add random
>> debug printfs/PQflushs to libpq it sometimes works.
>
> Not a race condition: No threads.
>
> Not a memory leak: Electric fence says nothing. And it works when
> electric fence is running, whereas a binary that uses the same libpq
> without linking efence does not work.

I know nobody is interested in this, but I think I should document the "solution" for anyone who finds this thread in the archives:

My theory is that Irix is unable to keep up with how fast the postgresql client is going and that the debug statements/efence stuff are slowing it down enough that Irix can catch up and make sure the socket really is there, connected and working. To that end, I inserted a sleep(1) in fe-connect.c just before the pqPacketSend(...startpacket...) stuff. It's stupid and hacky, but gets me where I need to be, and maybe this hint will inspire somebody who knows (and cares) about Irix to find a real fix.
Re: [GENERAL] "incomplete startup packet" on SGI
David Rysdam wrote:
> Tom Lane wrote:
>> David Rysdam <[EMAIL PROTECTED]> writes:
>>> Just finished building and installing on *Sun* (also
>>> "--without-readline", not that I think that could be the issue):
>>> Works fine. So it's something to do with the SGI build in particular.
>>
>> More likely it's something to do with weird behavior of the SGI
>> kernel's TCP stack. I did a little googling for "transport endpoint is
>> not connected" without turning up anything obviously related, but that
>> or ENOTCONN is probably what you need to search on.
>>
>> regards, tom lane
>
> It's acting like a race condition or pointer problem. When I add random
> debug printfs/PQflushs to libpq it sometimes works.

Not a race condition: No threads.

Not a memory leak: Electric fence says nothing. And it works when electric fence is running, whereas a binary that uses the same libpq without linking efence does not work.
Re: [GENERAL] "incomplete startup packet" on SGI
Tom Lane wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> Just finished building and installing on *Sun* (also
>> "--without-readline", not that I think that could be the issue):
>> Works fine. So it's something to do with the SGI build in particular.
>
> More likely it's something to do with weird behavior of the SGI
> kernel's TCP stack. I did a little googling for "transport endpoint is
> not connected" without turning up anything obviously related, but that
> or ENOTCONN is probably what you need to search on.
>
> regards, tom lane

It's acting like a race condition or pointer problem. When I add random debug printfs/PQflushs to libpq it sometimes works.
Re: [GENERAL] "incomplete startup packet" on SGI
Just finished building and installing on *Sun* (also "--without-readline", not that I think that could be the issue): Works fine. So it's something to do with the SGI build in particular.

David Rysdam wrote:
> I have a working 8.1 server running on Linux and I can connect to it
> from other Linux clients. I built postgresql 8.1 on an SGI (using
> --without-readline but otherwise stock) and it compiled OK and
> installed fine. But when I try to connect to the Linux server I get
> "could not send startup packet: transport endpoint is not connected" on
> the client end and "incomplete startup packet" on the server end.
> Connectivity between the two machines is working.
>
> I could find basically no useful references to the former and the only
> references to the latter were portscans and the like. Browsing the
> source, I see a couple of places that message could come from. One
> relates to SSL, which the output from configure says is turned off on
> both client and server. The other is just a generic comm error--but
> what could cause a partial failure like that?
[GENERAL] "incomplete startup packet" on SGI
I have a working 8.1 server running on Linux and I can connect to it from other Linux clients. I built postgresql 8.1 on an SGI (using --without-readline but otherwise stock) and it compiled OK and installed fine. But when I try to connect to the Linux server I get "could not send startup packet: transport endpoint is not connected" on the client end and "incomplete startup packet" on the server end. Connectivity between the two machines is working.

I could find basically no useful references to the former and the only references to the latter were portscans and the like. Browsing the source, I see a couple of places that message could come from. One relates to SSL, which the output from configure says is turned off on both client and server. The other is just a generic comm error--but what could cause a partial failure like that?
Re: [GENERAL] missing something obvious about intervals?
I knew it had to be something like this, but the search space was just too big. Thanks!

Jim Buttafuoco wrote:
> try
>
>   select '2005-12-01'::date + (456.5::float || ' seconds')::interval;
>          ?column?
>   ------------------------
>    2005-12-01 00:07:36.50
>   (1 row)
>
> -- Original Message -------
> From: David Rysdam <[EMAIL PROTECTED]>
> To: "pg >> Postgres General"
> Sent: Mon, 12 Dec 2005 13:00:13 -0500
> Subject: [GENERAL] missing something obvious about intervals?
>
> I have a table that has a date field and a "real" field that represents
> a number of seconds. I want to select the date field + the seconds
> field. Just adding them doesn't work. Casting to interval doesn't work.
> to_date/to_timestamp don't work. How do I do this?
[GENERAL] missing something obvious about intervals?
I have a table that has a date field and a "real" field that represents a number of seconds. I want to select the date field + the seconds field. Just adding them doesn't work. Casting to interval doesn't work. to_date/to_timestamp don't work. How do I do this?
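The trick shown in Jim Buttafuoco's reply above generalizes to columns; a sketch with made-up column and table names:

  -- datecol is a date, seccol is a real; names are illustrative
  SELECT datecol + (seccol::float8 || ' seconds')::interval FROM mytable;
  -- or, equivalently, using the numeric-times-interval operator:
  SELECT datecol + seccol * interval '1 second' FROM mytable;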
Re: [GENERAL] date format
David Rysdam wrote:
> David Rysdam wrote:
>> I could swear (but I don't think I can prove at this point) that 8.0
>> beta3 returned timestamps with milliseconds, like this:
>>
>>   YYYY-MM-DD HH24:MI:SS.MS
>>
>> But 8.1 isn't doing that. I see functions to format the date, but that
>> would require me to change all my existing SQL to specifically ask for
>> the milliseconds. There's also the "set datestyle" thing, but the
>> granularity doesn't exist to specify an exact format such as the above.
>> Is there a way to tell postgres *exactly* what I want the *default*
>> date output format to be?
>
> Nevermind, the documentation just seems to be unclear. I do get .MS
> back in ISO format (despite that not being shown as part of the ISO
> format). It's just that none of my dates have milliseconds on them,
> which is simply a bug I'll have to find.

OK, sorry to flood on this, but there is still a problem. If I have no milliseconds, I'd like to get back ".000" at the end of the output. Since MS are indeed part of the ISO format, why don't I get zeros? Alternatively, if MS aren't part of the ISO format, why do I get non-zeros?
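For what it's worth, one way to get a fixed ".000"-style output is to format explicitly rather than rely on DateStyle; a sketch, where tscol and mytable are placeholders:

  SELECT to_char(tscol, 'YYYY-MM-DD HH24:MI:SS.MS') FROM mytable;
  -- to_char always emits three digits for MS, even when they are zero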
Re: [GENERAL] date format
David Rysdam wrote:
> I could swear (but I don't think I can prove at this point) that 8.0
> beta3 returned timestamps with milliseconds, like this:
>
>   YYYY-MM-DD HH24:MI:SS.MS
>
> But 8.1 isn't doing that. I see functions to format the date, but that
> would require me to change all my existing SQL to specifically ask for
> the milliseconds. There's also the "set datestyle" thing, but the
> granularity doesn't exist to specify an exact format such as the above.
> Is there a way to tell postgres *exactly* what I want the *default*
> date output format to be?

Nevermind, the documentation just seems to be unclear. I do get .MS back in ISO format (despite that not being shown as part of the ISO format). It's just that none of my dates have milliseconds on them, which is simply a bug I'll have to find.
[GENERAL] date format
I could swear (but I don't think I can prove at this point) that 8.0 beta3 returned timestamps with milliseconds, like this:

  YYYY-MM-DD HH24:MI:SS.MS

But 8.1 isn't doing that. I see functions to format the date, but that would require me to change all my existing SQL to specifically ask for the milliseconds. There's also the "set datestyle" thing, but the granularity doesn't exist to specify an exact format such as the above. Is there a way to tell postgres *exactly* what I want the *default* date output format to be?
[GENERAL] to Jerry LeVan
I got your email about pgbrowse and it has guided me well to using cursors from pgtcl. Thanks! I might have a question or two--do you have an email address that DNS can resolve?
Re: [GENERAL] Very slow queries on 8.1
Michael Fuhr wrote:
> On Thu, Nov 17, 2005 at 10:38:50AM -0500, David Rysdam wrote:
>> I'm porting an application from Sybase and I've noticed that similar
>> application functions take 2 to 3 times longer on postgres than they
>> used to on the same machine running under Sybase. I've tried changing
>> various "performance tuning" parameters, such as shared_buffers,
>> effective_cache_size, etc but there's little or no effect.
>
> What's your OS and hardware environment and what are your non-default
> PostgreSQL settings?

Fedora Core 2, dual 2.8 GHz, 2 GB ram.

  shared_buffers = 1
  effective_cache_size = 10

>> Right now, I'm working on a test case that involves a table with ~360k
>> rows called "nb.sigs". My sample query is:
>>
>> select * from nb.sigs where signum > 25
>>
>> With no index, explain says this query costs 11341. After CREATE INDEX
>> on the signum field, along with an ANALYZE for nb.sigs, the query costs
>> 3456 and takes around 4 seconds to return the first row.
>
> Please post the EXPLAIN ANALYZE output for the query -- that'll show us
> the query plan, the accuracy of the planner's row count estimate, and
> how long the query takes to execute on the server. It might also be
> useful to see the table definition and the output of the following
> query:
>
> SELECT null_frac, n_distinct, correlation
> FROM pg_stats
> WHERE schemaname = 'nb' AND tablename = 'sigs' AND attname = 'signum';

The first things would be problematic to supply, since they are actually on a computer that doesn't have access to the Internet or to the machine I'm writing this on. As for the query:

  Row  null_frac  n_distinct  correlation
  1    0          -1          1

> What client interface are you using? If the query returns a lot of rows
> then you might benefit from using a cursor to fetch the result set in
> chunks; otherwise the client library is probably waiting for the entire
> result set to arrive before returning any rows to you.

That does seem to be the problem. I've never worked with cursors, so I'll have to see if I can fit our DB module into that mold.
Re: [GENERAL] Very slow queries on 8.1
Martijn van Oosterhout wrote:
> On Thu, Nov 17, 2005 at 11:31:27AM -0500, David Rysdam wrote:
>> Right, it's about 100k rows and it is through libpq (pgadmin in this
>> case, but my app uses libpq from pgtcl). Is there a way to tell libpq
>> to not do what it "likes" and do what I need instead? I didn't see
>> anything in the docs, but I didn't look very hard.
>
> Use the async interface. There you submit the query and retrieve rows
> as they come in. It's a bit trickier to program but it can be done.
> psql doesn't do this though, it's not clear how it could anyway, given
> the way it formats.
>
> Have a nice day,

I'm experimenting with the async interface right now. Hopefully it will fit in well. It's OK if psql/pgadmin don't do it, as long as the app does.
Re: [GENERAL] Very slow queries on 8.1
Bruno Wolff III wrote:
> On Thu, Nov 17, 2005 at 11:31:27 -0500, David Rysdam <[EMAIL PROTECTED]> wrote:
>> Right, it's about 100k rows and it is through libpq (pgadmin in this
>> case, but my app uses libpq from pgtcl). Is there a way to tell libpq
>> to not do what it "likes" and do what I need instead? I didn't see
>> anything in the docs, but I didn't look very hard.
>
> You could use a cursor. That will bias the plan toward fast start plans
> which might give you lower throughput if you are normally planning to
> fetch all of the rows, but will give you quicker access to the first
> row.

That is exactly what I want, but is it possible to use a cursor from pgtcl?
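Whatever the client library, the cursor itself is plain SQL issued inside a transaction; a sketch against the table from this thread:

  BEGIN;
  DECLARE sigcur CURSOR FOR SELECT * FROM nb.sigs WHERE signum > 25;
  FETCH 1000 FROM sigcur;   -- first batch arrives without waiting for the full result set
  -- keep issuing FETCH until it returns no rows
  CLOSE sigcur;
  COMMIT;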
Re: [GENERAL] Very slow queries on 8.1
Tom Lane wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> Right now, I'm working on a test case that involves a table with ~360k
>> rows called "nb.sigs". My sample query is:
>>
>> select * from nb.sigs where signum > 25
>>
>> With no index, explain says this query costs 11341. After CREATE INDEX
>> on the signum field, along with an ANALYZE for nb.sigs, the query costs
>> 3456 and takes around 4 seconds to return the first row. This seems
>> extremely slow to me, but I can't figure out what I might be doing
>> wrong. Any ideas?
>
> How many rows does that actually return, and what client interface are
> you fetching it with? libpq, at least, likes to fetch the entire query
> result before it gives it to you --- so you're talking about 4 sec to
> get all the rows, not only the first one. That might be reasonable if
> you're fetching 100k rows via an indexscan...
>
> regards, tom lane

Right, it's about 100k rows and it is through libpq (pgadmin in this case, but my app uses libpq from pgtcl). Is there a way to tell libpq to not do what it "likes" and do what I need instead? I didn't see anything in the docs, but I didn't look very hard.
Re: [GENERAL] Very slow queries on 8.1
int4, not null, and the index is unique. I even tried clustering on it, to no avail.

codeWarrior wrote:
> What is the data type for "signum" ???
>
> "David Rysdam" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
>> I'm porting an application from Sybase and I've noticed that similar
>> application functions take 2 to 3 times longer on postgres than they
>> used to on the same machine running under Sybase. I've tried changing
>> various "performance tuning" parameters, such as shared_buffers,
>> effective_cache_size, etc but there's little or no effect. I'm
>> beginning to think there's a deeper root cause to the slowness.
>>
>> Right now, I'm working on a test case that involves a table with ~360k
>> rows called "nb.sigs". My sample query is:
>>
>> select * from nb.sigs where signum > 25
>>
>> With no index, explain says this query costs 11341. After CREATE INDEX
>> on the signum field, along with an ANALYZE for nb.sigs, the query costs
>> 3456 and takes around 4 seconds to return the first row. This seems
>> extremely slow to me, but I can't figure out what I might be doing
>> wrong. Any ideas?
>>
>> (If necessary, I can write an entire script that creates and populates
>> a table and then give my performance on that sample for someone else
>> to check against.)
[GENERAL] Very slow queries on 8.1
I'm porting an application from Sybase and I've noticed that similar application functions take 2 to 3 times longer on postgres than they used to on the same machine running under Sybase. I've tried changing various "performance tuning" parameters, such as shared_buffers, effective_cache_size, etc, but there's little or no effect. I'm beginning to think there's a deeper root cause to the slowness.

Right now, I'm working on a test case that involves a table with ~360k rows called "nb.sigs". My sample query is:

  select * from nb.sigs where signum > 25

With no index, explain says this query costs 11341. After CREATE INDEX on the signum field, along with an ANALYZE for nb.sigs, the query costs 3456 and takes around 4 seconds to return the first row. This seems extremely slow to me, but I can't figure out what I might be doing wrong. Any ideas?

(If necessary, I can write an entire script that creates and populates a table and then give my performance on that sample for someone else to check against.)
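The index/ANALYZE step described above, spelled out (the index name is arbitrary; this is just the sequence of commands the post refers to):

  CREATE INDEX sigs_signum_idx ON nb.sigs (signum);
  ANALYZE nb.sigs;
  EXPLAIN ANALYZE SELECT * FROM nb.sigs WHERE signum > 25;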
[GENERAL] I must not understand the permissions system
From both User A and User B, this query returns rows:

  select * from pg_trigger where tgname = ''

but User A gets 2 rows (because I have the same trigger name on two different tables) while User B only sees one of them. Is it permissions on pg_trigger or on the original table (or some other thing) that is preventing B from seeing that row?
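One way to see which table each copy of the trigger hangs off of is to join through the catalogs; a sketch, where 'mytrigger' is a placeholder for the real trigger name:

  SELECT t.tgname, n.nspname, c.relname
  FROM pg_trigger t
       JOIN pg_class c ON c.oid = t.tgrelid
       JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE t.tgname = 'mytrigger';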
[GENERAL] [Fwd: I must not understand the permissions system]
Wow, nevermind. What I must not understand is my own code. Yikes, what a bonehead.

--- Begin Message ---

From both User A and User B, this query returns rows:

  select * from pg_trigger where tgname = ''

but User A gets 2 rows (because I have the same trigger name on two different tables) while User B only sees one of them. Is it permissions on pg_trigger or on the original table (or some other thing) that is preventing B from seeing that row?

--- End Message ---
Re: [GENERAL] Check for existence of index
Tom Lane wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> Why can't I have the same index name be on different tables?
>
> You can ... if they are in different schemas. Indexes and tables share
> the same namespace, ie, they must be unique within a schema.
>
> As for your original question, you probably want something like
>
> SELECT ... FROM pg_class c, pg_namespace n
> WHERE c.relnamespace = n.oid
>   AND relname = 'indexname' AND nspname = 'schemaname'
>   AND relkind = 'i';
>
> If you actually want to verify that this index is on a specific table,
> you'll need a more complicated join involving pg_index and a second
> scan of pg_class. See
> http://www.postgresql.org/docs/8.0/static/catalogs.html
>
> regards, tom lane

Well, since I can't have more than one index of a given name in a schema anyway, I'll have to name them "$tablename_$indexname" or something, which means I won't have to verify they are on a particular table. Anyway, this query looks good. I was getting lost in all the terminology ("namespace" vs "schema") and the data distributed all over (some stuff in pg_index, some in pg_class, etc).
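The "more complicated join" Tom mentions comes out something like this (a sketch, with placeholder schema/table/index names):

  SELECT i.relname AS index_name
  FROM pg_index x
       JOIN pg_class i ON i.oid = x.indexrelid
       JOIN pg_class t ON t.oid = x.indrelid
       JOIN pg_namespace n ON n.oid = t.relnamespace
  WHERE n.nspname = 'schemaname'
    AND t.relname = 'tablename'
    AND i.relname = 'indexname';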
Re: [GENERAL] Check for existence of index
And another thing: can't I do this?

  create table s.a (blah);
  create table s.b (blah);
  create index myindex on s.a(blah);
  create index myindex on s.b(blah);

When I drop them I have to specify the schema name, so presumably it tracks them that way. Why can't I have the same index name be on different tables?

David Rysdam wrote:
> I have a script that automatically creates my database objects. In
> order to automatically create indexes, it needs to first make sure they
> don't exist. For things like tables, this is easy:
>
>   select * from information_schema.tables where table_schema = ""
>     and table_name = ""
>
> But for indexes it is hard for some reason. There's a catalog table
> "pg_index", but it doesn't have index, schema or table names. I
> eventually found them in pg_class, but the table and schema names
> aren't there. After some searching around, I came across this very
> strange (to me, anyway) "::regclass" thing that let me do this:
>
>   select * from pg_catalog.pg_index where indexrelid = 'schema.index'::regclass
>
> I'm not really clear what that's doing, but in any case it still isn't
> what I want. That query returns information when the index exists but
> errors out when the index doesn't exist.
>
> Is there a way I can get a non-erroring query on either condition that
> will tell me if an index exists on a given table in a given schema?
[GENERAL] Check for existence of index
I have a script that automatically creates my database objects. In order to automatically create indexes, it needs to first make sure they don't exist. For things like tables, this is easy:

  select * from information_schema.tables where table_schema = ""
    and table_name = ""

But for indexes it is hard for some reason. There's a catalog table "pg_index", but it doesn't have index, schema or table names. I eventually found them in pg_class, but the table and schema names aren't there. After some searching around, I came across this very strange (to me, anyway) "::regclass" thing that let me do this:

  select * from pg_catalog.pg_index where indexrelid = 'schema.index'::regclass

I'm not really clear what that's doing, but in any case it still isn't what I want. That query returns information when the index exists but errors out when the index doesn't exist.

Is there a way I can get a non-erroring query on either condition that will tell me if an index exists on a given table in a given schema?
Re: [GENERAL] psql : how to make it more silent....
Tom Lane wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> ... I would like to have psql (optionally?) not even send me NOTICE
>> messages.
>
> Have you looked at client_min_messages?
>
> regards, tom lane

I had not, because I'd never heard of it. :) Looks like exactly what I want and what I was suspecting existed, thanks.
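For the archives, the setting in question looks like this (per session, or the same line can go in postgresql.conf):

  SET client_min_messages = warning;  -- suppresses NOTICE, keeps WARNING and ERROR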
Re: [GENERAL] tcl bindings for 8.0
David Rysdam wrote:
> The README from 8.0-beta3 says "This distribution also contains several
> language bindings, including C and Tcl" but I'm not finding libpgtcl
> being built, nor can I find a way to tell it to. I see pgtcl is on
> http://gborg.postgresql.org, so does this mean that the README is out
> of date and I need to download a separate piece? Does it currently work
> with beta3, given that the project says "Enhancements will include
> converting all commands to use Tcl 8-style objects"?

Oh, *Tcl* 8 objects. I was reading that as PostgreSQL 8 at first. Still, my question stands. Is libpgtcl in 8.0-beta3 and I'm missing it, or do I need to download it?
[GENERAL] tcl bindings for 8.0
The README from 8.0-beta3 says "This distribution also contains several language bindings, including C and Tcl" but I'm not finding libpgtcl being built, nor can I find a way to tell it to. I see pgtcl is on http://gborg.postgresql.org, so does this mean that the README is out of date and I need to download a separate piece? Does it currently work with beta3, given that the project says "Enhancements will include converting all commands to use Tcl 8-style objects"?
Re: [GENERAL] psql : how to make it more silent....
Gaetano Mendola wrote:
> Patrick Fiche wrote:
>> Hi,
>>
>> When I execute a function, I would like psql to show me only RAISE
>> NOTICE messages but not all function calls. Indeed, I currently get
>> some messages that I don't care about:
>>
>>   * PL/pgSQL function "adm_user" line 321..
>>   * CONTEXT: SQL query "SELECT."
>>
>> Is there a way to get rid of these messages?
>
> modify your log_error_verbosity to "terse"

I can't find anything else on that beyond

  http://search.postgresql.org/www.search?ul=http%3A%2F%2Fwww.postgresql.org%2Fdocs%2F7.4%2Fstatic%2F%25&q=log_error_verbosity

and I would like to have psql (optionally?) not even send me NOTICE messages.
Re: [GENERAL] Date format for bulk copy
Michael Fuhr wrote:
> On Wed, Oct 13, 2004 at 01:32:01PM -0400, David Rysdam wrote:
>> Michael Fuhr wrote:
>>> You could filter the data through a script that reformats certain fields, then feed the reformatted data to PostgreSQL. This is usually a trivial task for Perl, awk, sed, or the like.
>> Right, I *can* do this. But then I have to build knowledge into that script so it can find each of these date fields (there's like 20 of them across 10 different files) and then update that knowledge each time it changes.
> In your case that's a reasonable argument against filtering the data with a script. Using a regular expression in the script might reduce or eliminate the need for some of the logic, but then you'd run the risk of reformatting data that shouldn't have been touched.
>
>> I'm still leaning towards just making postgres accept a ':' delimiter for milliseconds.
> Based on your requirements, that might indeed be a better solution. I'd probably choose to extend PostgreSQL rather than hack what already exists, though. Doing the latter might break something else, and you have to remember to add the hack every time you upgrade the server software. That can cause headaches for whoever inherits the system from you unless it's well-documented.

By "extend PostgreSQL" do you mean create a custom input_function for timestamp? Are there docs that give hints for replacing the input function of an existing type? Someone else replied similarly, but I'm afraid I'm not familiar enough with PG to decipher it all.

>> Why not the user-defined type with associated user-defined input function?
> If filtering the data is awkward, then that might be a better way to go.

I think I will, when I get to that point.
Re: [GENERAL] Date format for bulk copy
Greg Stark wrote:
> David Rysdam <[EMAIL PROTECTED]> writes:
>> In my brute force port, I just bulk copied the date fields into temporary tables and then did a to_timestamp(field, 'Mon DD YYYY HH:MI:SS:MSAM'). Again, I created a temporary table and did a decode(field, 'hex') to the real table.
> This is the standard approach. You're rather lucky these are the only data representation changes you've had to do so far. I fear you'll run into more and more complex changes over time, and trying to avoid the temporary table will get harder and harder.

No, I think I'm OK there. These are programmatically-generated values and I've already been through them all once. Just the millisecond issue and the hex binary issue, AFAIK.

> If it were me I would consider processing the files in perl. It should be pretty easy to do both of these modifications very quickly.

Very quick and easy to do one time. A little trickier to handle in an elegant, maintainable way for the dozens of data reloads I do every month, for GBs of data, onto two different server types.

> If you really want to go with custom C code then you might be able to just grab the byteain/byteaout functions from src/backend/utils/adt/varlena.c into a separate module and create new functions with modified names. Load it with
>
>   CREATE FUNCTION byteain ... AS 'my_bytea_funcs.so', 'my_byteain';
>
> Or maybe create the function as my_byteain in postgres and then update the catalog entries somehow. I'm not sure how to do that, but it shouldn't be too hard. And it might make it easier to do the substitution for the data load and then undo the change afterwards.

Why not create a type and then define the load function to be the equivalent of "decode('hex')"?

> Doing the same for timestamp is a bit trickier, but you could copy ParseDateTime from datetime.c as a static function for your module. Be careful, though: test this out thoroughly on a test database. I'm not sure of all the impacts of altering the in/out functions for data types. I expect it would break pg_dump, for example. And I would worry about the statistics tables too.

This is kind of a hybrid of my suggestions, and the problems are a hybrid as well. :)

1) Just change the timestamp type so that it allows a ':' delimiter for milliseconds. Potential problems: other parts of the code won't expect it; people don't want that.

2) Create a new type. Potential problem: things like date ranges probably wouldn't work anymore, since the server wouldn't know it's a date now.
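A hedged sketch of the SQL half of Greg's suggestion only; it assumes a compiled shared library (the path, library name, and exported symbol below are hypothetical) containing a copy of byteain modified to accept the Sybase-style hex text:

  CREATE FUNCTION my_byteain(cstring) RETURNS bytea
      AS '/usr/local/pgsql/lib/my_bytea_funcs.so', 'my_byteain'
      LANGUAGE C STRICT;

The C module itself, and any catalog surgery to swap it in during loads, is the part Greg warns needs careful testing.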
Re: [GENERAL] Date format for bulk copy
Michael Fuhr wrote:
> On Wed, Oct 13, 2004 at 10:06:58AM -0400, David Rysdam wrote:
>> Sybase bulk copies the date fields out in this format:
>>
>>   Mar 4 1973 10:28:00:000AM
>>
>> Postgresql's COPY (or psql \copy) doesn't like that format.
> You could filter the data through a script that reformats certain fields, then feed the reformatted data to PostgreSQL. This is usually a trivial task for Perl, awk, sed, or the like.

Right, I *can* do this. But then I have to build knowledge into that script so it can find each of these date fields (there's like 20 of them across 10 different files) and then update that knowledge each time it changes. I'm still leaning towards just making postgres accept a ':' delimiter for milliseconds.

Also, how much would a secondary script slow down the bulk copy, if any?

>> I have a similarish problem with another field type. In Sybase it's a binary format. In postgres it is a binary format (bytea). But Sybase bcps the data out in ASCII. Sybase recognizes that it is a binary field and auto-converts the ASCII back to binary. Postgres doesn't. Again, I created a temporary table and did a decode(field, 'hex') to the real table.
> Sounds like Sybase is dumping in hex, whereas PostgreSQL expects octal. If you can't change the dump format, then again, filtering the data through a script might work.

Oh, so I can load binary data into PG if it's ASCII-encoded octal?

Why not the user-defined type with associated user-defined input function?
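For the hex case, a hedged sketch of the staging-table conversion already described in the thread (table and column names are placeholders): load the hex text as-is, then convert with decode():

  -- staging table holds the hex string exactly as Sybase bcp wrote it
  CREATE TEMP TABLE bin_stage (key integer, rawhex text);
  -- in psql:  \copy bin_stage from 'datafile'

  INSERT INTO realtable (key, binfield)
  SELECT key, decode(rawhex, 'hex')
    FROM bin_stage;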
[GENERAL] Date format for bulk copy
I have a large amount of data that I copy in and out of Sybase very often. Now I also want to copy this data in and out of postgres. I have an existing script that creates the entire database(s) from scratch in Sybase and then uses the Sybase bulk copy tool "bcp" to copy the data in. I already did a brute-force port of this script to postgres once, but I'm trying to do it more elegantly now that I know what issues I'm going to run into.

One of them is date formats in the bcp files. Sybase bulk copies the date fields out in this format:

  Mar 4 1973 10:28:00:000AM

Postgresql's COPY (or psql \copy) doesn't like that format. In particular, it doesn't like the millisecond field at the end. If I understand the docs correctly, postgres wants the millisecond field to be preceded by a decimal point instead of a colon.

In my brute-force port, I just bulk copied the date fields into temporary tables and then did a to_timestamp(field, 'Mon DD YYYY HH:MI:SS:MSAM'). That worked, but required a lot of additional logic in my script to handle the temp tables and conversions. I'd hate to have to keep all that overhead in there basically to handle the conversion of a colon to a decimal point. So my questions are these:

0) I thought of creating a user-defined data type for this, but it seems like overkill, especially if I'd have to provide all kinds of helper functions for things like date incrementing or comparison or whatever. Am I off track?

1) Are there any tools out there that allow for specifying the field format of a COPY?

2) If not, is it reasonable or unreasonable to modify the postgresql source (I'm running Beta 3) to handle a colon as a millisecond delimiter? (If so, where do I look?)

3) If I did create such a patch, would the postgresql project accept it into the tree?

I have a similarish problem with another field type. In Sybase it's a binary format; in postgres it's a binary format (bytea). But Sybase bcps the data out in ASCII. Sybase recognizes that it is a binary field and auto-converts the ASCII back to binary. Postgres doesn't. Again, I created a temporary table and did a decode(field, 'hex') to the real table. It seems reasonable to expect to be able to bulk copy ASCII-encoded binary values into binary fields. Is this field perhaps best described by a user-defined type?
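For concreteness, a hedged sketch of that temp-table round trip with placeholder names; it spells out the YYYY field and relies on to_timestamp treating the ':' before the milliseconds as a literal separator:

  -- raw bcp strings land in a text column first
  CREATE TEMP TABLE date_stage (key integer, ts_raw text);
  -- in psql:  \copy date_stage from 'datafile'

  INSERT INTO realtable (key, ts)
  SELECT key, to_timestamp(ts_raw, 'Mon DD YYYY HH:MI:SS:MSAM')
    FROM date_stage;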
Re: [GENERAL] Any recommended forums/wiki/blog s/w that uses
Devrim GUNDUZ wrote:
> Hi,
> On Tue, 17 Aug 2004, Shridhar Daithankar wrote:
>> Anyone have any suggestions?
> drupal? Check out at http://www.drupal.org/
> Drupal needs some hacking since it uses LIMIT #,# queries in database-pear.php. Just a FYI.

I tried and really liked MoinMoin: http://moinmoin.wikiwikiweb.de/
Re: [GENERAL] Simplfied Bytea input/output?
Incredibly, I was just sitting down to do something similar for a problem I have when I read this email. I'm going to do a temp table too, but I did think of another solution. It would work for me, but it's a little complex for my stage of PG expertise: create a user-defined type for "pic" and define an input/output function for it. Someone can correct me, but my understanding says that probably won't be as fast as a direct bulk copy; still, it can't be slower than the temp table method and it is certainly simpler.

Jerry LeVan wrote:
> Hi,
>
> I have been looking for a fairly simple way to upload data into a bytea field without having to write custom C code for each table that contains a bytea field. With some good advice from Daniel Verite and reading the fine manual, here is my procedure for uploading files to bytea fields.
>
> 1) Create an "upload table" that might look like:
>
>    \d pics
>             Table "public.pics"
>     Column |  Type   |                        Modifiers
>    --------+---------+----------------------------------------------------------
>     info   | text    |
>     image  | bytea   |
>     ident  | integer | not null default nextval('public.pics_ident_seq'::text)
>
> 2) I wrote a single C procedure that would upload to this table. Here is a fragment of the code:
>
>    int usage()
>    {
>        fprintf(stderr, "loadBytea \n");
>        fprintf(stderr, " This will insert the comment (a string) and the contents of file\n");
>        fprintf(stderr, " into the first two columns of the specified table.\n");
>        exit(0);
>    }
>
> The core of the program is a wrapper around the PQexecParams routine. The source code for the program is located here:
>
> http://homepage.mac.levanj/Cocoa/programs/loadBytea.c
>
> 3) Once the file is uploaded (say to table pics), I can use the update command to load the bytea field into the desired table, perhaps something like:
>
>    update person set picture = pics.image from pics
>     where pics.ident=15 and person.first_name='Marijo'
>
> Once the bytea field has been loaded into the target, it can be deleted from the upload table (or a reference could be placed in the "person" table to the appropriate picture, and all of the pictures kept in the upload table).
>
> This method does not scale well to bulk input, but I think it would not be difficult to rewrite the loadBytea.c program as needed.
>
> Jerry
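A hedged sketch of step 3 plus the cleanup Jerry describes, reusing his example names (pics, person, ident 15); wrapping it in an explicit transaction is just an added assumption for tidiness:

  BEGIN;

  -- copy the uploaded image into the target row
  UPDATE person
     SET picture = pics.image
    FROM pics
   WHERE pics.ident = 15
     AND person.first_name = 'Marijo';

  -- then drop the staging row, as described above
  DELETE FROM pics WHERE ident = 15;

  COMMIT;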
[GENERAL] COPY not handling BLOBs
I have a bunch of data in Sybase, and some of it is in image fields. We use bcp on this data transparently all the time, without major issues, in character mode. Is there a fundamental technical reason that BLOBs can't be COPY'd in postgresql, or is it just that nobody has ever wanted to before?

(If I was starting from scratch I'd probably be happy with pg_dump/pg_restore, but I'd like to migrate over from Sybase in easy stages, so I'd like the bcp files to be interchangeable.)

Right now I'm having to write a program to create all the large objects up front and record the OIDs in a file, which I will then COPY in with the rest of my data.
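Not the program itself, just a hedged sketch of one way the OID-recording step could be done in plain SQL, assuming the files are readable by the server and the session has sufficient privileges (paths and table names are placeholders):

  -- map each data file to a freshly created large object
  CREATE TABLE blob_map (key integer, blob_oid oid);

  -- lo_import() runs server-side and returns the new large object's OID
  INSERT INTO blob_map VALUES (1, lo_import('/data/blobs/row1.img'));
  INSERT INTO blob_map VALUES (2, lo_import('/data/blobs/row2.img'));

The recorded OIDs can then be copied out alongside the rest of the data and referenced from the main tables.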